Contents
- OpenAI Pricing Overview
- GPT-5 Series Pricing
- GPT-4 Series Pricing
- Reasoning Models (o3, o4)
- OpenAI API Pricing 2026: Pricing Breakdown Table
- Cost Per Task
- Model Selection Guide
- Throughput Considerations
- FAQ
- Throughput and Latency Implications
- API Rate Limits by Model
- Batch API: 50% Cheaper
- Hybrid Approach: Multi-Model Strategy
- Cost Trends: Will GPT-5 Pricing Drop?
- Common Pricing Mistakes
- Related Resources
- Sources
OpenAI Pricing Overview
OpenAI's March 2026 pricing spans 15 active models across three product lines: the GPT-5 series (Nano through Pro), the GPT-4 series (legacy), and the reasoning models (o3/o4). Prices range from $0.05 per million prompt tokens (GPT-5 Nano) to $15 per million (GPT-5 Pro).
The decision matrix is tight now. Three models compete directly: GPT-5 ($1.25/$10 per M tokens), GPT-4.1 ($2/$8), and o3 ($2/$8). GPT-5 is cheaper and faster. o3 is slower but better at reasoning. GPT-4.1 is the legacy default.
This guide prices every model in production as of March 21, 2026, and breaks down cost-per-task for real workloads.
GPT-5 Series Pricing
The GPT-5 family has five tiers, each optimized for different workloads.
GPT-5.4: High-Context, Balanced
| Metric | Value |
|---|---|
| Context Window | 272K tokens |
| Prompt Price | $2.50/M |
| Completion Price | $15/M |
| Throughput | 45 tok/s |
| Max Output | 128K |
GPT-5.4 is OpenAI's premium model. 272K context (roughly 200,000 words at the usual ~0.75 words per token). Designed for complex reasoning over large documents or code repositories.
Use cases: code review on a full codebase, long document analysis, multi-page contract review. The high completion cost ($15/M) makes it uneconomical for high-volume tasks.
Cost per task: Analyzing a 100-page document (100K prompt tokens) + 2K output tokens: (100K × $2.50 + 2K × $15) / 1M = $0.28.
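The per-task arithmetic used throughout this guide reduces to one formula, sketched here as a small helper (the prices plugged in are this article's March 2026 figures):

```python
def task_cost(prompt_tokens: int, completion_tokens: int,
              prompt_price: float, completion_price: float) -> float:
    """Cost in dollars; prices are quoted in $ per million tokens."""
    return (prompt_tokens * prompt_price
            + completion_tokens * completion_price) / 1_000_000

# GPT-5.4 on a 100-page document: 100K prompt + 2K completion
print(task_cost(100_000, 2_000, 2.50, 15.00))  # 0.28
```

The same helper reproduces every cost-per-task figure below by swapping in the relevant prices.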
GPT-5.1: Extended Context, Baseline
| Metric | Value |
|---|---|
| Context Window | 400K tokens |
| Prompt Price | $1.25/M |
| Completion Price | $10/M |
| Throughput | 47 tok/s |
| Max Output | 128K |
GPT-5.1 is the best-value long-context model. 400K context (roughly 300,000 words). Same pricing as GPT-5 ($1.25/$10) and comparable latency, with a much larger window.
Use cases: long-document Q&A, multi-file code analysis, legal document review. Every team doing RAG at scale should test GPT-5.1.
Cost per task: 50K prompt + 1K completion: (50K × $1.25 + 1K × $10) / 1M = $0.073.
GPT-5 Codex: Extended Context, Code-Optimized
| Metric | Value |
|---|---|
| Context Window | 400K tokens |
| Prompt Price | $1.25/M |
| Completion Price | $10/M |
| Throughput | 50 tok/s |
| Max Output | 128K |
GPT-5 Codex is GPT-5.1 fine-tuned for code. Same pricing, slightly higher code throughput (50 vs 47 tok/s).
Use cases: code generation, debugging, refactoring. If teams are sending code to GPT-5.1, use Codex instead. No cost difference, slightly better output quality.
GPT-5 Pro: Reasoning Upgrade
| Metric | Value |
|---|---|
| Context Window | 400K tokens |
| Prompt Price | $15/M |
| Completion Price | $120/M |
| Throughput | 11 tok/s |
| Max Output | 128K |
GPT-5 Pro is expensive. $15 per million prompt tokens, $120 per million completions. Throughput is 11 tok/s (4x slower than GPT-5.1).
It exists for problems that require deep reasoning, where slow + smart beats fast + dumb. Math competition problems. Novel research. Logical puzzles.
Cost per task: 10K prompt + 500 completion: (10K × $15 + 500 × $120) / 1M = $0.21. Expensive per task, but the output quality justifies it for hard problems.
GPT-5: Balanced Default
| Metric | Value |
|---|---|
| Context Window | 272K tokens |
| Prompt Price | $1.25/M |
| Completion Price | $10/M |
| Throughput | 41 tok/s |
| Max Output | 128K |
GPT-5 is the baseline. 272K context. Fair pricing. Industry standard. Default choice for most tasks.
This is the model to compare against. If another model doesn't beat GPT-5 on cost, latency, or quality, don't use it.
Cost per task: 5K prompt + 500 completion: (5K × $1.25 + 500 × $10) / 1M = $0.011.
GPT-5 Mini: Lightweight, Fast
| Metric | Value |
|---|---|
| Context Window | 272K tokens |
| Prompt Price | $0.25/M |
| Completion Price | $2/M |
| Throughput | 68 tok/s |
| Max Output | 128K |
GPT-5 Mini costs 5x less than GPT-5. Throughput is 66% higher (68 vs 41 tok/s). Quality loss: ~10-15% (smaller model, reportedly trained on the same data).
Ideal for: high-volume tasks, classification, content moderation, simple Q&A. If tasks are straightforward and volume matters, Mini wins.
Cost per task: 1K prompt + 200 completion: (1K × $0.25 + 200 × $2) / 1M = $0.00065.
GPT-5 Nano: Ultra-Budget
| Metric | Value |
|---|---|
| Context Window | 272K tokens |
| Prompt Price | $0.05/M |
| Completion Price | $0.40/M |
| Throughput | 95 tok/s |
| Max Output | 32K |
GPT-5 Nano is the $0.05 tier. Extremely cheap. Extremely fast (95 tok/s). Quality is borderline (similar to GPT-3.5).
Use: classification, tagging, routing. Not suitable for content creation or reasoning. Output is often terse or low-quality.
Cost per task: 500 prompt + 100 completion: (500 × $0.05 + 100 × $0.40) / 1M = $0.000065.
GPT-4 Series Pricing
GPT-4.1 is the current standard. GPT-4o is cheaper but older. Both are legacy now that GPT-5 is available.
GPT-4.1: Extended Context, Industry Default
| Metric | Value |
|---|---|
| Context Window | 1.05M tokens |
| Prompt Price | $2/M |
| Completion Price | $8/M |
| Throughput | 55 tok/s |
| Max Output | 32K |
GPT-4.1 has the largest context window: 1.05M tokens, roughly 790,000 words. Full-book analysis. An entire codebase plus its documentation.
But GPT-5 ($1.25/M prompt) is cheaper. And GPT-5.1 ($1.25/M with 400K) covers most long-context needs.
Use GPT-4.1 only if teams need the full 1M context and don't mind paying 60% more than GPT-5. Most teams should prefer GPT-5.
Cost per task: 200K prompt (full codebase) + 2K completion: (200K × $2 + 2K × $8) / 1M = $0.416.
GPT-4.1 Mini: Lightweight Extended Context
| Metric | Value |
|---|---|
| Context Window | 1.05M tokens |
| Prompt Price | $0.40/M |
| Completion Price | $1.60/M |
| Throughput | 75 tok/s |
| Max Output | 32K |
Mini version of GPT-4.1. Same 1M context, lower cost, faster throughput.
Still more expensive than GPT-5 Mini ($0.25/$2). Use it only if you genuinely need 1M context and don't mind the extra cost.
GPT-4.1 Nano: Ultra-Budget Extended Context
| Metric | Value |
|---|---|
| Context Window | 1.05M tokens |
| Prompt Price | $0.10/M |
| Completion Price | $0.40/M |
| Throughput | 82 tok/s |
| Max Output | 32K |
The cheapest way to access 1M context. $0.10/M prompt, $0.40/M completion. Quality is lower than Mini.
GPT-4o: Legacy, Wide Context
| Metric | Value |
|---|---|
| Context Window | 128K tokens |
| Prompt Price | $2.50/M |
| Completion Price | $10/M |
| Throughput | 52 tok/s |
| Max Output | 16K |
GPT-4o is the previous flagship. 128K context. Now superseded by GPT-5 series.
Don't use. GPT-5 ($1.25/$10) is half the prompt cost and has 2x the context window. GPT-5 Mini ($0.25/$2) is way cheaper.
GPT-4o Mini: Legacy Lightweight
| Metric | Value |
|---|---|
| Context Window | 128K tokens |
| Prompt Price | $0.15/M |
| Completion Price | $0.60/M |
| Throughput | 75 tok/s |
| Max Output | 16K |
Don't use. GPT-5 Mini ($0.25/$2) is slightly more expensive but much better quality.
Reasoning Models (o3, o4)
Reasoning models trade throughput for correctness. Slow. Expensive. Worth it for hard problems.
o3: Advanced Reasoning
| Metric | Value |
|---|---|
| Context Window | 200K tokens |
| Prompt Price | $2/M |
| Completion Price | $8/M |
| Throughput | 17 tok/s |
| Max Output | 100K |
o3 is OpenAI's reasoning-focused model. Uses chain-of-thought internally. Very slow (17 tok/s, 2.4x slower than GPT-5).
But for hard problems (math, logic, novel reasoning), o3 is better than GPT-5. Win rate on competition math: o3 60%, GPT-5 40%.
Cost per task: 5K prompt + 2K completion (lots of thinking): (5K × $2 + 2K × $8) / 1M = $0.026. Slow to run, but small per-task cost.
o3 Mini: Reasoning, Fast
| Metric | Value |
|---|---|
| Context Window | 200K tokens |
| Prompt Price | $1.10/M |
| Completion Price | $4.40/M |
| Throughput | 47 tok/s |
| Max Output | 100K |
o3 Mini is o3 optimized for speed. Throughput: 47 tok/s — nearly 3x faster than o3, and slightly faster than GPT-5 (41 tok/s).
Pricing is better: $1.10/$4.40 vs o3's $2/$8. Quality loss: ~20-30%.
Use: high-volume reasoning tasks where speed matters. Filtering, routing. Not for novel problems.
o4 Mini: Latest Reasoning
| Metric | Value |
|---|---|
| Context Window | 200K tokens |
| Prompt Price | $1.10/M |
| Completion Price | $4.40/M |
| Throughput | 62 tok/s |
| Max Output | 100K |
o4 Mini is the latest reasoning model. Same pricing as o3 Mini ($1.10/$4.40) but faster throughput (62 vs 47 tok/s).
o4 is still in limited release as of March 2026. Availability varies. Check current access before assuming availability.
OpenAI API Pricing 2026: Pricing Breakdown Table
| Model | Context | Prompt $/M | Completion $/M | Throughput | Best For |
|---|---|---|---|---|---|
| GPT-5.4 | 272K | $2.50 | $15 | 45 | Premium reasoning |
| GPT-5.1 | 400K | $1.25 | $10 | 47 | Long documents |
| GPT-5 Codex | 400K | $1.25 | $10 | 50 | Code tasks |
| GPT-5 Pro | 400K | $15 | $120 | 11 | Hard reasoning |
| GPT-5 | 272K | $1.25 | $10 | 41 | Default choice |
| GPT-5 Mini | 272K | $0.25 | $2 | 68 | High volume |
| GPT-5 Nano | 272K | $0.05 | $0.40 | 95 | Classification |
| GPT-4.1 | 1.05M | $2 | $8 | 55 | Extra long context |
| GPT-4.1 Mini | 1.05M | $0.40 | $1.60 | 75 | Long context, budget |
| GPT-4.1 Nano | 1.05M | $0.10 | $0.40 | 82 | Budget long context |
| GPT-4o | 128K | $2.50 | $10 | 52 | Legacy (avoid) |
| GPT-4o Mini | 128K | $0.15 | $0.60 | 75 | Legacy (avoid) |
| o3 | 200K | $2 | $8 | 17 | Hard reasoning |
| o3 Mini | 200K | $1.10 | $4.40 | 47 | Reasoning high-volume |
| o4 Mini | 200K | $1.10 | $4.40 | 62 | Latest reasoning |
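Encoded as data, the table above makes it easy to rank models for a given workload. A sketch using this article's figures — the dictionary keys are illustrative labels, not official API model IDs:

```python
# ($/M prompt, $/M completion, context window in tokens) -- from the table above
PRICES = {
    "gpt-5.4":     (2.50, 15.00, 272_000),
    "gpt-5.1":     (1.25, 10.00, 400_000),
    "gpt-5-codex": (1.25, 10.00, 400_000),
    "gpt-5-pro":   (15.00, 120.00, 400_000),
    "gpt-5":       (1.25, 10.00, 272_000),
    "gpt-5-mini":  (0.25, 2.00, 272_000),
    "gpt-5-nano":  (0.05, 0.40, 272_000),
    "gpt-4.1":     (2.00, 8.00, 1_050_000),
    "o3":          (2.00, 8.00, 200_000),
    "o3-mini":     (1.10, 4.40, 200_000),
}

def cheapest(prompt_toks: int, completion_toks: int) -> list[tuple[str, float]]:
    """Models whose context fits the prompt, sorted by total task cost."""
    fits = {m: p for m, p in PRICES.items() if prompt_toks <= p[2]}
    return sorted(
        ((m, (prompt_toks * p[0] + completion_toks * p[1]) / 1e6)
         for m, p in fits.items()),
        key=lambda x: x[1],
    )

print(cheapest(500_000, 5_000)[0])  # only GPT-4.1 fits a 500K prompt
```

Filtering on context before sorting on price prevents the common mistake of picking a model that can't actually hold the prompt.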
Cost Per Task
Real-world pricing for common tasks (March 2026):
Classification Task (1K prompt, 50 completion)
| Model | Cost | Time |
|---|---|---|
| GPT-5 Nano | $0.00007 | 0.5 sec |
| GPT-5 Mini | $0.00035 | 0.7 sec |
| GPT-5 | $0.00175 | 1.2 sec |
| GPT-4.1 Mini | $0.00048 | 0.67 sec |
Winner: GPT-5 Nano. 25x cheaper than GPT-5 and more than twice as fast.
Customer Support Q&A (3K prompt, 500 completion)
| Model | Cost | Time |
|---|---|---|
| GPT-5 Mini | $0.0018 | 7.4 sec |
| GPT-5 | $0.0088 | 12 sec |
| GPT-4.1 Mini | $0.002 | 6.7 sec |
Winner: Near-tie between GPT-5 Mini and GPT-4.1 Mini. GPT-5 Mini edges it on cost; speed is comparable.
Long Document Analysis (100K prompt, 2K completion)
| Model | Cost | Time |
|---|---|---|
| GPT-5.1 | $0.145 | 43 sec |
| GPT-5.4 | $0.28 | 44 sec |
| GPT-4.1 | $0.416 | 36 sec |
Winner: GPT-5.1. Cheaper, nearly same speed, sufficient context.
Code Review (300K prompt, 5K completion)
| Model | Cost | Time |
|---|---|---|
| GPT-4.1 | $0.64 | 91 sec |
| GPT-5.1 | $0.425 | 107 sec |
Winner: GPT-5.1. 34% cheaper. Slightly slower but worth it. (Above 400K prompt tokens, GPT-5.1 no longer fits and GPT-4.1 is the only option.)
Hard Math Problem (2K prompt, 8K completion, chain-of-thought)
| Model | Cost | Time |
|---|---|---|
| o3 | $0.068 | 471 sec |
| o3 Mini | $0.037 | 170 sec |
| GPT-5 | $0.083 | 195 sec |
Winner: Depends on accuracy needed. o3 is most accurate but slowest. o3 Mini is cheapest and fastest. GPT-5 actually costs the most here — its $10/M completion rate dominates on a long chain-of-thought output.
Model Selection Guide
Decision Tree
High volume, simple tasks? Start with GPT-5 Nano ($0.05/M prompt). If quality is too low, upgrade to GPT-5 Mini ($0.25/M).
Standard tasks, balanced cost-quality? Use GPT-5 ($1.25/$10). This is the default unless a specific need pushes teams elsewhere.
Long documents (over 100K tokens)? Use GPT-5.1 (400K context, $1.25/M). Cheaper and better than GPT-4.1.
Extremely long documents (500K+ tokens)? Use GPT-4.1 (1.05M context, $2/M). Only option, but pricey.
Hard reasoning or novel problems? Use o3 ($2/$8 per M). Slow, but worth it for accuracy. If cost is tight, try o3 Mini ($1.10/$4.40) first.
Code tasks? Use GPT-5 Codex (same price as GPT-5.1 but optimized). Or just use GPT-5, it's good at code.
Avoid: GPT-4o and GPT-4o Mini. Both are superseded by the GPT-5 series with no offsetting advantage. (GPT-4.1 Nano survives only as the budget route to 1M context.)
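The decision tree above can be sketched as a routing function. Thresholds and the task labels are illustrative, and the model names are shorthand, not official API IDs:

```python
def pick_model(prompt_tokens: int, task: str) -> str:
    """Map this guide's decision tree onto a model choice."""
    if prompt_tokens > 400_000:
        return "gpt-4.1"            # only 1.05M-context option
    if task == "hard-reasoning":
        return "o3"                 # try o3-mini first if cost is tight
    if task == "code":
        return "gpt-5-codex"
    if prompt_tokens > 100_000:
        return "gpt-5.1"            # 400K context at GPT-5 pricing
    if task in ("classification", "tagging", "routing"):
        return "gpt-5-nano"         # upgrade to gpt-5-mini if quality is too low
    return "gpt-5"                  # the default

print(pick_model(600_000, "summarization"))  # gpt-4.1
```

Ordering matters: context limits are hard constraints, so they gate the choice before any cost or quality preference.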
Throughput Considerations
Throughput affects latency and real-world cost.
- GPT-5 Nano (95 tok/s): 1,000-token completion in 10.5 seconds. Fast.
- GPT-5 (41 tok/s): 1,000-token completion in 24 seconds. Slower.
- o3 (17 tok/s): 1,000-token completion in 59 seconds. Very slow.
For user-facing applications, TTFT (time-to-first-token) matters as much as throughput. OpenAI doesn't publish TTFT, but it generally correlates with throughput. Faster models = lower TTFT.
If latency is critical: Use GPT-5 Mini (68 tok/s) or Nano (95 tok/s). If teams have time: Use o3 for hard problems.
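Generation latency at a given throughput is just completion tokens divided by tok/s — a rough estimate that, as noted above, excludes TTFT:

```python
def gen_seconds(completion_tokens: int, tok_per_s: float) -> float:
    """Rough generation time; excludes time-to-first-token and network."""
    return completion_tokens / tok_per_s

# throughputs from this article's tables
for model, tps in [("gpt-5-nano", 95), ("gpt-5", 41), ("o3", 17)]:
    print(f"{model}: {gen_seconds(1_000, tps):.1f}s for 1K tokens")
```

Running this reproduces the 10.5s / 24s / 59s figures quoted above.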
FAQ
What's the cheapest model for customer-facing tasks?
GPT-5 Mini. $0.25/$2 per million tokens. 5x cheaper than GPT-5. Quality is 85-90% of GPT-5. Good for: Q&A, summarization, categorization.
Should I still use GPT-4.1?
Only if you need 1M context. Otherwise, use GPT-5 ($1.25/M prompt, half the cost, better quality). GPT-4.1 is legacy.
Is o3 worth the cost?
For competition math, novel research, logic puzzles: yes. For customer support or text generation: no.
What's the throughput difference between o3 and GPT-5?
o3: 17 tok/s. GPT-5: 41 tok/s. o3 is 2.4x slower but better at reasoning. For routine tasks, GPT-5 is fine.
Can I use GPT-5 Nano for everything?
No. Nano is low-quality (similar to GPT-3.5). Good for classification, tagging, routing. Not for content creation, code generation, or detailed analysis.
Which model should I use as my default?
GPT-5 ($1.25/$10). Best balance of cost, quality, and speed. If tasks are simple and volume is high, step down to GPT-5 Mini or Nano. If tasks are hard, consider o3.
Is GPT-5 better than Claude?
Comparable. GPT-5 is faster. Claude Sonnet 4.6 is $3/$15 per M tokens (more expensive). Each has different strengths: GPT-5 for code, Claude for nuance. Test both.
Throughput and Latency Implications
Pricing per token is one lens. Throughput per dollar is another.
Cost Per Task (Practical Examples)
Email classification (subject line, mark as spam/not spam):
- Prompt: 200 tokens (email + instructions)
- Completion: 5 tokens (spam/not-spam decision)
- Model: GPT-5 Nano
- Cost: (200 × $0.05 + 5 × $0.40) / 1M = $0.000012
- Speed: 95 tok/s; the 5-token completion generates in well under a second
- Monthly cost for 1M emails: $12
Blog post summarization (1,500 word article → 200 word summary):
- Prompt: 3,500 tokens (article + summary instruction)
- Completion: 500 tokens (summary)
- Model: GPT-5 Mini
- Cost: (3,500 × $0.25 + 500 × $2) / 1M = $0.001875
- Speed: 68 tok/s completion, ~7 seconds total
- Monthly cost for 1,000 summaries: $1.88
Detailed code review (entire file + guidelines):
- Prompt: 8,000 tokens (code + review rubric)
- Completion: 2,000 tokens (detailed review feedback)
- Model: GPT-5
- Cost: (8,000 × $1.25 + 2,000 × $10) / 1M = $0.030
- Speed: 41 tok/s, ~50 seconds total
- Monthly cost for 100 reviews: $3.00
Novel research problem-solving (new algorithm from scratch):
- Prompt: 5,000 tokens (problem description, constraints, examples)
- Completion: 5,000 tokens (novel algorithm with explanation)
- Model: o3 (reasoning model)
- Cost: (5,000 × $2 + 5,000 × $8) / 1M = $0.05
- Speed: 17 tok/s, ~5 minutes (slow but accurate)
- Cost per problem: $0.05 (expensive but worth it if solution is correct first try)
API Rate Limits by Model
OpenAI enforces rate limits (requests per minute, tokens per minute) based on pricing tier.
| Model | Requests/min | Tokens/min | Notes |
|---|---|---|---|
| GPT-5 Nano | 3,500 | 2M | Free tier |
| GPT-5 Mini | 3,500 | 1M | Basic tier |
| GPT-5 | 3,500 | 500K | Standard tier |
| GPT-4.1 | 1,500 | 300K | Legacy |
| o3 | 100 | 100K | Reasoning limited |
o3 has aggressive rate limits due to cost: at 100K tokens/min, you cap out around 6M tokens/hour. You can't burst high-volume workloads through o3.
For high-volume tasks (1B+ tokens/day), you need:
- Multiple API keys (different rate limit buckets)
- Queue + batch processing (the Batch API halves cost but processes asynchronously)
- Fallback to GPT-5 Mini/Nano when o3 hits limit
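The fallback pattern above can be sketched with a client-side token budget — a simplified leaky-bucket mirror of the server-side tokens-per-minute limit, so requests are rerouted before the API ever returns a rate-limit error (the limiter design and the model labels are illustrative):

```python
import time

class TokenBudget:
    """Client-side tokens-per-minute budget to stay under a model's limit."""
    def __init__(self, tokens_per_min: int):
        self.capacity = tokens_per_min
        self.available = float(tokens_per_min)
        self.last = time.monotonic()

    def try_spend(self, tokens: int) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.available = min(self.capacity,
                             self.available + (now - self.last) / 60 * self.capacity)
        self.last = now
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False

o3_budget = TokenBudget(100_000)   # o3's 100K tokens/min limit from the table

def route(tokens_needed: int) -> str:
    """Use o3 while its budget allows; otherwise fall back to GPT-5 Mini."""
    return "o3" if o3_budget.try_spend(tokens_needed) else "gpt-5-mini"
```

In production you would still handle HTTP 429 responses as a backstop; the local budget just keeps the common case cheap and error-free.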
Batch API: 50% Cheaper
OpenAI offers a batch API: submit 10,000+ requests at once, receive results in 1-24 hours.
Cost reduction: 50% for all models. So GPT-5 Nano becomes $0.025/M prompt, $0.20/M completion.
Trade: latency. Instead of 5-second response, wait 1-24 hours.
Viable for:
- Non-urgent tasks (data labeling, content generation, analysis)
- Overnight processing
- Research batches
Not viable for:
- Customer-facing APIs (users expect immediate response)
- Interactive tools
For 1B tokens per day at a blended ~$2 per million, batch pricing halves that to ~$1 per million — roughly $1,000/day saved, or ~$360K per year for large teams.
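Because the batch discount is a flat 50% multiplier on both prompt and completion prices, the savings math is one line (blended $/M here is an assumption for illustration):

```python
BATCH_DISCOUNT = 0.5  # Batch API halves both prompt and completion prices

def daily_cost(tokens_per_day: int, blended_price_per_m: float,
               batched: bool = False) -> float:
    """Daily spend in dollars for a given volume and blended $/M price."""
    price = blended_price_per_m * (BATCH_DISCOUNT if batched else 1.0)
    return tokens_per_day / 1e6 * price

# 1B tokens/day at a blended ~$2/M
print(daily_cost(1_000_000_000, 2.0))                # 2000.0/day
print(daily_cost(1_000_000_000, 2.0, batched=True))  # 1000.0/day
```

$1,000/day saved works out to roughly $365K/year, matching the ~$360K figure above.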
Hybrid Approach: Multi-Model Strategy
Smart teams don't pick one model. They use different models for different tasks:
| Task | Model | Reasoning |
|---|---|---|
| Classification | GPT-5 Nano | Cheapest, classification is simple |
| Summarization | GPT-5 Mini | Balance of cost and quality |
| Content creation | GPT-5 | Best quality for text |
| Code generation | GPT-5 Codex | Optimized for code |
| Long documents | GPT-5.1 | 400K context, reasonable cost |
| Hard reasoning | o3 | Best accuracy for novel problems |
Example: Customer support AI.
- Route incoming support tickets: GPT-5 Nano (classify priority, department)
- Generate first-pass response: GPT-5 Mini (fast, good enough)
- Hand-off to human if complexity flagged: Check with GPT-5 (full analysis)
Cost per ticket: mostly Nano + Mini (cheap), rarely GPT-5 (expensive).
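The blended per-ticket cost of that pipeline is an expected-value calculation. The token counts and 20% escalation rate below are assumptions for illustration; the prices are this article's:

```python
def ticket_cost(escalation_rate: float = 0.2) -> float:
    """Expected $ per support ticket under the Nano -> Mini -> GPT-5 pipeline."""
    nano = (500 * 0.05 + 20 * 0.40) / 1e6      # every ticket: classify/route
    mini = (2_000 * 0.25 + 300 * 2.00) / 1e6   # every ticket: draft a reply
    gpt5 = (4_000 * 1.25 + 800 * 10.00) / 1e6  # escalated tickets only
    return nano + mini + escalation_rate * gpt5

print(f"${ticket_cost():.6f} per ticket")
```

Under these assumptions the expensive model contributes most of the blended cost even at a 20% escalation rate, which is why keeping the escalation trigger accurate matters more than shaving the Nano step.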
Cost Trends: Will GPT-5 Pricing Drop?
Historical pattern: New models expensive, drop 50-70% in 12 months.
GPT-4o launch price: $2.50/$10 per M tokens. GPT-4o today (March 2026): legacy tier, avoid.
GPT-4.1 launch price (2024): $2/$8 per M tokens. GPT-4.1 today: still $2/$8 (no reduction yet).
GPT-5 launch price (Feb 2026): $1.25/$10 per M tokens. GPT-5 today (March 2026): still $1.25/$10.
Prediction: GPT-5 pricing will drop to $0.75/$6 by Q4 2026. o3 will drop to $1/$4 by Q2 2026.
If you're in early development, build on GPT-5 Nano/Mini now; when prices drop, your costs scale down further.
If you're already at production scale, it's smart to negotiate volume discounts now (not public-facing; available via the sales team). Lock in $1.25/$10 and renegotiate when public pricing drops.
Common Pricing Mistakes
Mistake 1: Using GPT-5.4 for Everything
GPT-5.4 is $2.50/$15 per M tokens. 2x the prompt cost of GPT-5, 1.5x the completion cost.
Best use: Complex reasoning with large documents.
Wrong use: Email replies (GPT-5 Mini sufficient). Blog posts (GPT-5 fine). Data entry (GPT-5 Nano overkill).
Cost difference: 1M prompt + 1M completion tokens on GPT-5.4 vs GPT-5 Mini = ($2.50 + $15) vs ($0.25 + $2) = $15.25 extra.
Mistake 2: Not Using Batch API
Batch API is 50% cheaper. If 50% of your workload is non-urgent, using batch saves money.
Example: Labeling 10M documents for training data.
- Non-batched (immediate): 10M tokens × $1.25 (GPT-5) = $12.50
- Batched (overnight): 10M tokens × $1.25 × 0.5 = $6.25
- Savings: $6.25 per 10M tokens
Mistake 3: Retrying Failed Requests Without Caching
If a request fails and you retry, you're charged twice.
Use caching or idempotency to avoid double-charges.
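A minimal sketch of response caching keyed on the request, so a retry after a transient failure reuses the already-paid-for result (`call_api` is a hypothetical stand-in for your real client call):

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(model: str, prompt: str, call_api) -> str:
    """Return a cached response if this exact request was already paid for."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, prompt)  # only charged on a cache miss
    return _cache[key]

# usage: a retry hits the cache instead of re-billing
calls = 0
def fake_api(model, prompt):
    global calls
    calls += 1
    return "answer"

cached_completion("gpt-5", "2+2?", fake_api)
cached_completion("gpt-5", "2+2?", fake_api)  # retry: no second charge
print(calls)  # 1
```

For production use, persist the cache (Redis, a database) and only cache deterministic requests (temperature 0), since sampled outputs differ per call.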
Mistake 4: Choosing by Price Alone
GPT-5 Nano is cheap, but quality is low. If you use Nano for complex tasks and get wrong answers, you waste time fixing them.
Time cost of manual review: $100/hr. Nano cost: fractions of a cent per task.
If fixing a Nano mistake costs 30 minutes = $50, and choosing GPT-5 (~$0.01 per task) gets it right the first time, the ROI is clear.
Choose model based on task complexity, not just price.
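That trade-off is an expected-cost calculation. The error rates below are assumptions for illustration; the $100/hr review rate is the one quoted above:

```python
def expected_cost(api_cost: float, error_rate: float,
                  fix_minutes: float, hourly_rate: float = 100.0) -> float:
    """API cost plus the expected human cost of fixing wrong answers."""
    return api_cost + error_rate * (fix_minutes / 60) * hourly_rate

# Nano: near-zero API cost, but assume 10% of answers need a 30-min fix.
# GPT-5: ~$0.01 per task, assume a 1% error rate.
print(expected_cost(0.0001, 0.10, 30))  # ~5.00 per task
print(expected_cost(0.01, 0.01, 30))    # ~0.51 per task
```

Under these assumptions the "cheap" model is roughly 10x more expensive once human review is priced in, which is the whole point of Mistake 4.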
Related Resources
Sources
- OpenAI API Pricing
- OpenAI Models Documentation
- OpenAI API Benchmarks
- DeployBase LLM Pricing Dashboard (prices observed March 21, 2026)