OpenAI API Pricing 2026: Complete Model Cost Breakdown

Deploybase · January 7, 2026 · LLM Pricing

OpenAI Pricing Overview

OpenAI's March 2026 pricing spans 15 active models across three product lines: the GPT-5 series (Nano through Pro), the GPT-4 series (legacy), and the reasoning models (o3/o4 series). Prompt prices range from $0.05 per million tokens (GPT-5 Nano) to $15 per million (GPT-5 Pro).

The decision matrix is tight now. Three models compete directly: GPT-5 ($1.25/$10 per M tokens), GPT-4.1 ($2/$8), and o3 ($2/$8). GPT-5 is cheaper and faster. o3 is slower but better at reasoning. GPT-4.1 is the legacy default.

This guide prices every model in production as of March 21, 2026, and breaks down cost-per-task for real workloads.


GPT-5 Series Pricing

The GPT-5 family has seven models, each optimized for different workloads.

GPT-5.4: High-Context, Balanced

Metric | Value
Context Window | 272K tokens
Prompt Price | $2.50/M
Completion Price | $15/M
Throughput | 45 tok/s
Max Output | 128K

GPT-5.4 is OpenAI's premium model. 272K context (roughly 200,000 words). Designed for complex reasoning over large documents or code repositories.

Use cases: code review on a full codebase, long document analysis, multi-page contract review. The high completion cost ($15/M) makes it uneconomical for high-volume tasks.

Cost per task: Analyzing a 100-page document (100K prompt tokens) + 2K output tokens: (100K × $2.50 + 2K × $15) / 1M = $0.28.
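The cost-per-task arithmetic used throughout this guide can be wrapped in a small helper. This is a sketch; the prices plugged in below are the March 2026 list prices quoted in this article:

```python
def task_cost(prompt_tokens: int, completion_tokens: int,
              prompt_price: float, completion_price: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (prompt_tokens * prompt_price
            + completion_tokens * completion_price) / 1_000_000

# GPT-5.4 ($2.50/$15 per M): 100K-token document + 2K-token analysis
print(round(task_cost(100_000, 2_000, 2.50, 15.00), 3))  # 0.28
```

Swap in any model's prompt/completion prices from the tables below to reproduce the per-task figures in this guide.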

GPT-5.1: Extended Context, Baseline

Metric | Value
Context Window | 400K tokens
Prompt Price | $1.25/M
Completion Price | $10/M
Throughput | 47 tok/s
Max Output | 128K

GPT-5.1 is the best-value long-context model. 400K context (roughly 300,000 words). Same pricing as GPT-5, slightly higher throughput (47 vs 41 tok/s).

Use cases: long-document Q&A, multi-file code analysis, legal document review. Every team doing RAG at scale should test GPT-5.1.

Cost per task: 50K prompt + 1K completion: (50K × $1.25 + 1K × $10) / 1M = $0.073.

GPT-5 Codex: Extended Context, Code-Optimized

Metric | Value
Context Window | 400K tokens
Prompt Price | $1.25/M
Completion Price | $10/M
Throughput | 50 tok/s
Max Output | 128K

GPT-5 Codex is GPT-5.1 fine-tuned for code. Same pricing, slightly higher code throughput (50 vs 47 tok/s).

Use cases: code generation, debugging, refactoring. If teams are sending code to GPT-5.1, use Codex instead. No cost difference, slightly better output quality.

GPT-5 Pro: Reasoning Upgrade

Metric | Value
Context Window | 400K tokens
Prompt Price | $15/M
Completion Price | $120/M
Throughput | 11 tok/s
Max Output | 128K

GPT-5 Pro is expensive. $15 per million prompt tokens, $120 per million completions. Throughput is 11 tok/s (4x slower than GPT-5.1).

It exists for problems that require deep reasoning, where slow + smart beats fast + dumb. Math competition problems. Novel research. Logical puzzles.

Cost per task: 10K prompt + 500 completion: (10K × $15 + 500 × $120) / 1M = $0.21. Expensive per task, but the output quality justifies it for hard problems.

GPT-5: Balanced Default

Metric | Value
Context Window | 272K tokens
Prompt Price | $1.25/M
Completion Price | $10/M
Throughput | 41 tok/s
Max Output | 128K

GPT-5 is the baseline. 272K context. Fair pricing. Industry standard. Default choice for most tasks.

This is the model to compare against. If another model doesn't beat GPT-5 on cost, latency, or quality, don't use it.

Cost per task: 5K prompt + 500 completion: (5K × $1.25 + 500 × $10) / 1M = $0.011.

GPT-5 Mini: Lightweight, Fast

Metric | Value
Context Window | 272K tokens
Prompt Price | $0.25/M
Completion Price | $2/M
Throughput | 68 tok/s
Max Output | 128K

GPT-5 Mini costs 5x less than GPT-5. Throughput is 66% faster (68 vs 41 tok/s). Quality loss: ~10-15% (smaller model, trained on same data).

Ideal for: high-volume tasks, classification, content moderation, simple Q&A. If tasks are straightforward and volume matters, Mini wins.

Cost per task: 1K prompt + 200 completion: (1K × $0.25 + 200 × $2) / 1M = $0.00065.

GPT-5 Nano: Ultra-Budget

Metric | Value
Context Window | 272K tokens
Prompt Price | $0.05/M
Completion Price | $0.40/M
Throughput | 95 tok/s
Max Output | 32K

GPT-5 Nano is the $0.05 tier. Extremely cheap. Extremely fast (95 tok/s). Quality is borderline (similar to GPT-3.5).

Use: classification, tagging, routing. Not suitable for content creation or reasoning. Output is often terse or low-quality.

Cost per task: 500 prompt + 100 completion: (500 × $0.05 + 100 × $0.40) / 1M = $0.000065.


GPT-4 Series Pricing

GPT-4.1 is the current standard. GPT-4o is cheaper but older. Both are legacy now that GPT-5 is available.

GPT-4.1: Extended Context, Industry Default

Metric | Value
Context Window | 1.05M tokens
Prompt Price | $2/M
Completion Price | $8/M
Throughput | 55 tok/s
Max Output | 32K

GPT-4.1 has the largest context window: 1.05M tokens. That's roughly 800,000 words. Full book analysis. Entire codebase + documentation.

But GPT-5 ($1.25/M prompt) is cheaper. And GPT-5.1 ($1.25/M with 400K) covers most long-context needs.

Use GPT-4.1 only if teams need the full 1M context and don't mind paying 60% more than GPT-5. Most teams should prefer GPT-5.

Cost per task: 200K prompt (full codebase) + 2K completion: (200K × $2 + 2K × $8) / 1M = $0.416.

GPT-4.1 Mini: Lightweight Extended Context

Metric | Value
Context Window | 1.05M tokens
Prompt Price | $0.40/M
Completion Price | $1.60/M
Throughput | 75 tok/s
Max Output | 32K

Mini version of GPT-4.1. Same 1M context, lower cost, faster throughput.

Still more expensive than GPT-5 Mini ($0.25/$2). Only use for teams that genuinely need 1M context and don't mind the extra cost.

GPT-4.1 Nano: Ultra-Budget Extended Context

Metric | Value
Context Window | 1.05M tokens
Prompt Price | $0.10/M
Completion Price | $0.40/M
Throughput | 82 tok/s
Max Output | 32K

The cheapest way to access 1M context. $0.10/M prompt, $0.40/M completion. Quality is lower than Mini.

GPT-4o: Legacy, Wide Context

Metric | Value
Context Window | 128K tokens
Prompt Price | $2.50/M
Completion Price | $10/M
Throughput | 52 tok/s
Max Output | 16K

GPT-4o is the previous flagship. 128K context. Now superseded by GPT-5 series.

Don't use. GPT-5 ($1.25/$10) is half the prompt cost and has 2x the context window. GPT-5 Mini ($0.25/$2) is way cheaper.

GPT-4o Mini: Legacy Lightweight

Metric | Value
Context Window | 128K tokens
Prompt Price | $0.15/M
Completion Price | $0.60/M
Throughput | 75 tok/s
Max Output | 16K

Don't use. GPT-5 Mini ($0.25/$2) is slightly more expensive but much better quality.


Reasoning Models (o3, o4)

Reasoning models trade throughput for correctness. Slow. Expensive. Worth it for hard problems.

o3: Advanced Reasoning

Metric | Value
Context Window | 200K tokens
Prompt Price | $2/M
Completion Price | $8/M
Throughput | 17 tok/s
Max Output | 100K

o3 is OpenAI's reasoning-focused model. Uses chain-of-thought internally. Very slow (17 tok/s, 2.4x slower than GPT-5).

But for hard problems (math, logic, novel reasoning), o3 is better than GPT-5. Win rate on competition math: o3 60%, GPT-5 40%.

Cost per task: 5K prompt + 2K completion (lots of thinking): (5K × $2 + 2K × $8) / 1M = $0.026. Slow to run, but small per-task cost.

o3 Mini: Reasoning, Fast

Metric | Value
Context Window | 200K tokens
Prompt Price | $1.10/M
Completion Price | $4.40/M
Throughput | 47 tok/s
Max Output | 100K

o3 Mini is o3 optimized for speed. Throughput: 47 tok/s (faster than GPT-5's 41 tok/s, and nearly 3x faster than o3).

Pricing is better: $1.10/$4.40 vs o3's $2/$8. Quality loss: ~20-30%.

Use: high-volume reasoning tasks where speed matters. Filtering, routing. Not for novel problems.

o4 Mini: Latest Reasoning

Metric | Value
Context Window | 200K tokens
Prompt Price | $1.10/M
Completion Price | $4.40/M
Throughput | 62 tok/s
Max Output | 100K

o4 Mini is the latest reasoning model. Same pricing as o3 Mini ($1.10/$4.40) but faster throughput (62 vs 47 tok/s).

o4 is still in limited release as of March 2026. Availability varies. Check current access before assuming availability.


OpenAI API Pricing 2026: Pricing Breakdown Table

Model | Context | Prompt $/M | Completion $/M | Throughput (tok/s) | Best For
GPT-5.4 | 272K | $2.50 | $15 | 45 | Premium reasoning
GPT-5.1 | 400K | $1.25 | $10 | 47 | Long documents
GPT-5 Codex | 400K | $1.25 | $10 | 50 | Code tasks
GPT-5 Pro | 400K | $15 | $120 | 11 | Hard reasoning
GPT-5 | 272K | $1.25 | $10 | 41 | Default choice
GPT-5 Mini | 272K | $0.25 | $2 | 68 | High volume
GPT-5 Nano | 272K | $0.05 | $0.40 | 95 | Classification
GPT-4.1 | 1.05M | $2 | $8 | 55 | Extra long context
GPT-4.1 Mini | 1.05M | $0.40 | $1.60 | 75 | Long context, budget
GPT-4.1 Nano | 1.05M | $0.10 | $0.40 | 82 | Budget long context
GPT-4o | 128K | $2.50 | $10 | 52 | Legacy (avoid)
GPT-4o Mini | 128K | $0.15 | $0.60 | 75 | Legacy (avoid)
o3 | 200K | $2 | $8 | 17 | Hard reasoning
o3 Mini | 200K | $1.10 | $4.40 | 47 | Reasoning, high volume
o4 Mini | 200K | $1.10 | $4.40 | 62 | Latest reasoning

Cost Per Task

Real-world pricing for common tasks (March 2026):

Classification Task (1K prompt, 50 completion)

Model | Cost | Time
GPT-5 Nano | $0.00007 | 0.5 sec
GPT-5 Mini | $0.00035 | 0.7 sec
GPT-5 | $0.00175 | 1.2 sec
GPT-4.1 Mini | $0.00048 | 0.67 sec

Winner: GPT-5 Nano. 25x cheaper than GPT-5 and more than twice as fast.

Customer Support Q&A (3K prompt, 500 completion)

Model | Cost | Time
GPT-5 Mini | $0.0018 | 7.4 sec
GPT-5 | $0.0088 | 12 sec
GPT-4.1 Mini | $0.002 | 6.7 sec

Winner: near-tie between GPT-5 Mini and GPT-4.1 Mini. GPT-5 Mini is marginally cheaper; GPT-4.1 Mini is marginally faster.

Long Document Analysis (100K prompt, 2K completion)

Model | Cost | Time
GPT-5.1 | $0.145 | 43 sec
GPT-5.4 | $0.28 | 44 sec
GPT-4.1 | $0.416 | 36 sec

Winner: GPT-5.1. 65% cheaper than GPT-4.1, nearly the same speed, sufficient context.

Code Review (300K prompt, 5K completion)

Model | Cost | Time
GPT-4.1 | $0.64 | 91 sec
GPT-5.1 | $0.425 | 106 sec

Winner: GPT-5.1. 34% cheaper. Slightly slower, but worth it. (Past 400K prompt tokens, GPT-5.1's window is exhausted and GPT-4.1 becomes the only option.)

Hard Math Problem (2K prompt, 8K completion, chain-of-thought)

Model | Cost | Time
o3 | $0.068 | 471 sec
o3 Mini | $0.037 | 170 sec
GPT-5 | $0.083 | 195 sec

Winner: depends on accuracy needed. o3 is the most accurate. o3 Mini is both the cheapest and the fastest of the three. With this much completion output, GPT-5 is no longer the cheapest, and it is the least accurate on hard math.


Model Selection Guide

Decision Tree

High volume, simple tasks? Start with GPT-5 Nano ($0.05/M prompt). If quality is too low, upgrade to GPT-5 Mini ($0.25/M).

Standard tasks, balanced cost-quality? Use GPT-5 ($1.25/$10). This is the default unless a specific need pushes teams elsewhere.

Long documents (over 100K tokens)? Use GPT-5.1 (400K context, $1.25/M). Cheaper and better than GPT-4.1.

Extremely long documents (500K+ tokens)? Use GPT-4.1 (1.05M context, $2/M). Only option, but pricey.

Hard reasoning or novel problems? Use o3 ($2/$8 per M). Slow, but worth it for accuracy. If cost is tight, try o3 Mini ($1.10/$4.40) first.

Code tasks? Use GPT-5 Codex (same price as GPT-5.1 but optimized). Or just use GPT-5, it's good at code.

Avoid: GPT-4o, GPT-4o Mini, GPT-4.1 Nano. Superseded by GPT-5 series. No reason to use them.
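The decision tree above can be sketched as a routing function. Model names follow this guide; the task labels and thresholds are illustrative assumptions you would tune for your own workloads:

```python
def pick_model(task: str, prompt_tokens: int = 0, high_volume: bool = False) -> str:
    """Map a task profile to a model per the decision tree above (a sketch)."""
    if prompt_tokens > 400_000:
        return "gpt-4.1"        # only tier with a 1.05M window
    if prompt_tokens > 100_000:
        return "gpt-5.1"        # 400K window at GPT-5 pricing
    if task == "reasoning":
        return "o3"             # slow but most accurate; try o3-mini if cost is tight
    if task == "code":
        return "gpt-5-codex"    # same price as gpt-5.1, code-optimized
    if high_volume and task == "simple":
        return "gpt-5-nano"     # step up to gpt-5-mini if quality is too low
    return "gpt-5"              # balanced default

print(pick_model("simple", high_volume=True))   # gpt-5-nano
print(pick_model("qa", prompt_tokens=250_000))  # gpt-5.1
```

The ordering matters: context-window constraints are hard limits, so they are checked before any cost preference.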


Throughput Considerations

Throughput affects latency and real-world cost.

  • GPT-5 Nano (95 tok/s): 1,000-token completion in 10.5 seconds. Fast.
  • GPT-5 (41 tok/s): 1,000-token completion in 24 seconds. Slower.
  • o3 (17 tok/s): 1,000-token completion in 59 seconds. Very slow.
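These times fall out of tokens ÷ throughput, a rough model that ignores time-to-first-token and network overhead:

```python
def completion_seconds(completion_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time; ignores TTFT and network overhead."""
    return completion_tokens / tokens_per_second

# Throughput figures from this article's pricing tables
for model, tps in [("gpt-5-nano", 95), ("gpt-5", 41), ("o3", 17)]:
    print(f"{model}: {completion_seconds(1_000, tps):.1f}s for 1,000 tokens")
```

Real wall-clock time will be somewhat higher once TTFT and prompt processing are added.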

For user-facing applications, TTFT (time-to-first-token) matters as much as throughput. OpenAI doesn't publish TTFT, but it generally correlates with throughput. Faster models = lower TTFT.

If latency is critical: Use GPT-5 Mini (68 tok/s) or Nano (95 tok/s). If teams have time: Use o3 for hard problems.


FAQ

What's the cheapest model for customer-facing tasks?

GPT-5 Mini. $0.25/$2 per million tokens. 5x cheaper than GPT-5. Quality is 85-90% of GPT-5. Good for: Q&A, summarization, categorization.

Should I still use GPT-4.1?

Only if you need 1M context. Otherwise, use GPT-5: prompt tokens cost ~40% less ($1.25 vs $2 per M) and quality is better. GPT-4.1 is legacy.

Is o3 worth the cost?

For competition math, novel research, logic puzzles: yes. For customer support or text generation: no.

What's the throughput difference between o3 and GPT-5?

o3: 17 tok/s. GPT-5: 41 tok/s. o3 is 2.4x slower but better at reasoning. For routine tasks, GPT-5 is fine.

Can I use GPT-5 Nano for everything?

No. Nano is low-quality (similar to GPT-3.5). Good for classification, tagging, routing. Not for content creation, code generation, or detailed analysis.

Which model should I use as my default?

GPT-5 ($1.25/$10). Best balance of cost, quality, and speed. If tasks are simple and volume is high, step down to GPT-5 Mini or Nano. If tasks are hard, consider o3.

Is GPT-5 better than Claude?

Comparable. GPT-5 is faster. Claude Sonnet 4.6 is $3/$15 per M tokens (more expensive). Each has different strengths: GPT-5 for code, Claude for nuance. Test both.


Throughput and Latency Implications

Pricing per token is one lens. Throughput per dollar is another.

Cost Per Task (Practical Examples)

Email classification (subject line, mark as spam/not spam):

  • Prompt: 200 tokens (email + instructions)
  • Completion: 5 tokens (spam/not-spam decision)
  • Model: GPT-5 Nano
  • Cost: (200 × $0.05 + 5 × $0.40) / 1M = $0.000012
  • Speed: 95 tok/s; the 5-token completion returns in about a second
  • Monthly cost for 1M emails: $12

Blog post summarization (1,500 word article → 200 word summary):

  • Prompt: 3,500 tokens (article + summary instruction)
  • Completion: 500 tokens (summary)
  • Model: GPT-5 Mini
  • Cost: (3,500 × $0.25 + 500 × $2) / 1M = $0.001875
  • Speed: 68 tok/s completion, ~7 seconds total
  • Monthly cost for 1,000 summaries: $1.88

Detailed code review (entire file + guidelines):

  • Prompt: 8,000 tokens (code + review rubric)
  • Completion: 2,000 tokens (detailed review feedback)
  • Model: GPT-5
  • Cost: (8,000 × $1.25 + 2,000 × $10) / 1M = $0.030
  • Speed: 41 tok/s, ~50 seconds total
  • Monthly cost for 100 reviews: $3.00

Novel research problem-solving (new algorithm from scratch):

  • Prompt: 5,000 tokens (problem description, constraints, examples)
  • Completion: 5,000 tokens (novel algorithm with explanation)
  • Model: o3 (reasoning model)
  • Cost: (5,000 × $2 + 5,000 × $8) / 1M = $0.05
  • Speed: 17 tok/s, ~5 minutes (slow but accurate)
  • Cost per problem: $0.05 (cheap per task; the real price is the wait, worth paying if the solution is correct first try)
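The monthly figures in these examples are just per-task cost × volume. A quick projection helper, a sketch using this article's list prices:

```python
def monthly_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: float, completion_price: float,
                 tasks_per_month: int) -> float:
    """Project monthly spend from one task's token counts and $/1M prices."""
    per_task = (prompt_tokens * prompt_price
                + completion_tokens * completion_price) / 1_000_000
    return per_task * tasks_per_month

# 1M email classifications/month on GPT-5 Nano ($0.05/$0.40 per M)
print(round(monthly_cost(200, 5, 0.05, 0.40, 1_000_000), 2))   # 12.0

# 100 code reviews/month on GPT-5 ($1.25/$10 per M)
print(round(monthly_cost(8_000, 2_000, 1.25, 10.00, 100), 2))  # 3.0
```

Running the other examples through the same helper is a quick sanity check before committing to a model at scale.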

API Rate Limits by Model

OpenAI enforces rate limits (requests per minute, tokens per minute) based on pricing tier.

Model | Requests/min | Tokens/min | Notes
GPT-5 Nano | 3,500 | 2M | Free tier
GPT-5 Mini | 3,500 | 1M | Basic tier
GPT-5 | 3,500 | 500K | Standard tier
GPT-4.1 | 1,500 | 300K | Legacy
o3 | 100 | 100K | Reasoning, limited

o3 has aggressive rate limits due to cost. Can't burst 100M tokens/hour on o3.

For high-volume tasks (1B+ tokens/day), you need:

  1. Multiple API keys (different rate limit buckets)
  2. Queue + batch processing (the batch API is 50% cheaper but processes asynchronously)
  3. Fallback to GPT-5 Mini/Nano when o3 hits limit
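The fallback pattern in item 3 can be sketched as a chain that steps down to cheaper tiers when a model is rate-limited. This is a sketch: `call_model` is a hypothetical wrapper you supply around your API client, and `RateLimitError` stands in for the client's actual rate-limit exception:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit exception (an assumption)."""

def call_with_fallback(prompt, call_model,
                       models=("o3", "gpt-5-mini", "gpt-5-nano")):
    """Try each model in order, falling back when one is rate-limited."""
    for model in models:
        try:
            return model, call_model(model, prompt)
        except RateLimitError:
            time.sleep(1)  # brief backoff before trying the cheaper tier
    raise RuntimeError("all models rate-limited")

# Illustration with a fake client: o3 is over its limit, gpt-5-mini answers.
def fake_call(model, prompt):
    if model == "o3":
        raise RateLimitError()
    return f"{model} response"

print(call_with_fallback("classify this ticket", fake_call))
```

In production you would also track per-model token budgets so the chain skips a model before it hits the limit, rather than after.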

Batch API: 50% Cheaper

OpenAI offers a batch API: submit 10,000+ requests at once, receive results in 1-24 hours.

Cost reduction: 50% for all models. So GPT-5 Nano becomes $0.025/M prompt, $0.20/M completion.

Trade: latency. Instead of 5-second response, wait 1-24 hours.

Viable for:

  • Non-urgent tasks (data labeling, content generation, analysis)
  • Overnight processing
  • Research batches

Not viable for:

  • Customer-facing APIs (users expect immediate response)
  • Interactive tools

For 1B daily tokens at a blended ~$2 per million, the batch API cuts daily cost from roughly $2,000 to roughly $1,000. Annual savings: ~$365K for large teams.
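The batch math is simple enough to sketch directly; this assumes the flat 50% discount described above and a blended per-million price you estimate for your own traffic mix:

```python
def batch_savings(daily_tokens: int, blended_price_per_m: float,
                  discount: float = 0.5) -> tuple[float, float]:
    """(daily, annual) dollar savings from moving traffic to the batch API."""
    daily_cost = daily_tokens / 1_000_000 * blended_price_per_m
    daily_saved = daily_cost * discount
    return daily_saved, daily_saved * 365

# 1B tokens/day at a blended ~$2 per million tokens
daily, annual = batch_savings(1_000_000_000, 2.0)
print(daily, annual)  # 1000.0 365000.0
```

Only the share of traffic that can tolerate a 1-24 hour turnaround should be counted in `daily_tokens`.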


Hybrid Approach: Multi-Model Strategy

Smart teams don't pick one model. They use different models for different tasks:

Task | Model | Reasoning
Classification | GPT-5 Nano | Cheapest; classification is simple
Summarization | GPT-5 Mini | Balance of cost and quality
Content creation | GPT-5 | Best quality for text
Code generation | GPT-5 Codex | Optimized for code
Long documents | GPT-5.1 | 400K context, reasonable cost
Hard reasoning | o3 | Best accuracy for novel problems

Example: Customer support AI.

  • Route incoming support tickets: GPT-5 Nano (classify priority, department)
  • Generate first-pass response: GPT-5 Mini (fast, good enough)
  • Hand-off to human if complexity flagged: Check with GPT-5 (full analysis)

Cost per ticket: mostly Nano + Mini (cheap), rarely GPT-5 (expensive).
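The support pipeline above can be sketched as a tiered dispatcher. The helper names here (`classify`, `draft_reply`, `deep_review`) are hypothetical stand-ins for calls to GPT-5 Nano, Mini, and GPT-5 respectively:

```python
def handle_ticket(ticket, classify, draft_reply, deep_review):
    """Tiered pipeline: Nano routes, Mini drafts, GPT-5 only when flagged."""
    routing = classify(ticket)            # GPT-5 Nano: priority + department
    reply = draft_reply(ticket, routing)  # GPT-5 Mini: first-pass response
    if routing.get("complex"):
        return deep_review(ticket, reply)  # GPT-5: full analysis before hand-off
    return reply

# Fake model calls to illustrate the flow:
result = handle_ticket(
    "My invoice is wrong",
    classify=lambda t: {"department": "billing", "complex": False},
    draft_reply=lambda t, r: f"[mini] reply for {r['department']}",
    deep_review=lambda t, reply: f"[gpt-5] reviewed: {reply}",
)
print(result)  # [mini] reply for billing
```

Because the expensive model runs only on the flagged minority of tickets, average cost per ticket stays close to the Nano + Mini floor.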


Price History and Predictions

Historical pattern: new models launch expensive, then drop 50-70% within 12 months.

GPT-4o launch price: $2.50/$10 per M tokens. Today (March 2026): legacy tier, avoid.

GPT-4.1 launch price (2024): $2/$8 per M tokens. Today: still $2/$8 (no reduction yet).

GPT-5 launch price (Feb 2026): $1.25/$10 per M tokens. Today (March 2026): still $1.25/$10.

Prediction: GPT-5 pricing will drop to $0.75/$6 by Q4 2026. o3 will drop to $1/$4 by Q2 2026.

If you're in early development, build on GPT-5 Nano/Mini now; when prices drop, your costs scale down further.

If you're already at production scale, locking in volume discounts (not public-facing, available via sales team) now is smart. Lock at $1.25/$10, save when public pricing drops.


Common Pricing Mistakes

Mistake 1: Using GPT-5.4 for Everything

GPT-5.4 is $2.50/$15 per M tokens. 2x the prompt cost of GPT-5, 1.5x the completion cost.

Best use: Complex reasoning with large documents.

Wrong use: Email replies (GPT-5 Mini sufficient). Blog posts (GPT-5 fine). Data entry (GPT-5 Nano overkill).

Monthly cost difference: 1M prompt + 1M completion tokens on GPT-5.4 vs GPT-5 Mini = ($2.50 + $15) vs ($0.25 + $2) = $15.25 extra.

Mistake 2: Not Using Batch API

Batch API is 50% cheaper. If 50% of your workload is non-urgent, using batch saves money.

Example: Labeling 10M documents for training data.

  • Non-batched (immediate): 10M tokens × $1.25 (GPT-5) = $12.50
  • Batched (overnight): 10M tokens × $1.25 × 0.5 = $6.25
  • Savings: $6.25 per 10M tokens

Mistake 3: Retrying Failed Requests Without Caching

If a request fails and you retry, you're charged twice.

Use caching or idempotency to avoid double-charges.
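A minimal in-memory response cache makes retries free: a repeated (model, prompt) pair is served locally instead of billed again. This is a sketch; `call_api` is a hypothetical request function you supply, and a production version would use a persistent store with TTLs:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_call(model: str, prompt: str, call_api) -> str:
    """Return a cached response when the same (model, prompt) is retried."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, prompt)  # only billed on a cache miss
    return _cache[key]

# Illustration: count how many billable calls actually go out.
calls = []
def fake_api(model, prompt):
    calls.append(model)
    return "ok"

cached_call("gpt-5", "hello", fake_api)
cached_call("gpt-5", "hello", fake_api)  # retry: served from cache, not billed
print(len(calls))  # 1
```

Hashing the full (model, prompt) pair also doubles as an idempotency key if you later move the cache to Redis or a database.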

Mistake 4: Choosing by Price Alone

GPT-5 Nano is cheap, but quality is low. If you use Nano for complex tasks and get wrong answers, you waste time fixing them.

Time cost of manual review: $100/hr. Nano cost: a fraction of a cent per request.

If fixing a Nano mistake costs 30 minutes = $50, and choosing GPT-5 ($0.01 cost) gets it right first time, the ROI is clear.

Choose model based on task complexity, not just price.
