Contents
- Overview
- OpenAI Pricing: Current Model Pricing Table
- Model-by-Model Breakdown
- Batch API & Discounts
- Monthly Cost Projections
- Cost Per Task
- Hidden Fees
- Optimization Strategies
- FAQ
- Sources
Overview
OpenAI pricing varies wildly across models. GPT-5.4 costs $2.50/$15 per million tokens (prompt/completion). GPT-4.1 costs $2/$8. GPT-5 Mini is $0.25/$2. GPT-5 Pro is $15/$120. As of March 2026, teams can spend $10/month or $10,000/month depending on model choice and usage. Hidden fees are minimal: context window size is never billed (you pay per token used), but rate limits and the Batch API's 10-request job minimum can still shape costs. This guide breaks down every tier and shows real cost projections.
OpenAI Pricing: Current Model Pricing Table
| Model | Prompt $/M | Completion $/M | Context | Max Output | Throughput |
|---|---|---|---|---|---|
| GPT-5.4 | $2.50 | $15.00 | 272K | 128K | 45 tok/sec |
| GPT-5.1 | $1.25 | $10.00 | 400K | 128K | 47 tok/sec |
| GPT-5 Codex | $1.25 | $10.00 | 400K | 128K | 50 tok/sec |
| GPT-5 Pro | $15.00 | $120.00 | 400K | 128K | 11 tok/sec |
| GPT-5 | $1.25 | $10.00 | 272K | 128K | 41 tok/sec |
| GPT-5 Mini | $0.25 | $2.00 | 272K | 128K | 68 tok/sec |
| GPT-5 Nano | $0.05 | $0.40 | 272K | 32K | 95 tok/sec |
| GPT-4.1 | $2.00 | $8.00 | 1.05M | 32K | 55 tok/sec |
| GPT-4.1 Mini | $0.40 | $1.60 | 1.05M | 32K | 75 tok/sec |
| GPT-4.1 Nano | $0.10 | $0.40 | 1.05M | 32K | 82 tok/sec |
| GPT-4o | $2.50 | $10.00 | 128K | 16K | 52 tok/sec |
| GPT-4o Mini | $0.15 | $0.60 | 128K | 16K | 75 tok/sec |
| o3 | $2.00 | $8.00 | 200K | 100K | 17 tok/sec |
| o3 Mini | $1.10 | $4.40 | 200K | 100K | 47 tok/sec |
| o4 Mini | $1.10 | $4.40 | 200K | 100K | 62 tok/sec |
Data from OpenAI pricing page, March 2026. Prices per 1 million tokens.
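The per-request arithmetic used throughout this guide reduces to one formula. A minimal sketch; the three sample rows come from the table above, and the dict can be extended as needed:

```python
# Sketch: cost of a single request, using prices from the table above
# ($ per million tokens, as (prompt, completion)). Only three rows shown.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-4.1": (2.00, 8.00),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one request."""
    prompt_rate, completion_rate = PRICES[model]
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1_000_000
```

A 2K-prompt, 500-completion request on GPT-4.1 works out to 2,000 × $2/M + 500 × $8/M = $0.008.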
Model-by-Model Breakdown
GPT-5 Series (Current Generation)
GPT-5.4 ($2.50/$15)
Latest variant. Advertised as "reasoning-first." 272K context. Max output 128K tokens (large artifacts possible). Throughput 45 tok/sec is moderate.
Use case: Reasoning tasks (analysis, planning, multi-step logic). Best quality for benchmarks. Higher cost due to the "reasoning" branding.
Monthly cost for 100M tokens: $250 prompt + $150 completion = $400 (here and for every model below, assuming 100M prompt tokens plus 10M completion tokens per month).
GPT-5.1 ($1.25/$10)
Previous variant, cheaper. 400K context (largest window in the 5-series). Max output 128K. Throughput 47 tok/sec.
Use case: Cost-sensitive reasoning. Context size is a significant advantage (128K more than GPT-5.4). Throughput trails the Mini models, but not catastrophically.
Monthly cost for 100M tokens: $125 prompt + $100 completion = $225.
GPT-5 Codex ($1.25/$10)
Specialized variant for code generation. Same pricing as GPT-5.1. 400K context. Max output 128K. Throughput 50 tok/sec (fastest of the full-size 5-series models).
Use case: Code generation, refactoring, technical documentation. Claimed to score higher on coding benchmarks than GPT-5.1, though the benchmarks are not published.
Monthly cost same as GPT-5.1: $225 for 100M tokens.
GPT-5 Pro ($15/$120)
Experimental reasoning model. Glacially slow (11 tok/sec). Max output 128K. Not for production use.
Use case: Research, one-off complex reasoning. Cost-prohibitive for any at-scale usage.
Monthly cost: For 100M tokens, $1,500 + $1,200 = $2,700. Unsustainable for high-volume applications.
GPT-5 ($1.25/$10)
Standard model. No specialist positioning. 272K context. Max output 128K. Throughput 41 tok/sec (slower than Mini).
Use case: General-purpose chat, reasoning. Cheaper than GPT-5.4. Slower than GPT-5 Mini. Middle-ground option for cost-conscious teams.
Monthly cost for 100M tokens: $225.
GPT-5 Mini ($0.25/$2)
Lightweight model. Fast (68 tok/sec). 272K context. Max output 128K.
Use case: High-volume inference, classification, summarization. 5x cheaper than GPT-5. Quality is lower on reasoning, but adequate for non-reasoning tasks.
Monthly cost for 100M tokens: $25 prompt + $20 completion = $45.
GPT-5 Nano ($0.05/$0.40)
Ultra-lightweight. Fastest throughput (95 tok/sec). 272K context. Limited output (32K max, versus 128K for the rest of the 5-series).
Use case: Extreme cost minimization. Token classification, simple summarization. Quality is minimal; expect hallucinations on reasoning tasks.
Monthly cost for 100M tokens: $5 prompt + $4 completion = $9.
GPT-4.1 Series (Mature, Stable)
GPT-4.1 ($2/$8)
Full model. Largest context window (1.05M tokens). Max output 32K. Throughput 55 tok/sec (fastest of the full-size models).
Use case: Long-context applications, large document analysis, code refactoring. Prompt cost is 20% lower than GPT-5.4 and context is 3.8x larger. Throughput is superior.
Monthly cost for 100M tokens: $200 + $80 = $280. For context-heavy workloads, better value than GPT-5.4: lower per-token cost and a much larger window.
GPT-4.1 Mini ($0.40/$1.60)
Lightweight variant. Same 1.05M context as GPT-4.1. Max output 32K. Throughput 75 tok/sec (faster).
Use case: High-volume document processing where speed matters more than quality. Cost is 5x lower than GPT-4.1 but context stays 1.05M.
Monthly cost for 100M tokens: $40 + $16 = $56.
GPT-4.1 Nano ($0.10/$0.40)
Ultra-cheap variant. Same context window (1.05M). Max output 32K. Throughput 82 tok/sec.
Use case: Extreme cost minimization with large context needed. Rare use case: processing very long documents very cheaply.
Monthly cost for 100M tokens: $10 + $4 = $14.
GPT-4o (Previous Generation)
GPT-4o ($2.50/$10)
Older standard model. 128K context (smaller than GPT-4.1). Max output 16K. Throughput 52 tok/sec.
Deprecated positioning but still widely available. Cost-per-token is between GPT-4.1 and GPT-5.4. Context is smaller than both.
Monthly cost for 100M tokens: $250 + $100 = $350.
GPT-4o Mini ($0.15/$0.60)
Lightweight variant of GPT-4o. 128K context. Max output 16K. Throughput 75 tok/sec.
Use case: Cost-sensitive applications that don't need 1M context. 10x cheaper than GPT-4o.
Monthly cost for 100M tokens: $15 + $6 = $21.
Reasoning Models (o3, o4)
o3 ($2/$8)
Reasoning-specialized model. Slow (17 tok/sec). 200K context. Max output 100K.
Use case: Complex reasoning, multi-step logic, proof-based tasks. Cost is same as GPT-4.1 but throughput is 3x worse. Trade latency for reasoning quality.
Monthly cost for 100M tokens: $200 + $80 = $280.
o3 Mini ($1.10/$4.40)
Faster reasoning variant. 47 tok/sec. Same context and output limits.
Use case: Reasoning tasks where latency matters more than max quality. 45% cheaper than o3 and 2.8x faster throughput.
Monthly cost for 100M tokens: $110 + $44 = $154.
o4 Mini ($1.10/$4.40)
Latest reasoning variant. 62 tok/sec (faster than o3 Mini). Same pricing and context limits.
Use case: Same as o3 Mini, but newer model. Presumably better reasoning quality, but benchmarks not published yet.
Monthly cost for 100M tokens: $110 + $44 = $154.
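The per-model monthly figures above are all consistent with a single volume assumption: 100M prompt tokens plus 10M completion tokens per month. A sketch that reproduces them:

```python
# All "Monthly cost for 100M tokens" figures above follow one convention:
# 100M prompt tokens plus 10M completion tokens per month.
def monthly_cost(prompt_rate: float, completion_rate: float,
                 prompt_mtok: float = 100, completion_mtok: float = 10) -> float:
    """Monthly USD cost; rates in $/M tokens, volumes in millions of tokens."""
    return prompt_mtok * prompt_rate + completion_mtok * completion_rate

monthly_cost(2.50, 15.00)  # GPT-5.4:    $400
monthly_cost(1.25, 10.00)  # GPT-5.1:    $225
monthly_cost(0.25, 2.00)   # GPT-5 Mini:  $45
```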
Batch API & Discounts
Batch API: 50% Discount
OpenAI's Batch API processes requests asynchronously at a 50% discount. Submit jobs, wait up to 24 hours (often less during low-demand periods), and retrieve results.
Pricing with Batch API:
| Model | Regular | Batch (50% off) |
|---|---|---|
| GPT-5.4 | $2.50/$15 | $1.25/$7.50 |
| GPT-5.1 | $1.25/$10 | $0.625/$5.00 |
| GPT-4.1 | $2/$8 | $1/$4 |
| GPT-5 Mini | $0.25/$2 | $0.125/$1.00 |
| o3 | $2/$8 | $1/$4 |
Example cost reduction:
Process 1 billion tokens of classification tasks. Typical prompt/completion split: 80/20 (800M prompt, 200M completion).
Using GPT-5 Mini on-demand: (800M × $0.25 + 200M × $2) / 1M = $200 + $400 = $600.
Using Batch API: (800M × $0.125 + 200M × $1) / 1M = $100 + $200 = $300.
Savings: $300 (50% discount confirmed).
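The classification example can be checked in a few lines; the 50% batch multiplier and the $0.25/$2 GPT-5 Mini rates come from the tables above:

```python
# Check the 1B-token classification example: 80/20 prompt/completion
# split, GPT-5 Mini rates ($0.25/$2 per M tokens), Batch API halves both.
def job_cost(prompt_mtok: float, completion_mtok: float,
             prompt_rate: float, completion_rate: float,
             batch: bool = False) -> float:
    """USD cost of a job; volumes in millions of tokens, rates in $/M."""
    multiplier = 0.5 if batch else 1.0
    return (prompt_mtok * prompt_rate + completion_mtok * completion_rate) * multiplier

on_demand = job_cost(800, 200, 0.25, 2.00)            # $600
batched = job_cost(800, 200, 0.25, 2.00, batch=True)  # $300
```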
Batch API: Practical Limits
- Minimum batch size: 10 requests (trivial for genuine batch workloads, but it rules out submitting requests one at a time).
- Latency: 24-hour expected SLA (can be much faster, but not guaranteed).
- Use cases: Document processing, periodic data analysis, non-interactive tasks.
- Not suitable for: Real-time chat, customer-facing features, low-latency requirements.
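For reference, a Batch API job starts as a local JSONL file of request objects. A minimal sketch of building one; the field names (custom_id, method, url, body) follow OpenAI's batch documentation as of this writing, but verify them against the current docs before relying on this:

```python
import json

# Hypothetical helper that writes a Batch API input file. Each JSONL
# line is one request; field names follow OpenAI's batch docs as of
# this writing -- verify against current documentation.
def build_batch_file(prompts, model="gpt-5-mini", path="batch_input.jsonl"):
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"request-{i}",  # your key for matching results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

The file is then uploaded and submitted as a batch job via the SDK; see the Batch API documentation for those calls.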
Monthly Cost Projections
Scenario 1: Chatbot (1M Monthly Active Users, 10 messages/user/month)
Metrics:
- Messages: 1M users × 10 = 10M messages/month
- Avg prompt: 100 tokens (user message + context)
- Avg completion: 200 tokens (model response)
- Total: 3B tokens/month (1B prompt, 2B completion)
Using GPT-4o:
- Cost: (1B × $2.50 + 2B × $10) / 1M = $2,500 + $20,000 = $22,500/month
Using GPT-5 Mini:
- Cost: (1B × $0.25 + 2B × $2) / 1M = $250 + $4,000 = $4,250/month
Using GPT-4.1 Mini:
- Cost: (1B × $0.40 + 2B × $1.60) / 1M = $400 + $3,200 = $3,600/month
Winner: GPT-4.1 Mini at $3,600/month. Context window (1.05M) supports longer conversation history without quality loss compared to Mini models.
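The three chatbot projections can be reproduced with the same formula; token volumes are in billions, rates in $/M from the pricing table:

```python
# Reproduce the Scenario 1 projections: 1B prompt + 2B completion
# tokens per month, compared across the three candidate models.
def chatbot_cost(prompt_btok: float, completion_btok: float,
                 prompt_rate: float, completion_rate: float) -> float:
    """Monthly USD cost; volumes in billions of tokens, rates in $/M."""
    return prompt_btok * 1000 * prompt_rate + completion_btok * 1000 * completion_rate

chatbot_cost(1, 2, 2.50, 10.00)  # GPT-4o:       $22,500
chatbot_cost(1, 2, 0.25, 2.00)   # GPT-5 Mini:    $4,250
chatbot_cost(1, 2, 0.40, 1.60)   # GPT-4.1 Mini: ~$3,600
```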
Scenario 2: Document Processing (10M Documents, Batch Mode)
Metrics:
- Documents: 10M, average 5K tokens each
- Task: Extract key facts, classify document type
- Prompt: 5K tokens (full document)
- Completion: 200 tokens (extraction + classification)
- Total: 52B tokens/month (50B prompt, 2B completion)
Using GPT-4.1 on-demand:
- Cost: (50B × $2 + 2B × $8) / 1M = $100,000 + $16,000 = $116,000/month
Using GPT-4.1 Batch API (50% discount):
- Cost: (50B × $1 + 2B × $4) / 1M = $50,000 + $8,000 = $58,000/month
Using GPT-4.1 Mini Batch (ultra-cheap):
- Cost: (50B × $0.20 + 2B × $0.80) / 1M = $10,000 + $1,600 = $11,600/month
Winner: GPT-4.1 Mini Batch at $11,600/month. For non-interactive batch work, Mini models are sufficient and dramatically cheaper.
Scenario 3: Code Generation Tool (50K Engineers, 5 requests/day/engineer)
Metrics:
- Requests: 50K × 5 × 30 = 7.5M requests/month
- Avg request: 2K prompt (code context), 500 completion (generated code)
- Total: 15B prompt, 3.75B completion tokens/month
Using GPT-5 Codex:
- Cost: (15B × $1.25 + 3.75B × $10) / 1M = $18,750 + $37,500 = $56,250/month
Using GPT-4.1 Mini:
- Cost: (15B × $0.40 + 3.75B × $1.60) / 1M = $6,000 + $6,000 = $12,000/month
Using GPT-5 Mini:
- Cost: (15B × $0.25 + 3.75B × $2) / 1M = $3,750 + $7,500 = $11,250/month
Winner: GPT-5 Mini at $11,250/month. Throughput is slightly lower than GPT-4.1 Mini's (68 vs 75 tok/sec), but cost is the lowest. Coding quality trails Codex, but for an IDE autocomplete tool, Mini is acceptable.
Cost Per Task
Generate Fine-Tuning Data for a 7B Model: Claude Code vs. OpenAI's API
Scenario: Generate 100K training examples with an LLM, then train the 7B model yourself. This uses plain generation, not OpenAI's fine-tuning endpoints.
Step 1: Generate 100K Examples
Each example: 200 tokens prompt (template), 300 tokens completion (example).
Total: 50M tokens (20M prompt, 30M completion).
Using GPT-4.1:
- Cost: (20M × $2 + 30M × $8) / 1M = $40 + $240 = $280
Using GPT-5 Mini (fast, cheap):
- Cost: (20M × $0.25 + 30M × $2) / 1M = $5 + $60 = $65
Using Claude Code + Claude API:
- Claude Sonnet 4.6: (20M × $3 + 30M × $15) / 1M = $60 + $450 = $510
Winner: GPT-5 Mini at $65. OpenAI is cheaper for bulk generation thanks to Mini's low completion pricing.
Hidden Fees
Explicit Costs (No Surprises)
- Prompt/completion tokens are metered per request. No minimum spend.
- Context window size does not add cost. 272K token context costs same as 1K token context (pay per token used, not per capacity).
- Concurrency limits are not charged; they're soft rate limits (requests rejected if exceeded, not billed extra).
Soft Limits (Not Fees, but Financial Impact)
- Rate Limits: Free tier and low-usage accounts get strict rate limits (e.g., 3 requests/minute on GPT-4o). Hitting the limit means request rejection, not an overage charge, but it caps throughput.
- Overage Quotas: High-volume accounts (>$100K/month) may require manual negotiation for higher concurrency. No extra fee, but scaling requires account-manager contact. Plan ahead.
- Batch API Minimum: Batch jobs must contain 10+ requests. Pushing individual small requests through the Batch API therefore imposes an artificial per-job minimum.
- Vision and Embeddings: Embedding models (e.g., Text Embedding 3) and image inputs to vision-capable models are priced separately from the token rates above; vision input costs more per image than per text token.
No Hidden Charges
- No fees for API calls themselves (only token charges).
- No overage surprises (token pricing is fixed).
- No inactivity fees (unused API keys incur no cost).
- No data retention fees (logs are stored for auditing, not billed).
Optimization Strategies
1. Route by Task Complexity
Use different models for different request types:
- Classification: GPT-5 Nano or Mini ($0.05-$0.25 prompt)
- General chat: GPT-5 or GPT-4.1 ($1.25-$2 prompt)
- Reasoning: o3 Mini or GPT-5.4 ($1.10-$2.50 prompt)
- Code generation: GPT-5 Codex ($1.25 prompt)
Cost savings: 10-20x by routing simple tasks to cheap models.
Example: If 50% of requests are classification (could use Nano), 30% are chat (GPT-5), 20% are reasoning (o3 Mini):
Blended prompt price: 0.5 × $0.05 + 0.3 × $1.25 + 0.2 × $1.10 = $0.025 + $0.375 + $0.22 = $0.62 per million prompt tokens
vs. $2.50 per million using GPT-4o for everything (4x more expensive).
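A complexity router can be as simple as a lookup table. A sketch, using the request mix and the three route targets from the example above; the task labels, and how requests get classified into them, are placeholders:

```python
# Sketch of a task-complexity router. Task labels, model choices, and
# prompt rates ($/M tokens) mirror the routing example above.
ROUTES = {
    "classification": ("gpt-5-nano", 0.05),
    "chat": ("gpt-5", 1.25),
    "reasoning": ("o3-mini", 1.10),
}

def route(task_type: str) -> str:
    """Pick the cheapest adequate model for a task type."""
    return ROUTES[task_type][0]

def blended_prompt_rate(mix: dict) -> float:
    """Average $/M prompt rate for a traffic mix (shares sum to 1)."""
    return sum(share * ROUTES[task][1] for task, share in mix.items())

blended_prompt_rate({"classification": 0.5, "chat": 0.3, "reasoning": 0.2})
# ~$0.62/M prompt tokens, vs $2.50/M sending everything to GPT-4o
```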
2. Use Batch API for Non-Real-Time Work
50% discount on all models. If latency tolerance is 24 hours, always use Batch API.
Savings: 50% on the base token cost.
3. Reduce Prompt Tokens
Include only essential context. Remove redundant examples. Use retrieval-augmented generation (RAG) to inject context only when it's needed.
Savings: a 20-40% prompt-token reduction is achievable with aggressive trimming.
4. Choose GPT-4.1 Over GPT-4o for Large Contexts
- GPT-4.1: $2/$8, 1.05M context
- GPT-4o: $2.50/$10, 128K context
For documents longer than 128K tokens, GPT-4.1 is the only option of the two, and it is cheaper per token anyway; a single request can also cover what would otherwise take multiple chunked GPT-4o calls.
Savings: 20-30% for context-heavy workloads.
5. Prefer Mini Models When Reasoning Isn't Critical
- GPT-5 Mini: $0.25/$2 (8x cheaper than o3 on prompt tokens)
- o3 Mini: $1.10/$4.40 (45% cheaper than o3)
Most applications don't need maximum reasoning quality.
Savings: 50-90% depending on application.
FAQ
Should teams migrate from GPT-4o to GPT-4.1?
Yes, if context size or throughput matters. GPT-4.1 is 20% cheaper per token ($2/$8 vs $2.50/$10), has 8x larger context (1.05M vs 128K), and is 6% faster (55 vs 52 tok/sec). Only reason to stay on GPT-4o: wider third-party integration (some platforms haven't added GPT-4.1 yet).
What's the real cost of a typical API request?
Depends on the model and request size. Assume a typical request of 2K prompt tokens and a 200-token response. GPT-5 Mini: $0.0005 prompt + $0.0004 completion ≈ $0.001 per request. GPT-4.1: $0.004 prompt + $0.0016 completion ≈ $0.006. Real costs are fractions of a cent per request for most applications.
Is o3 worth the cost?
Only for reasoning-heavy tasks where output quality directly impacts revenue or safety. For chat, classification, or code generation: no. For novel problem solving or multi-step logic: maybe. Benchmarks on reasoning tasks are not published, so ROI is hard to calculate.
When should teams use Batch API?
When latency tolerance is measured in hours; the Batch SLA is 24 hours, though jobs often finish sooner. Good fits: processing documents overnight, generating training data on a schedule, weekly reports. Avoid for real-time chat, customer-facing features, or sub-hour latency requirements.
What's the cost impact of longer context windows?
None. Pricing is per-token. A 1M-context request that uses only 10K tokens costs the same as a 128K-context request using 10K tokens. The size of the context window doesn't add cost; the tokens you actually use do.
Should teams lock in annual contracts for cost certainty?
OpenAI doesn't offer discounts for annual commitments (unlike AWS). The only discount is Batch API's 50%. For volume customers (>$100K/month), contact sales for custom pricing, but it's negotiated case-by-case.
What happens if API costs exceed budget?
OpenAI enforces soft spending limits (can set max monthly spend in account settings). Requests are rejected if limit is reached, not charged over-limit. Plan quotas carefully to avoid service disruption.
Sources
- OpenAI API Pricing
- OpenAI Batch API Documentation
- OpenAI Model Specifications
- DeployBase LLM Models API (March 22, 2026 snapshot)