Contents
- Overview
- OpenAI Pricing: Current Model Pricing Table
- Model-by-Model Breakdown
- Batch API & Discounts
- Monthly Cost Projections
- Cost Per Task
- Hidden Fees
- Optimization Strategies
- FAQ
- Sources
Overview
OpenAI pricing varies wildly across models. GPT-5.4 costs $2.50/$15 per million tokens (prompt/completion). GPT-4.1 costs $2/$8. GPT-5 Mini is $0.25/$2. GPT-5 Pro is $15/$120. As of March 2026, teams can spend $10/month or $10,000/month depending on model choice and usage. Hidden fees are minimal: context window size is never billed (you pay per token used), but rate limits and the Batch API's 10-request job minimum can still shape costs. This guide breaks down every tier and shows real cost projections.
OpenAI Pricing: Current Model Pricing Table
| Model | Prompt $/M | Completion $/M | Context | Max Output | Throughput |
|---|---|---|---|---|---|
| GPT-5.4 | $2.50 | $15.00 | 272K | 128K | 45 tok/sec |
| GPT-5.1 | $1.25 | $10.00 | 400K | 128K | 47 tok/sec |
| GPT-5 Codex | $1.25 | $10.00 | 400K | 128K | 50 tok/sec |
| GPT-5 Pro | $15.00 | $120.00 | 400K | 128K | 11 tok/sec |
| GPT-5 | $1.25 | $10.00 | 272K | 128K | 41 tok/sec |
| GPT-5 Mini | $0.25 | $2.00 | 272K | 128K | 68 tok/sec |
| GPT-5 Nano | $0.05 | $0.40 | 272K | 32K | 95 tok/sec |
| GPT-4.1 | $2.00 | $8.00 | 1.05M | 32K | 55 tok/sec |
| GPT-4.1 Mini | $0.40 | $1.60 | 1.05M | 32K | 75 tok/sec |
| GPT-4.1 Nano | $0.10 | $0.40 | 1.05M | 32K | 82 tok/sec |
| GPT-4o | $2.50 | $10.00 | 128K | 16K | 52 tok/sec |
| GPT-4o Mini | $0.15 | $0.60 | 128K | 16K | 75 tok/sec |
| o3 | $2.00 | $8.00 | 200K | 100K | 17 tok/sec |
| o3 Mini | $1.10 | $4.40 | 200K | 100K | 47 tok/sec |
| o4 Mini | $1.10 | $4.40 | 200K | 100K | 62 tok/sec |
Data from OpenAI pricing page, March 2026. Prices per 1 million tokens.
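The per-request arithmetic used throughout this guide reduces to one formula. A minimal sketch; the three sample rows come from the table above, and the dict can be extended as needed:

```python
# Sketch: cost of a single request, using prices from the table above
# ($ per million tokens, as (prompt, completion)). Only three rows shown.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-4.1": (2.00, 8.00),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one request."""
    prompt_rate, completion_rate = PRICES[model]
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1_000_000
```

A 2K-prompt, 500-completion request on GPT-4.1 works out to 2,000 × $2/M + 500 × $8/M = $0.008.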
Model-by-Model Breakdown
GPT-5 Series (Current Generation)
GPT-5.4 ($2.50/$15)
Latest variant. Advertised as "reasoning-first." 272K context. Max output 128K tokens (large artifacts possible). Throughput 45 tok/sec is moderate.
Use case: Reasoning tasks (analysis, planning, multi-step logic). Best quality for benchmarks. Higher cost due to the "reasoning" branding.
Monthly cost for 100M tokens: $250 prompt + $150 completion = $400 (here and for every model below, assuming 100M prompt tokens plus 10M completion tokens per month).
GPT-5.1 ($1.25/$10)
Previous variant, cheaper. 400K context (largest window in the 5-series). Max output 128K. Throughput 47 tok/sec.
Use case: Cost-sensitive reasoning. Context size is a significant advantage (128K more than GPT-5.4). Throughput trails the Mini models, but not catastrophically.
Monthly cost for 100M tokens: $125 prompt + $100 completion = $225.
GPT-5 Codex ($1.25/$10)
Specialized variant for code generation. Same pricing as GPT-5.1. 400K context. Max output 128K. Throughput 50 tok/sec (fastest of the full-size 5-series models).
Use case: Code generation, refactoring, technical documentation. Claimed to score higher on coding benchmarks than GPT-5.1, though the benchmarks are not published.
Monthly cost same as GPT-5.1: $225 for 100M tokens.
GPT-5 Pro ($15/$120)
Experimental reasoning model. Glacially slow (11 tok/sec). Max output 128K. Not for production use.
Use case: Research, one-off complex reasoning. Cost-prohibitive for any at-scale usage.
Monthly cost: For 100M tokens, $1,500 + $1,200 = $2,700. Unsustainable for high-volume applications.
GPT-5 ($1.25/$10)
Standard model. No specialist positioning. 272K context. Max output 128K. Throughput 41 tok/sec (slower than Mini).
Use case: General-purpose chat, reasoning. Cheaper than GPT-5.4. Slower than GPT-5 Mini. Middle-ground option for cost-conscious teams.
Monthly cost for 100M tokens: $225.
GPT-5 Mini ($0.25/$2)
Lightweight model. Fast (68 tok/sec). 272K context. Max output 128K.
Use case: High-volume inference, classification, summarization. 5x cheaper than GPT-5. Quality is lower on reasoning, but adequate for non-reasoning tasks.
Monthly cost for 100M tokens: $25 prompt + $20 completion = $45.
GPT-5 Nano ($0.05/$0.40)
Ultra-lightweight. Fastest throughput (95 tok/sec). 272K context. Limited output (32K max, versus 128K for the rest of the 5-series).
Use case: Extreme cost minimization. Token classification, simple summarization. Quality is minimal; expect hallucinations on reasoning tasks.
Monthly cost for 100M tokens: $5 prompt + $4 completion = $9.
GPT-4.1 Series (Mature, Stable)
GPT-4.1 ($2/$8)
Full model. Largest context window (1.05M tokens). Max output 32K. Throughput 55 tok/sec (fastest of the full-size models).
Use case: Long-context applications, large document analysis, code refactoring. Prompt cost is 20% lower than GPT-5.4 and context is 3.8x larger. Throughput is superior.
Monthly cost for 100M tokens: $200 + $80 = $280. For context-heavy workloads, better value than GPT-5.4: lower per-token cost and a much larger window.
GPT-4.1 Mini ($0.40/$1.60)
Lightweight variant. Same 1.05M context as GPT-4.1. Max output 32K. Throughput 75 tok/sec (faster).
Use case: High-volume document processing where speed matters more than quality. Cost is 5x lower than GPT-4.1 but context stays 1.05M.
Monthly cost for 100M tokens: $40 + $16 = $56.
GPT-4.1 Nano ($0.10/$0.40)
Ultra-cheap variant. Same context window (1.05M). Max output 32K. Throughput 82 tok/sec.
Use case: Extreme cost minimization with large context needed. Rare use case: processing very long documents very cheaply.
Monthly cost for 100M tokens: $10 + $4 = $14.
GPT-4o (Previous Generation)
GPT-4o ($2.50/$10)
Older standard model. 128K context (smaller than GPT-4.1). Max output 16K. Throughput 52 tok/sec.
Deprecated positioning but still widely available. Cost-per-token is between GPT-4.1 and GPT-5.4. Context is smaller than both.
Monthly cost for 100M tokens: $250 + $100 = $350.
GPT-4o Mini ($0.15/$0.60)
Lightweight variant of GPT-4o. 128K context. Max output 16K. Throughput 75 tok/sec.
Use case: Cost-sensitive applications that don't need 1M context. 10x cheaper than GPT-4o.
Monthly cost for 100M tokens: $15 + $6 = $21.
Reasoning Models (o3, o4)
o3 ($2/$8)
Reasoning-specialized model. Slow (17 tok/sec). 200K context. Max output 100K.
Use case: Complex reasoning, multi-step logic, proof-based tasks. Cost is same as GPT-4.1 but throughput is 3x worse. Trade latency for reasoning quality.
Monthly cost for 100M tokens: $200 + $80 = $280.
o3 Mini ($1.10/$4.40)
Faster reasoning variant. 47 tok/sec. Same context and output limits.
Use case: Reasoning tasks where latency matters more than max quality. 45% cheaper than o3 and 2.8x faster throughput.
Monthly cost for 100M tokens: $110 + $44 = $154.
o4 Mini ($1.10/$4.40)
Latest reasoning variant. 62 tok/sec (faster than o3 Mini). Same pricing and context limits.
Use case: Same as o3 Mini, but newer model. Presumably better reasoning quality, but benchmarks not published yet.
Monthly cost for 100M tokens: $110 + $44 = $154.
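The per-model monthly figures above are all consistent with a single volume assumption: 100M prompt tokens plus 10M completion tokens per month. A sketch that reproduces them:

```python
# All "Monthly cost for 100M tokens" figures above follow one convention:
# 100M prompt tokens plus 10M completion tokens per month.
def monthly_cost(prompt_rate: float, completion_rate: float,
                 prompt_mtok: float = 100, completion_mtok: float = 10) -> float:
    """Monthly USD cost; rates in $/M tokens, volumes in millions of tokens."""
    return prompt_mtok * prompt_rate + completion_mtok * completion_rate

monthly_cost(2.50, 15.00)  # GPT-5.4:    $400
monthly_cost(1.25, 10.00)  # GPT-5.1:    $225
monthly_cost(0.25, 2.00)   # GPT-5 Mini:  $45
```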
Batch API & Discounts
Batch API: 50% Discount
OpenAI's Batch API processes requests asynchronously at a 50% discount. Submit jobs, wait up to 24 hours (often less during low-demand periods), and retrieve results.
Pricing with Batch API:
| Model | Regular | Batch (50% off) |
|---|---|---|
| GPT-5.4 | $2.50/$15 | $1.25/$7.50 |
| GPT-5.1 | $1.25/$10 | $0.625/$5.00 |
| GPT-4.1 | $2/$8 | $1/$4 |
| GPT-5 Mini | $0.25/$2 | $0.125/$1.00 |
| o3 | $2/$8 | $1/$4 |
Example cost reduction:
Process 1 billion tokens of classification tasks. Typical prompt/completion split: 80/20 (800M prompt, 200M completion).
Using GPT-5 Mini on-demand: (800M × $0.25 + 200M × $2) / 1M = $200 + $400 = $600.
Using Batch API: (800M × $0.125 + 200M × $1) / 1M = $100 + $200 = $300.
Savings: $300 (50% discount confirmed).
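The classification example can be checked in a few lines; the 50% batch multiplier and the $0.25/$2 GPT-5 Mini rates come from the tables above:

```python
# Check the 1B-token classification example: 80/20 prompt/completion
# split, GPT-5 Mini rates ($0.25/$2 per M tokens), Batch API halves both.
def job_cost(prompt_mtok: float, completion_mtok: float,
             prompt_rate: float, completion_rate: float,
             batch: bool = False) -> float:
    """USD cost of a job; volumes in millions of tokens, rates in $/M."""
    multiplier = 0.5 if batch else 1.0
    return (prompt_mtok * prompt_rate + completion_mtok * completion_rate) * multiplier

on_demand = job_cost(800, 200, 0.25, 2.00)            # $600
batched = job_cost(800, 200, 0.25, 2.00, batch=True)  # $300
```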
Batch API: Practical Limits
- Minimum batch size: 10 requests (trivial for genuine batch workloads, but it rules out submitting requests one at a time).
- Latency: 24-hour expected SLA (can be much faster, but not guaranteed).
- Use cases: Document processing, periodic data analysis, non-interactive tasks.
- Not suitable for: Real-time chat, customer-facing features, low-latency requirements.
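For reference, a Batch API job starts as a local JSONL file of request objects. A minimal sketch of building one; the field names (custom_id, method, url, body) follow OpenAI's batch documentation as of this writing, but verify them against the current docs before relying on this:

```python
import json

# Hypothetical helper that writes a Batch API input file. Each JSONL
# line is one request; field names follow OpenAI's batch docs as of
# this writing -- verify against current documentation.
def build_batch_file(prompts, model="gpt-5-mini", path="batch_input.jsonl"):
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"request-{i}",  # your key for matching results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

The file is then uploaded and submitted as a batch job via the SDK; see the Batch API documentation for those calls.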
Monthly Cost Projections
Scenario 1: Chatbot (1M Monthly Active Users, 10 messages/user/month)
Metrics:
- Messages: 1M users × 10 = 10M messages/month
- Avg prompt: 100 tokens (user message + context)
- Avg completion: 200 tokens (model response)
- Total: 3B tokens/month (1B prompt, 2B completion)
Using GPT-4o:
- Cost: (1B × $2.50 + 2B × $10) / 1M = $2,500 + $20,000 = $22,500/month
Using GPT-5 Mini:
- Cost: (1B × $0.25 + 2B × $2) / 1M = $250 + $4,000 = $4,250/month
Using GPT-4.1 Mini:
- Cost: (1B × $0.40 + 2B × $1.60) / 1M = $400 + $3,200 = $3,600/month
Winner: GPT-4.1 Mini at $3,600/month. Context window (1.05M) supports longer conversation history without quality loss compared to Mini models.
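The three chatbot projections can be reproduced with the same formula; token volumes are in billions, rates in $/M from the pricing table:

```python
# Reproduce the Scenario 1 projections: 1B prompt + 2B completion
# tokens per month, compared across the three candidate models.
def chatbot_cost(prompt_btok: float, completion_btok: float,
                 prompt_rate: float, completion_rate: float) -> float:
    """Monthly USD cost; volumes in billions of tokens, rates in $/M."""
    return prompt_btok * 1000 * prompt_rate + completion_btok * 1000 * completion_rate

chatbot_cost(1, 2, 2.50, 10.00)  # GPT-4o:       $22,500
chatbot_cost(1, 2, 0.25, 2.00)   # GPT-5 Mini:    $4,250
chatbot_cost(1, 2, 0.40, 1.60)   # GPT-4.1 Mini: ~$3,600
```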
Scenario 2: Document Processing (10M Documents, Batch Mode)
Metrics:
- Documents: 10M, average 5K tokens each
- Task: Extract key facts, classify document type
- Prompt: 5K tokens (full document)
- Completion: 200 tokens (extraction + classification)
- Total: 52B tokens/month (50B prompt, 2B completion)
Using GPT-4.1 on-demand:
- Cost: (50B × $2 + 2B × $8) / 1M = $100,000 + $16,000 = $116,000/month
Using GPT-4.1 Batch API (50% discount):
- Cost: (50B × $1 + 2B × $4) / 1M = $50,000 + $8,000 = $58,000/month
Using GPT-4.1 Mini Batch (ultra-cheap):
- Cost: (50B × $0.20 + 2B × $0.80) / 1M = $10,000 + $1,600 = $11,600/month
Winner: GPT-4.1 Mini Batch at $11,600/month. For non-interactive batch work, Mini models are sufficient and dramatically cheaper.
Scenario 3: Code Generation Tool (50K Engineers, 5 requests/day/engineer)
Metrics:
- Requests: 50K × 5 × 30 = 7.5M requests/month
- Avg request: 2K prompt (code context), 500 completion (generated code)
- Total: 15B prompt, 3.75B completion tokens/month
Using GPT-5 Codex:
- Cost: (15B × $1.25 + 3.75B × $10) / 1M = $18,750 + $37,500 = $56,250/month
Using GPT-4.1 Mini:
- Cost: (15B × $0.40 + 3.75B × $1.60) / 1M = $6,000 + $6,000 = $12,000/month
Using GPT-5 Mini:
- Cost: (15B × $0.25 + 3.75B × $2) / 1M = $3,750 + $7,500 = $11,250/month
Winner: GPT-5 Mini at $11,250/month. Throughput is slightly lower than GPT-4.1 Mini's (68 vs 75 tok/sec), but cost is the lowest. Coding quality trails Codex, but for an IDE autocomplete tool, Mini is acceptable.
Cost Per Task
Generate Fine-Tuning Data for a 7B Model: Claude Code vs. OpenAI's API
Scenario: Generate 100K training examples with an LLM, then train the 7B model yourself. This uses plain generation, not OpenAI's fine-tuning endpoints.
Step 1: Generate 100K Examples
Each example: 200 tokens prompt (template), 300 tokens completion (example).
Total: 50M tokens (20M prompt, 30M completion).
Using GPT-4.1:
- Cost: (20M × $2 + 30M × $8) / 1M = $40 + $240 = $280
Using GPT-5 Mini (fast, cheap):
- Cost: (20M × $0.25 + 30M × $2) / 1M = $5 + $60 = $65
Using Claude Code + Claude API:
- Claude Sonnet 4.6: (20M × $3 + 30M × $15) / 1M = $60 + $450 = $510
Winner: GPT-5 Mini at $65. OpenAI is cheaper for bulk generation thanks to Mini's low completion pricing.
Hidden Fees
Explicit Costs (No Surprises)
- Prompt/completion tokens are metered per request. No minimum spend.
- Context window size does not add cost. 272K token context costs same as 1K token context (pay per token used, not per capacity).
- Concurrency limits are not charged; they're soft rate limits (requests rejected if exceeded, not billed extra).
Soft Limits (Not Fees, but Financial Impact)
- Rate Limits: Free tier and low-usage accounts get strict rate limits (e.g., 3 requests/minute on GPT-4o). Hitting the limit means request rejection, not an overage charge, but it caps throughput.
- Overage Quotas: High-volume accounts (>$100K/month) may require manual negotiation for higher concurrency. No extra fee, but scaling requires account-manager contact. Plan ahead.
- Batch API Minimum: Batch jobs must contain 10+ requests. Pushing individual small requests through the Batch API therefore imposes an artificial per-job minimum.
- Vision and Embeddings: Embedding models (e.g., Text Embedding 3) and image inputs to vision-capable models are priced separately from the token rates above; vision input costs more per image than per text token.
No Hidden Charges
- No fees for API calls themselves (only token charges).
- No overage surprises (token pricing is fixed).
- No inactivity fees (unused API keys incur no cost).
- No data retention fees (logs are stored for auditing, not billed).
Optimization Strategies
1. Route by Task Complexity
Use different models for different request types:
- Classification: GPT-5 Nano or Mini ($0.05-$0.25 prompt)
- General chat: GPT-5 or GPT-4.1 ($1.25-$2 prompt)
- Reasoning: o3 Mini or GPT-5.4 ($1.10-$2.50 prompt)
- Code generation: GPT-5 Codex ($1.25 prompt)
Cost savings: 10-20x by routing simple tasks to cheap models.
Example: If 50% of requests are classification (could use Nano), 30% are chat (GPT-5), 20% are reasoning (o3 Mini):
Blended prompt price: 0.5 × $0.05 + 0.3 × $1.25 + 0.2 × $1.10 = $0.025 + $0.375 + $0.22 = $0.62 per million prompt tokens
vs. $2.50 per million using GPT-4o for everything (4x more expensive).
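A complexity router can be as simple as a lookup table. A sketch, using the request mix and the three route targets from the example above; the task labels, and how requests get classified into them, are placeholders:

```python
# Sketch of a task-complexity router. Task labels, model choices, and
# prompt rates ($/M tokens) mirror the routing example above.
ROUTES = {
    "classification": ("gpt-5-nano", 0.05),
    "chat": ("gpt-5", 1.25),
    "reasoning": ("o3-mini", 1.10),
}

def route(task_type: str) -> str:
    """Pick the cheapest adequate model for a task type."""
    return ROUTES[task_type][0]

def blended_prompt_rate(mix: dict) -> float:
    """Average $/M prompt rate for a traffic mix (shares sum to 1)."""
    return sum(share * ROUTES[task][1] for task, share in mix.items())

blended_prompt_rate({"classification": 0.5, "chat": 0.3, "reasoning": 0.2})
# ~$0.62/M prompt tokens, vs $2.50/M sending everything to GPT-4o
```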
2. Use Batch API for Non-Real-Time Work
50% discount on all models. If latency tolerance is 24 hours, always use Batch API.
Savings: 50% on the base token cost.
3. Reduce Prompt Tokens
Include only essential context. Remove redundant examples. Use retrieval-augmented generation (RAG) to inject context only when it's needed.
Savings: a 20-40% prompt-token reduction is achievable with aggressive trimming.
4. Choose GPT-4.1 Over GPT-4o for Large Contexts
- GPT-4.1: $2/$8, 1.05M context
- GPT-4o: $2.50/$10, 128K context
For documents longer than 128K tokens, GPT-4.1 is the only option of the two, and it is cheaper per token anyway; a single request can also cover what would otherwise take multiple chunked GPT-4o calls.
Savings: 20-30% for context-heavy workloads.
5. Prefer Mini Models When Reasoning Isn't Critical
- GPT-5 Mini: $0.25/$2 (8x cheaper than o3 on prompt tokens)
- o3 Mini: $1.10/$4.40 (45% cheaper than o3)
Most applications don't need maximum reasoning quality.
Savings: 50-90% depending on application.
FAQ
Should teams migrate from GPT-4o to GPT-4.1?
Yes, if context size or throughput matters. GPT-4.1 is 20% cheaper per token ($2/$8 vs $2.50/$10), has 8x larger context (1.05M vs 128K), and is 6% faster (55 vs 52 tok/sec). Only reason to stay on GPT-4o: wider third-party integration (some platforms haven't added GPT-4.1 yet).
What's the real cost of a typical API request?
Depends on the model and request size. Assume a typical request of 2K prompt tokens and a 200-token response. GPT-5 Mini: $0.0005 prompt + $0.0004 completion ≈ $0.001 per request. GPT-4.1: $0.004 prompt + $0.0016 completion ≈ $0.006. Real costs are fractions of a cent per request for most applications.
Is o3 worth the cost?
Only for reasoning-heavy tasks where output quality directly impacts revenue or safety. For chat, classification, or code generation: no. For novel problem solving or multi-step logic: maybe. Benchmarks on reasoning tasks are not published, so ROI is hard to calculate.
When should teams use Batch API?
When latency tolerance is measured in hours; the Batch SLA is 24 hours, though jobs often finish sooner. Good fits: processing documents overnight, generating training data on a schedule, weekly reports. Avoid for real-time chat, customer-facing features, or sub-hour latency requirements.
What's the cost impact of longer context windows?
None. Pricing is per-token. A 1M-context request that uses only 10K tokens costs the same as a 128K-context request using 10K tokens. The size of the context window doesn't add cost; the tokens you actually use do.
Should teams lock in annual contracts for cost certainty?
OpenAI doesn't offer discounts for annual commitments (unlike AWS). The only discount is Batch API's 50%. For volume customers (>$100K/month), contact sales for custom pricing, but it's negotiated case-by-case.
What happens if API costs exceed budget?
OpenAI enforces soft spending limits (can set max monthly spend in account settings). Requests are rejected if limit is reached, not charged over-limit. Plan quotas carefully to avoid service disruption.
Sources
- OpenAI API Pricing
- OpenAI Batch API Documentation
- OpenAI Model Specifications
- DeployBase LLM Models API (March 22, 2026 snapshot)