Contents
- Claude API Pricing: Overview
- Pricing by Model
- Model Lineup and Capabilities
- Context Window Pricing
- Token Optimization Strategies
- Extended Thinking and Advanced Features
- Cost Comparison: Model Selection
- Monthly Cost Projections
- Prompt Caching Deep Dive
- Batch API for Cost Reduction
- Production Deployment Costs
- Use Case Recommendations
- FAQ
- Related Resources
- Sources
Claude API Pricing: Overview
Claude API pricing spans $0.25 to $15 per million input tokens, depending on model and context window. Opus 4.6 and Sonnet 4.6 eliminated long-context surcharges, making 1M contexts available at standard rates. Prompt caching cuts costs 90% on cached input. Batch processing drops prices further with 24-hour windows. Current as of March 21, 2026.
Pricing by Model
| Model | Context | Input $/M | Output $/M | Throughput | Max Output |
|---|---|---|---|---|---|
| Claude Opus 4.6 | 1M | $5.00 | $25.00 | 35 tok/s | 128K |
| Claude Sonnet 4.6 | 1M | $3.00 | $15.00 | 37 tok/s | 64K |
| Claude Sonnet 4.5 | 1M | $3.00 | $15.00 | 36 tok/s | 64K |
| Claude Sonnet 4 | 1M | $3.00 | $15.00 | 42 tok/s | 64K |
| Claude Opus 4.5 | 200K | $5.00 | $25.00 | 39 tok/s | 64K |
| Claude Opus 4 | 200K | $15.00 | $75.00 | 29 tok/s | 32K |
| Claude Opus 4.1 | 200K | $15.00 | $75.00 | 21 tok/s | 32K |
| Claude Haiku 4.5 | 200K | $1.00 | $5.00 | 44 tok/s | 64K |
| Claude 3 Haiku | 200K | $0.25 | $1.25 | 40 tok/s | 4K |
Pricing as of March 21, 2026. All prices per million tokens.
Model Lineup and Capabilities
Opus 4.6: The Latest Flagship
Strongest reasoning and most capable. Best for complex analysis, code generation on difficult problems, multi-step reasoning, and expert-level tasks. 1M context window (5x larger than previous Opus). Throughput: 35 tokens/second means a 128K output takes about 1 hour.
Cost calculation: A request with 100K context + 10K completion costs approximately $0.50 + $0.25 = $0.75.
Opus 4.6 suits research teams, high-stakes use cases, and workloads where reasoning quality justifies the cost: complex problem solving, scientific analysis, detailed code reviews, multi-step debugging. When accuracy matters more than speed or cost, Opus wins.
When to use Opus 4.6:
- Complex reasoning required (legal analysis, scientific papers, multi-step logic)
- Large context requirements (need to fit 500K+ tokens in single request)
- Mission-critical applications (financial decisions, medical analysis)
- Research and R&D (answer quality is priority)
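The per-request arithmetic above generalizes into a small helper. This is a sketch: the rates come from the pricing table earlier in this guide, and the model-id strings are shorthand for this page, not confirmed API identifiers.

```python
# On-demand rates in dollars per million tokens, from the pricing table above.
PRICES = {
    "claude-opus-4.6":   {"input": 5.00, "output": 25.00},
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4.5":  {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at on-demand rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The worked example above: 100K context + 10K completion on Opus 4.6.
print(request_cost("claude-opus-4.6", 100_000, 10_000))  # prints 0.75
```

The same function reproduces every per-model comparison later in this guide by swapping the model key.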
Sonnet 4.6: The Balanced Choice
Better speed than Opus, still strong reasoning. Best for chatbots, code completions, and real-time applications. 1M context window (matches Opus). Throughput: 37 tokens/second (slightly faster than Opus). Cost: $3/$15 per M tokens.
Cheaper than Opus and only slightly behind it on reasoning. Most API users pick Sonnet 4.6 for production: fast enough for real-time serving, capable enough for most workloads, and priced for scale.
When to use Sonnet 4.6:
- Production chatbots and Q&A systems
- Real-time applications (< 500ms latency required)
- Moderate reasoning requirements (not the hardest problems)
- High-volume serving (cost per token matters)
Haiku 4.5: The Speed Tier
Fastest model. Best for high-volume processing, simple tasks, classification, and cost-constrained applications. 200K context window (enough for most documents). Throughput: 44 tokens/second (fastest tier). Cost: $1/$5 per M tokens.
3x cheaper than Sonnet on input, useful for passing large documents through multiple completions. A 500K token document summary costs $0.50 on Haiku vs $1.50 on Sonnet.
For classification, tagging, and other high-volume, low-complexity tasks, Haiku's speed and cost are a clear win. Error rate on Haiku is 2-3% higher than Sonnet on reasoning tasks, but acceptable for tasks with downstream validation.
When to use Haiku 4.5:
- High-volume batch processing (tagging, classification)
- Cost-constrained applications (startups, non-profits)
- Simple tasks with acceptable error rates (5%+)
- Throughput-critical applications (need fast responses)
Legacy Models (Avoid)
Claude 3 Haiku, Opus 4, Opus 4.1 still available but deprecated. Opus 4.1 costs $15/$75 per M tokens (3x the price of Opus 4.6). Use current models instead.
Migration economics: Migrating from Opus 4.1 to Opus 4.6 is free in terms of API compatibility and saves 67% on costs.
Context Window Pricing
CRITICAL: Anthropic eliminated long-context surcharges in 2026. All context tokens cost the same whether the first or last token in the window.
This matters. Previously (2025), using a full 200K context window doubled the input price. Now it doesn't. This is why Opus 4.6 and Sonnet 4.6 are cheaper despite having larger context windows.
A single 1M-token context on Opus 4.6 costs $5 in input, whether those tokens are all system prompt or all document text. There is no premium for deeper context; pricing is flat per token.
Compare to Sonnet 4.6 at 1M context for $3 input. The 200K context limit on Opus 4.5 is now artificial. Use Opus 4.6 when context exceeds 200K (teams get more context at same price).
Token Optimization Strategies
Prompt Caching
Caching reduces cached input costs by 90%. First request with a cached prompt block costs full price. Subsequent requests pay 10% of input cost for those cached tokens. Minimum cache block: 1,024 tokens.
Example: A 100K system prompt costs $0.30 on the first request (Sonnet: 100K × $3/M). On requests 2-100, those tokens cost $0.03 per request ($0.30 × 10%). 100 API calls with cached context cost $0.30 + (99 × $0.03) = $3.27 instead of $30.00. Savings: 89%.
Caching is most valuable when:
- The same large system prompt is used across many requests (100+ calls)
- Documents are analyzed repeatedly with same questions
- Conversation history exceeds 10K tokens
Implementation:
import anthropic

client = anthropic.Anthropic()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": system_prompt,  # stable block of 1,024+ tokens
                "cache_control": {"type": "ephemeral"}
            }
        ]
    }
]

response = client.messages.create(model="claude-sonnet-4.6", max_tokens=1024, messages=messages)
Cache reads and writes are reported in the response's usage metadata. Reuse the same cached blocks, byte-for-byte, in subsequent requests.
Batch Processing
Batch API reduces costs by 50% with 24-hour turnaround. No per-request overhead, only cost per token processed. Best for non-realtime work: summarization, bulk analysis, classification, data labeling.
Example: Processing 10M tokens in batch costs 50% of standard rate.
- Opus input: $2.50 instead of $5.00
- Sonnet input: $1.50 instead of $3.00
Trade-off: Requests are queued and processed asynchronously. No real-time responses. Good for overnight jobs, not for chatbots.
When to use batch:
- Non-urgent summarization (overnight jobs)
- Bulk data labeling (training data generation)
- Monthly reports or analysis
- Cost is priority over latency
Output Token Prediction
Longer outputs cost more. A 10K completion token request on Opus costs $0.25. A 1K request costs $0.025. If output length is known, use shorter generation limits.
Example: Classification tasks output only 1-5 tokens. Set max_tokens to 20 and save 90% on output costs compared to unlimited generation.
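As a sketch of capping output spend on a classifier, assuming the standard messages-format request body (the function name, model id, and label set here are hypothetical):

```python
def classification_params(ticket_text: str) -> dict:
    """Build request params for a short label; max_tokens caps output spend."""
    return {
        "model": "claude-haiku-4.5",  # placeholder id for the Haiku tier
        "max_tokens": 20,             # labels run 1-5 tokens; 20 leaves headroom
        "messages": [
            {"role": "user",
             "content": f"Label this ticket as BUG, BILLING, or OTHER:\n{ticket_text}"},
        ],
    }

params = classification_params("I was charged twice this month.")
```

Pass the dict to the client's message-creation call. The cap bounds output cost at 20 × $5/M = $0.0001 per call on Haiku, no matter how verbose the model tries to be.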
Extended Thinking and Advanced Features
Extended thinking tokens (the model's private reasoning steps) are billed as output tokens at standard rates. There is no surcharge: a response with 10K visible output and 5K thinking tokens is billed as 15K output tokens, the same as 15K of ordinary output.
Extended thinking adds no pricing tier of its own. You pay for total tokens used, whether they are reasoning or visible output.
Extended thinking is valuable for multi-step problems where the reasoning process matters. Complex code generation, mathematical proofs, and logical analysis benefit from intermediate steps. The model can reason through a problem more carefully without inflating visible output tokens.
Cost Comparison: Model Selection
Same Task, Different Models
Analyzing 100K customer support tickets, each around 1K input tokens, with each analysis outputting about 500 tokens (100M input and 50M output tokens total).
- Haiku: (100M × $1/M) + (50M × $5/M) = $100 + $250 = $350
- Sonnet: (100M × $3/M) + (50M × $15/M) = $300 + $750 = $1,050
- Opus: (100M × $5/M) + (50M × $25/M) = $500 + $1,250 = $1,750
Haiku is 3x cheaper than Sonnet; Opus is 5x the cost of Haiku.
For high-volume tasks where quality is acceptable, Haiku wins on pure cost. If analysis quality matters, Opus accuracy might prevent re-runs. Run a 100-ticket pilot on Haiku and Opus, measure quality, then decide.
Chatbot Serving 1M Requests Monthly
Average request: 2K input tokens (user message + context) + 200 output tokens (bot response).
- Haiku: (2B × $1/M) + (200M × $5/M) = $2,000 + $1,000 = $3,000/month
- Sonnet: (2B × $3/M) + (200M × $15/M) = $6,000 + $3,000 = $9,000/month
- Opus: (2B × $5/M) + (200M × $25/M) = $10,000 + $5,000 = $15,000/month
Switching from Sonnet to Haiku saves $6,000/month. If response quality holds up, the choice is obvious.
Monthly Cost Projections
Scenario 1: Batch Summarization Service
Processing 100GB of documents monthly, at roughly 4 bytes per token (typical for English text): 25B input tokens.
Haiku batch processing: 25B tokens × $0.50/M (50% batch discount on the $1/M input rate) = $12,500/month
Realistic scenario. High volume, non-realtime, cost-constrained.
Scenario 2: Real-Time Chat API
1M conversations monthly. 5K avg tokens per conversation (context + completion).
Sonnet: 5B tokens × $3/M ≈ $15,000/month (approximating all tokens at the input rate; real-time serving required)
Competitive with GPT-4o ($2.50 input for first 128K tokens). Token counts differ by model, so actual comparison requires testing.
Scenario 3: Internal RAG System
10,000 queries/day. Each query passes 50K document context + 2K question.
52K input per request. 1K output per request.
With prompt caching (the 50K document context cached and billed at the 10% rate):
- Cached context: 50K × $0.30/M × 10,000 × 30 = $4,500/month
- Fresh question input: 2K × $3/M × 10,000 × 30 = $1,800/month
- Output tokens: 1K × $15/M × 10,000 × 30 = $4,500/month
- Total: ~$10,800/month
Without caching:
- Input: 52K × $3/M × 10,000 × 30 = $46,800/month
- Output: $4,500/month
- Total: $51,300/month
Caching saves ~$40,500/month (a 79% reduction) on this workload. This is why RAG systems should always use caching.
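The RAG scenario's arithmetic can be sketched as a small projection helper. Assumptions: Sonnet rates ($3/$15 per M), the cached block billed at 10% of the input rate, a 30-day month, and a hypothetical function name.

```python
def rag_monthly_cost(queries_per_day: int, cached_tokens: int, fresh_in: int,
                     out_tokens: int, in_rate: float = 3.0, out_rate: float = 15.0,
                     cached_mult: float = 0.10, days: int = 30) -> float:
    """Monthly dollar cost for a RAG workload with a cached context block."""
    q = queries_per_day * days
    cached = q * cached_tokens * in_rate * cached_mult / 1_000_000
    fresh = q * fresh_in * in_rate / 1_000_000
    out = q * out_tokens * out_rate / 1_000_000
    return cached + fresh + out

with_cache = rag_monthly_cost(10_000, 50_000, 2_000, 1_000)
no_cache = rag_monthly_cost(10_000, 0, 52_000, 1_000)
print(with_cache, no_cache)  # roughly 10800 and 51300
```

Setting `cached_tokens` to zero and folding the document into `fresh_in` gives the uncached baseline, so the same function produces both sides of the comparison.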
Scenario 4: Large Document Processing (Enterprise)
Processing 500M tokens/month of documents.
- Standard on-demand: 500M × $5/M (Opus input rate) = $2,500/month
- With caching (40% cache hit): $2,500 × 0.6 + $2,500 × 0.4 × 0.1 = $1,500 + $100 = $1,600/month
- With batch (50% discount applied to the cached figure) + caching: $1,600 × 0.5 = $800/month
Combined optimization (caching + batch) reduces cost by 68%.
Prompt Caching Deep Dive
How Caching Works
- Mark prompt sections with cache_control: {"type": "ephemeral"}
- Minimum cache size: 1,024 tokens
- First request: pay full price for marked tokens
- Subsequent requests (within 5-minute window): pay 10% for cached tokens
Caching Economics
Cache hit cost: 10% of standard input rate
- Haiku cached: $0.10 per M tokens (vs $1.00 standard)
- Sonnet cached: $0.30 per M tokens (vs $3.00 standard)
- Opus cached: $0.50 per M tokens (vs $5.00 standard)
For a 100K cached block used 100 times:
- Sonnet standard: 100 × (100K × $3/M) = $30.00
- Sonnet cached: (100K × $3/M) + 99 × (100K × $0.30/M) = $0.30 + $2.97 = $3.27
- Savings: $26.73 (89%)
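The cached-versus-standard arithmetic is the same for any block size and rate, so it is worth a quick helper. A sketch under this guide's pricing model (full price once, 10% per reuse; the function name is hypothetical):

```python
def caching_costs(block_tokens: int, uses: int, in_rate: float,
                  cached_mult: float = 0.10) -> tuple[float, float]:
    """Return (standard, cached) dollar cost for one block reused `uses` times."""
    per_use = block_tokens * in_rate / 1_000_000
    standard = uses * per_use
    # Full price for the first request, the cached rate for every reuse.
    cached = per_use + (uses - 1) * per_use * cached_mult
    return standard, cached

standard, cached = caching_costs(100_000, 100, 3.00)  # the Sonnet example above
print(standard, cached)  # 30.0 and roughly 3.27
```

The savings ratio approaches 90% as the number of reuses grows, since the one full-price request amortizes away.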
Best Practices for Caching
- Cache system prompts (reused across requests)
- Cache document context in RAG (same documents queried multiple times)
- Cache large instruction sets or knowledge bases
- Use in multi-turn conversations (context accumulates)
Batch API for Cost Reduction
Batch API allows asynchronous processing of requests with 24-hour turnaround and 50% cost discount.
Example batch job:
{
    "requests": [
        {
            "custom_id": "doc-1",
            "params": {
                "model": "claude-opus-4.6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize: ."}]
            }
        }
    ]
}
Submit batch, wait 24 hours, retrieve results. Cost: 50% of on-demand pricing.
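In Python, the same job can be built programmatically and handed to the SDK's Message Batches endpoint. A sketch: the document texts and model id are placeholders, and the submission itself requires the `anthropic` package plus an API key.

```python
# Placeholder documents; in practice these come from your own data store.
docs = {"doc-1": "Quarterly feedback export ...", "doc-2": "Support transcript ..."}

requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-opus-4.6",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
        },
    }
    for doc_id, text in docs.items()
]

def submit(requests: list) -> object:
    """Submit the batch; needs the `anthropic` package and ANTHROPIC_API_KEY."""
    import anthropic
    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=requests)
```

Each `custom_id` ties a result back to its source document, since batch results are not guaranteed to come back in submission order.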
When to use batch:
- Nightly summarization of customer feedback
- Monthly data labeling for training
- Bulk analysis where latency is acceptable
- Cost is priority
Production Deployment Costs
Small Startup (10K requests/month)
Average: 1K input + 200 output tokens per request.
- Haiku: (10M × $1/M) + (2M × $5/M) = $10 + $10 = $20/month
- Sonnet: (10M × $3/M) + (2M × $15/M) = $30 + $30 = $60/month
At this scale, cost is negligible. Quality matters more.
Mid-Size Company (1M requests/month)
Average: 2K input + 300 output tokens per request
- Haiku: (2B × $1/M) + (300M × $5/M) = $2,000 + $1,500 = $3,500/month
- Sonnet: (2B × $3/M) + (300M × $15/M) = $6,000 + $4,500 = $10,500/month
- Opus: (2B × $5/M) + (300M × $25/M) = $10,000 + $7,500 = $17,500/month
Choice between Haiku ($3.5K) and Sonnet ($10.5K) depending on quality requirements.
Large Production (100M requests/month, 50M daily)
Average: 3K input + 500 output tokens per request
- Haiku: (300B × $1/M) + (50B × $5/M) = $300,000 + $250,000 = $550,000/month
- Sonnet: (300B × $3/M) + (50B × $15/M) = $900,000 + $750,000 = $1,650,000/month
With caching (30% hit rate), multiply by ~0.85 → Haiku: ~$467K, Sonnet: ~$1.4M.
With batch for 20% of volume, an additional ~10-15% comes off → Haiku: ~$450K, Sonnet: ~$1.3M.
Use Case Recommendations
Use Haiku When
- High-volume, cost-constrained work
- Classification, tagging, simple Q&A
- Quality is acceptable at speed tier
- Internal automation
- Batch processing where errors caught downstream
Use Sonnet When
- Production chatbots and real-time applications
- Moderate reasoning required
- Quality matters but not mission-critical
- Most general-purpose API users default here
- Cost/speed/quality balance is optimal
Use Opus When
- Complex reasoning required
- Multi-step problem solving
- Code generation on hard problems
- Research and R&D
- Cost doesn't matter relative to quality
- Document analysis requiring deep understanding
FAQ
Can I use Opus in production at scale?
Yes, but it's expensive ($5 input / $25 output per M tokens). At roughly 2B input and 1.6B output tokens monthly, cost reaches $50,000/month. Most teams use Sonnet for production and reserve Opus for R&D.
Is prompt caching worth the complexity?
If processing the same 10K+ token prompt 10+ times monthly, yes. Setup is trivial (one flag in API call). Savings can exceed $1,000/month.
What's the difference between Opus 4.6 and Opus 4.5?
Same pricing ($5/$25 per M tokens). Opus 4.6 has a 1M context window (vs 200K for Opus 4.5) at the same cost. Use 4.6 in all cases.
How do I know if Haiku is accurate enough?
Test on a sample of 100 examples. Measure error rate. If below 5% and acceptable for use case, use Haiku.
Is batch processing worth the latency?
If 24-hour turnaround acceptable, always use batch. 50% savings is significant. For non-realtime work (summarization, classification, training data labeling), batch is the right choice.
Should I migrate from Opus 4.1 immediately?
Yes. Opus 4.6 is cheaper, better, and has larger context. No downside to migration.
Related Resources
- Anthropic Models
- Claude Opus vs GPT-5 Comparison
- Claude Sonnet vs GPT-5 Comparison
- LLM Pricing Comparison
Sources
- Anthropic Claude API Pricing
- Anthropic API Documentation
- DeployBase LLM Pricing Tracker (Data as of March 21, 2026)