Contents
- Claude API Pricing: Overview
- Pricing by Model
- Model Lineup and Capabilities
- Context Window Pricing
- Token Optimization Strategies
- Extended Thinking and Advanced Features
- Cost Comparison: Model Selection
- Monthly Cost Projections
- Prompt Caching Deep Dive
- Batch API for Cost Reduction
- Production Deployment Costs
- Use Case Recommendations
- FAQ
- Related Resources
- Sources
Claude API Pricing: Overview
Claude API pricing spans $0.25 to $15 per million input tokens, depending on model and context window. Opus 4.6 and Sonnet 4.6 eliminated long-context surcharges, making 1M contexts available at standard rates. Prompt caching cuts costs 90% on cached input. Batch processing drops prices further with 24-hour windows. Current as of March 21, 2026.
Pricing by Model
| Model | Context | Input $/M | Output $/M | Throughput | Max Output |
|---|---|---|---|---|---|
| Claude Opus 4.6 | 1M | $5.00 | $25.00 | 35 tok/s | 128K |
| Claude Sonnet 4.6 | 1M | $3.00 | $15.00 | 37 tok/s | 64K |
| Claude Sonnet 4.5 | 1M | $3.00 | $15.00 | 36 tok/s | 64K |
| Claude Sonnet 4 | 1M | $3.00 | $15.00 | 42 tok/s | 64K |
| Claude Opus 4.5 | 200K | $5.00 | $25.00 | 39 tok/s | 64K |
| Claude Opus 4 | 200K | $15.00 | $75.00 | 29 tok/s | 32K |
| Claude Opus 4.1 | 200K | $15.00 | $75.00 | 21 tok/s | 32K |
| Claude Haiku 4.5 | 200K | $1.00 | $5.00 | 44 tok/s | 64K |
| Claude 3 Haiku | 200K | $0.25 | $1.25 | 40 tok/s | 4K |
Pricing as of March 21, 2026. All prices per million tokens.
Model Lineup and Capabilities
Opus 4.6: The Latest Flagship
Strongest reasoning and most capable. Best for complex analysis, code generation on difficult problems, multi-step reasoning, and expert-level tasks. 1M context window (5x larger than previous Opus). Throughput: 35 tokens/second means a 128K output takes about 1 hour.
Cost calculation: A request with 100K context + 10K completion costs approximately $0.50 + $0.25 = $0.75.
Opus 4.6 suits research teams, high-stakes use cases, and workloads where reasoning quality justifies the cost: complex problem solving, scientific analysis, detailed code reviews, multi-step debugging. When accuracy matters more than speed or cost, Opus wins.
When to use Opus 4.6:
- Complex reasoning required (legal analysis, scientific papers, multi-step logic)
- Large context requirements (need to fit 500K+ tokens in single request)
- Mission-critical applications (financial decisions, medical analysis)
- Research and R&D (answer quality is priority)
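The per-request arithmetic above generalizes into a small helper. This is a sketch: the rates come from the pricing table earlier in this guide, and the model-id strings are shorthand for this page, not confirmed API identifiers.

```python
# On-demand rates in dollars per million tokens, from the pricing table above.
PRICES = {
    "claude-opus-4.6":   {"input": 5.00, "output": 25.00},
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4.5":  {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at on-demand rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The worked example above: 100K context + 10K completion on Opus 4.6.
print(request_cost("claude-opus-4.6", 100_000, 10_000))  # prints 0.75
```

The same function reproduces every per-model comparison later in this guide by swapping the model key.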
Sonnet 4.6: The Balanced Choice
Better speed than Opus, still strong reasoning. Best for chatbots, code completions, and real-time applications. 1M context window (matches Opus). Throughput: 37 tokens/second (slightly faster than Opus). Cost: $3/$15 per M tokens.
Cheaper than Opus and only slightly behind it on reasoning. Most API users pick Sonnet 4.6 for production: fast enough for real-time serving, capable enough for most workloads, and priced for scale.
When to use Sonnet 4.6:
- Production chatbots and Q&A systems
- Real-time applications (< 500ms latency required)
- Moderate reasoning requirements (not the hardest problems)
- High-volume serving (cost per token matters)
Haiku 4.5: The Speed Tier
Fastest model. Best for high-volume processing, simple tasks, classification, and cost-constrained applications. 200K context window (enough for most documents). Throughput: 44 tokens/second (fastest tier). Cost: $1/$5 per M tokens.
3x cheaper than Sonnet on input, useful for passing large documents through multiple completions. A 500K token document summary costs $0.50 on Haiku vs $1.50 on Sonnet.
For classification, tagging, and other high-volume, low-complexity tasks, Haiku's speed and cost are a clear win. Error rate on Haiku is 2-3% higher than Sonnet on reasoning tasks, but acceptable for tasks with downstream validation.
When to use Haiku 4.5:
- High-volume batch processing (tagging, classification)
- Cost-constrained applications (startups, non-profits)
- Simple tasks with acceptable error rates (5%+)
- Throughput-critical applications (need fast responses)
Legacy Models (Avoid)
Claude 3 Haiku, Opus 4, Opus 4.1 still available but deprecated. Opus 4.1 costs $15/$75 per M tokens (3x the price of Opus 4.6). Use current models instead.
Migration economics: Migrating from Opus 4.1 to Opus 4.6 is free in terms of API compatibility and saves 67% on costs.
Context Window Pricing
CRITICAL: Anthropic eliminated long-context surcharges in 2026. All context tokens cost the same whether the first or last token in the window.
This matters. Previously (2025), using a full 200K context window doubled the input price. Now it doesn't. This is why Opus 4.6 and Sonnet 4.6 are cheaper despite having larger context windows.
A single 1M-token context on Opus 4.6 costs $5 in input, whether those tokens are all system prompt or all document text. There is no premium for deeper context; pricing is flat per token.
Compare to Sonnet 4.6 at 1M context for $3 input. The 200K context limit on Opus 4.5 is now artificial. Use Opus 4.6 when context exceeds 200K (teams get more context at same price).
Token Optimization Strategies
Prompt Caching
Caching reduces cached input costs by 90%. First request with a cached prompt block costs full price. Subsequent requests pay 10% of input cost for those cached tokens. Minimum cache block: 1,024 tokens.
Example: A 100K system prompt costs $0.30 on the first request (Sonnet: 100K × $3/M). On requests 2-100, those tokens cost $0.03 per request ($0.30 × 10%). 100 API calls with cached context cost $0.30 + (99 × $0.03) = $3.27 instead of $30.00. Savings: 89%.
Caching is most valuable when:
- The same large system prompt is used across many requests (100+ calls)
- Documents are analyzed repeatedly with same questions
- Conversation history exceeds 10K tokens
Implementation:
import anthropic

client = anthropic.Anthropic()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": system_prompt,  # stable block of 1,024+ tokens
                "cache_control": {"type": "ephemeral"}
            }
        ]
    }
]

response = client.messages.create(model="claude-sonnet-4.6", max_tokens=1024, messages=messages)
Cache reads and writes are reported in the response's usage metadata. Reuse the same cached blocks, byte-for-byte, in subsequent requests.
Batch Processing
Batch API reduces costs by 50% with 24-hour turnaround. No per-request overhead, only cost per token processed. Best for non-realtime work: summarization, bulk analysis, classification, data labeling.
Example: Processing 10M tokens in batch costs 50% of standard rate.
- Opus input: $2.50 instead of $5.00
- Sonnet input: $1.50 instead of $3.00
Trade-off: Requests are queued and processed asynchronously. No real-time responses. Good for overnight jobs, not for chatbots.
When to use batch:
- Non-urgent summarization (overnight jobs)
- Bulk data labeling (training data generation)
- Monthly reports or analysis
- Cost is priority over latency
Output Token Prediction
Longer outputs cost more. A 10K completion token request on Opus costs $0.25. A 1K request costs $0.025. If output length is known, use shorter generation limits.
Example: Classification tasks output only 1-5 tokens. Set max_tokens to 20 and save 90% on output costs compared to unlimited generation.
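As a sketch of capping output spend on a classifier, assuming the standard messages-format request body (the function name, model id, and label set here are hypothetical):

```python
def classification_params(ticket_text: str) -> dict:
    """Build request params for a short label; max_tokens caps output spend."""
    return {
        "model": "claude-haiku-4.5",  # placeholder id for the Haiku tier
        "max_tokens": 20,             # labels run 1-5 tokens; 20 leaves headroom
        "messages": [
            {"role": "user",
             "content": f"Label this ticket as BUG, BILLING, or OTHER:\n{ticket_text}"},
        ],
    }

params = classification_params("I was charged twice this month.")
```

Pass the dict to the client's message-creation call. The cap bounds output cost at 20 × $5/M = $0.0001 per call on Haiku, no matter how verbose the model tries to be.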
Extended Thinking and Advanced Features
Extended thinking tokens (the model's private reasoning steps) are billed as output tokens at standard rates. There is no surcharge: a response with 10K visible output and 5K thinking tokens is billed as 15K output tokens, the same as 15K of ordinary output.
Extended thinking adds no pricing tier of its own. You pay for total tokens used, whether they are reasoning or visible output.
Extended thinking is valuable for multi-step problems where the reasoning process matters. Complex code generation, mathematical proofs, and logical analysis benefit from intermediate steps. The model can reason through a problem more carefully without inflating visible output tokens.
Cost Comparison: Model Selection
Same Task, Different Models
Analyzing 100K customer support tickets, each around 1K input tokens, with each analysis outputting about 500 tokens (100M input and 50M output tokens total).
- Haiku: (100M × $1/M) + (50M × $5/M) = $100 + $250 = $350
- Sonnet: (100M × $3/M) + (50M × $15/M) = $300 + $750 = $1,050
- Opus: (100M × $5/M) + (50M × $25/M) = $500 + $1,250 = $1,750
Haiku is 3x cheaper than Sonnet; Opus is 5x the cost of Haiku.
For high-volume tasks where quality is acceptable, Haiku wins on pure cost. If analysis quality matters, Opus accuracy might prevent re-runs. Run a 100-ticket pilot on Haiku and Opus, measure quality, then decide.
Chatbot Serving 1M Requests Monthly
Average request: 2K input tokens (user message + context) + 200 output tokens (bot response).
- Haiku: (2B × $1/M) + (200M × $5/M) = $2,000 + $1,000 = $3,000/month
- Sonnet: (2B × $3/M) + (200M × $15/M) = $6,000 + $3,000 = $9,000/month
- Opus: (2B × $5/M) + (200M × $25/M) = $10,000 + $5,000 = $15,000/month
Switching from Sonnet to Haiku saves $6,000/month. If response quality holds up, the choice is obvious.
Monthly Cost Projections
Scenario 1: Batch Summarization Service
Processing 100GB of documents monthly, at roughly 4 bytes per token (typical for English text): 25B input tokens.
Haiku batch processing: 25B tokens × $0.50/M (50% batch discount on the $1/M input rate) = $12,500/month
Realistic scenario. High volume, non-realtime, cost-constrained.
Scenario 2: Real-Time Chat API
1M conversations monthly. 5K avg tokens per conversation (context + completion).
Sonnet: 5B tokens × $3/M ≈ $15,000/month (approximating all tokens at the input rate; real-time serving required)
Competitive with GPT-4o ($2.50 input for first 128K tokens). Token counts differ by model, so actual comparison requires testing.
Scenario 3: Internal RAG System
10,000 queries/day. Each query passes 50K document context + 2K question.
52K input per request. 1K output per request.
With prompt caching (the 50K document context cached and billed at the 10% rate):
- Cached context: 50K × $0.30/M × 10,000 × 30 = $4,500/month
- Fresh question input: 2K × $3/M × 10,000 × 30 = $1,800/month
- Output tokens: 1K × $15/M × 10,000 × 30 = $4,500/month
- Total: ~$10,800/month
Without caching:
- Input: 52K × $3/M × 10,000 × 30 = $46,800/month
- Output: $4,500/month
- Total: $51,300/month
Caching saves ~$40,500/month (a 79% reduction) on this workload. This is why RAG systems should always use caching.
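The RAG scenario's arithmetic can be sketched as a small projection helper. Assumptions: Sonnet rates ($3/$15 per M), the cached block billed at 10% of the input rate, a 30-day month, and a hypothetical function name.

```python
def rag_monthly_cost(queries_per_day: int, cached_tokens: int, fresh_in: int,
                     out_tokens: int, in_rate: float = 3.0, out_rate: float = 15.0,
                     cached_mult: float = 0.10, days: int = 30) -> float:
    """Monthly dollar cost for a RAG workload with a cached context block."""
    q = queries_per_day * days
    cached = q * cached_tokens * in_rate * cached_mult / 1_000_000
    fresh = q * fresh_in * in_rate / 1_000_000
    out = q * out_tokens * out_rate / 1_000_000
    return cached + fresh + out

with_cache = rag_monthly_cost(10_000, 50_000, 2_000, 1_000)
no_cache = rag_monthly_cost(10_000, 0, 52_000, 1_000)
print(with_cache, no_cache)  # roughly 10800 and 51300
```

Setting `cached_tokens` to zero and folding the document into `fresh_in` gives the uncached baseline, so the same function produces both sides of the comparison.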
Scenario 4: Large Document Processing (Enterprise)
Processing 500M tokens/month of documents.
- Standard on-demand: 500M × $5/M (Opus input rate) = $2,500/month
- With caching (40% cache hit): $2,500 × 0.6 + $2,500 × 0.4 × 0.1 = $1,500 + $100 = $1,600/month
- With batch (50% discount applied to the cached figure) + caching: $1,600 × 0.5 = $800/month
Combined optimization (caching + batch) reduces cost by 68%.
Prompt Caching Deep Dive
How Caching Works
- Mark prompt sections with cache_control: {"type": "ephemeral"}
- Minimum cache size: 1,024 tokens
- First request: pay full price for marked tokens
- Subsequent requests (within 5-minute window): pay 10% for cached tokens
Caching Economics
Cache hit cost: 10% of standard input rate
- Haiku cached: $0.10 per M tokens (vs $1.00 standard)
- Sonnet cached: $0.30 per M tokens (vs $3.00 standard)
- Opus cached: $0.50 per M tokens (vs $5.00 standard)
For a 100K cached block used 100 times:
- Sonnet standard: 100 × (100K × $3/M) = $30.00
- Sonnet cached: (100K × $3/M) + 99 × (100K × $0.30/M) = $0.30 + $2.97 = $3.27
- Savings: $26.73 (89%)
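The cached-versus-standard arithmetic is the same for any block size and rate, so it is worth a quick helper. A sketch under this guide's pricing model (full price once, 10% per reuse; the function name is hypothetical):

```python
def caching_costs(block_tokens: int, uses: int, in_rate: float,
                  cached_mult: float = 0.10) -> tuple[float, float]:
    """Return (standard, cached) dollar cost for one block reused `uses` times."""
    per_use = block_tokens * in_rate / 1_000_000
    standard = uses * per_use
    # Full price for the first request, the cached rate for every reuse.
    cached = per_use + (uses - 1) * per_use * cached_mult
    return standard, cached

standard, cached = caching_costs(100_000, 100, 3.00)  # the Sonnet example above
print(standard, cached)  # 30.0 and roughly 3.27
```

The savings ratio approaches 90% as the number of reuses grows, since the one full-price request amortizes away.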
Best Practices for Caching
- Cache system prompts (reused across requests)
- Cache document context in RAG (same documents queried multiple times)
- Cache large instruction sets or knowledge bases
- Use in multi-turn conversations (context accumulates)
Batch API for Cost Reduction
Batch API allows asynchronous processing of requests with 24-hour turnaround and 50% cost discount.
Example batch job:
{
    "requests": [
        {
            "custom_id": "doc-1",
            "params": {
                "model": "claude-opus-4.6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize: ."}]
            }
        }
    ]
}
Submit batch, wait 24 hours, retrieve results. Cost: 50% of on-demand pricing.
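In Python, the same job can be built programmatically and handed to the SDK's Message Batches endpoint. A sketch: the document texts and model id are placeholders, and the submission itself requires the `anthropic` package plus an API key.

```python
# Placeholder documents; in practice these come from your own data store.
docs = {"doc-1": "Quarterly feedback export ...", "doc-2": "Support transcript ..."}

requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-opus-4.6",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
        },
    }
    for doc_id, text in docs.items()
]

def submit(requests: list) -> object:
    """Submit the batch; needs the `anthropic` package and ANTHROPIC_API_KEY."""
    import anthropic
    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=requests)
```

Each `custom_id` ties a result back to its source document, since batch results are not guaranteed to come back in submission order.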
When to use batch:
- Nightly summarization of customer feedback
- Monthly data labeling for training
- Bulk analysis where latency is acceptable
- Cost is priority
Production Deployment Costs
Small Startup (10K requests/month)
Average: 1K input + 200 output tokens per request.
- Haiku: (10M × $1/M) + (2M × $5/M) = $10 + $10 = $20/month
- Sonnet: (10M × $3/M) + (2M × $15/M) = $30 + $30 = $60/month
At this scale, cost is negligible. Quality matters more.
Mid-Size Company (1M requests/month)
Average: 2K input + 300 output tokens per request
- Haiku: (2B × $1/M) + (300M × $5/M) = $2,000 + $1,500 = $3,500/month
- Sonnet: (2B × $3/M) + (300M × $15/M) = $6,000 + $4,500 = $10,500/month
- Opus: (2B × $5/M) + (300M × $25/M) = $10,000 + $7,500 = $17,500/month
Choice between Haiku ($3.5K) and Sonnet ($10.5K) depending on quality requirements.
Large Production (100M requests/month, 50M daily)
Average: 3K input + 500 output tokens per request
- Haiku: (300B × $1/M) + (50B × $5/M) = $300,000 + $250,000 = $550,000/month
- Sonnet: (300B × $3/M) + (50B × $15/M) = $900,000 + $750,000 = $1,650,000/month
With caching (30% hit rate), multiply by ~0.85 → Haiku: ~$467K, Sonnet: ~$1.4M.
With batch for 20% of volume, an additional ~10-15% comes off → Haiku: ~$450K, Sonnet: ~$1.3M.
Use Case Recommendations
Use Haiku When
- High-volume, cost-constrained work
- Classification, tagging, simple Q&A
- Quality is acceptable at speed tier
- Internal automation
- Batch processing where errors caught downstream
Use Sonnet When
- Production chatbots and real-time applications
- Moderate reasoning required
- Quality matters but not mission-critical
- Most general-purpose API users default here
- Cost/speed/quality balance is optimal
Use Opus When
- Complex reasoning required
- Multi-step problem solving
- Code generation on hard problems
- Research and R&D
- Cost doesn't matter relative to quality
- Document analysis requiring deep understanding
FAQ
Can I use Opus in production at scale?
Yes, but it's expensive ($5 input / $25 output per M tokens). At roughly 2B input and 1.6B output tokens monthly, cost reaches $50,000/month. Most teams use Sonnet for production and reserve Opus for R&D.
Is prompt caching worth the complexity?
If processing the same 10K+ token prompt 10+ times monthly, yes. Setup is trivial (one flag in API call). Savings can exceed $1,000/month.
What's the difference between Opus 4.6 and Opus 4.5?
Same pricing ($5/$25 per M tokens). Opus 4.6 has a 1M context window (vs 200K for Opus 4.5) at the same cost. Use 4.6 in all cases.
How do I know if Haiku is accurate enough?
Test on a sample of 100 examples. Measure error rate. If below 5% and acceptable for use case, use Haiku.
Is batch processing worth the latency?
If 24-hour turnaround acceptable, always use batch. 50% savings is significant. For non-realtime work (summarization, classification, training data labeling), batch is the right choice.
Should I migrate from Opus 4.1 immediately?
Yes. Opus 4.6 is cheaper, better, and has larger context. No downside to migration.
Related Resources
- Anthropic Models
- Claude Opus vs GPT-5 Comparison
- Claude Sonnet vs GPT-5 Comparison
- LLM Pricing Comparison
Sources
- Anthropic Claude API Pricing
- Anthropic API Documentation
- DeployBase LLM Pricing Tracker (Data as of March 21, 2026)