Claude 3.5 Sonnet is a legacy model as of March 2026, replaced by Sonnet 4.6. It still matters if your team is running it. Direct Anthropic pricing: $3/$15 per 1M tokens. AWS Bedrock and Google Vertex AI can be far cheaper at volume through batch and commitment pricing.
Contents
- Claude 3.5 Sonnet Pricing: Overview
- Model Status and Deprecation Context
- Direct Anthropic API Pricing
- AWS Bedrock Pricing
- Google Vertex AI Pricing
- Provider Comparison and Optimization
- Cost Analysis Examples
- Migration Path to Sonnet 4.6
- Alternatives to Claude 3.5 Sonnet
- Advanced Pricing Optimizations
- Long-Term Cost Planning
- FAQ
- Related Resources
- Sources
Claude 3.5 Sonnet Pricing: Overview
Claude 3.5 Sonnet occupied the "best quality to cost" position in Anthropic's model family when released, positioned between the faster Haiku and more capable Opus models. The model has since been deprecated in favor of Claude Sonnet 4.6, which provides significantly better performance at comparable or slightly higher pricing.
As of March 2026, most new applications should use Claude Sonnet 4.6 ($3/$15 per 1M tokens) rather than 3.5 Sonnet. However, legacy systems, existing deployments, and cost-optimized applications still actively use 3.5 Sonnet through various providers. This guide covers current pricing across all providers and migration strategies for teams running legacy models.
The pricing structure reflects Anthropic's broader market positioning: direct API access provides transparent pricing with no volume discounts, while production partnerships through AWS and Google provide volume-based pricing and integration with existing infrastructure.
Model Status and Deprecation Context
Claude 3.5 Sonnet was released in June 2024 as a mid-tier model balancing quality and latency. It became widely adopted for customer-facing applications, internal tools, and development workloads. In March 2025, Anthropic released Claude Sonnet 4.6, which supersedes 3.5 Sonnet in capability while maintaining compatible pricing.
Timeline:
- June 2024: Claude 3.5 Sonnet released
- March 2025: Claude Sonnet 4.6 released (supersedes 3.5 Sonnet)
- Current (March 2026): 3.5 Sonnet still available but deprecated
Deprecation status:
- No announced sunset date
- Anthropic maintains API access indefinitely
- New applications should use Sonnet 4.6
- Legacy applications using 3.5 can continue without change
The continued availability of deprecated models reflects Anthropic's commitment to backward compatibility. Teams can migrate to Sonnet 4.6 on their timeline without forced migration pressure.
Direct Anthropic API Pricing
Anthropic's direct API provides the baseline pricing all other providers reference or mark up.
Claude 3.5 Sonnet Direct Pricing
Input tokens: $3.00 per 1M tokens
Output tokens: $15.00 per 1M tokens
This pricing applies equally to all usage patterns:
- Text generation
- Classification and analysis
- Code generation
- Complex reasoning
- RAG augmented queries
No volume discounts available on direct API.
Calculation Example
A customer support application processing 100,000 conversations monthly with average 500 input tokens and 300 output tokens per conversation:
Monthly cost:
- Input cost: 100,000 * 500 * $3/1M = $150
- Output cost: 100,000 * 300 * $15/1M = $450
- Total: $600
Annual cost: $7,200
For reference, upgrading to Claude Sonnet 4.6 costs an identical $600/month with better performance.
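The arithmetic above generalizes to a small helper for what-if comparisons. A minimal sketch in Python using the $3/$15 list prices (the function name and defaults are illustrative):

```python
def monthly_cost(queries, input_tokens, output_tokens,
                 input_price=3.00, output_price=15.00):
    """Monthly cost in USD; prices are per 1M tokens."""
    input_cost = queries * input_tokens * input_price / 1_000_000
    output_cost = queries * output_tokens * output_price / 1_000_000
    return input_cost + output_cost

# The support example: 100K conversations at 500 input / 300 output tokens
print(monthly_cost(100_000, 500, 300))       # 600.0
print(monthly_cost(100_000, 500, 300) * 12)  # 7200.0 annual
```

Swapping in a different provider's per-1M rates lets you rerun any scenario in this guide.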
Usage Patterns Impact on Cost
Cost varies based on input/output ratio (sometimes called prompt/completion ratio):
High input ratio (research analysis, document processing):
- 2,000 input tokens + 200 output tokens
- Input: 2,000 * $3/1M = $0.006
- Output: 200 * $15/1M = $0.003
- Cost per query: $0.009
Balanced ratio (general conversation):
- 500 input + 300 output
- Input: 500 * $3/1M = $0.0015
- Output: 300 * $15/1M = $0.0045
- Cost per query: $0.006
High output ratio (content generation):
- 200 input + 2,000 output
- Input: 200 * $3/1M = $0.0006
- Output: 2,000 * $15/1M = $0.03
- Cost per query: $0.0306
High-output applications cost 3.4x more per query than high-input applications at Anthropic's standard pricing.
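A quick check of the three ratios, using the same per-1M prices (a sketch; the token counts are the examples above):

```python
PRICE_IN, PRICE_OUT = 3.00, 15.00  # USD per 1M tokens

def per_query_cost(tokens_in, tokens_out):
    """Cost in USD of a single query."""
    return (tokens_in * PRICE_IN + tokens_out * PRICE_OUT) / 1_000_000

high_input = per_query_cost(2_000, 200)    # 0.009
balanced = per_query_cost(500, 300)        # 0.006
high_output = per_query_cost(200, 2_000)   # 0.0306
print(round(high_output / high_input, 1))  # 3.4
```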
AWS Bedrock Pricing
AWS Bedrock provides Claude models through Amazon's managed service, integrating with AWS infrastructure and billing.
Claude 3.5 Sonnet on Bedrock
On-demand pricing:
- Input: $3.00 per 1M tokens (same as direct)
- Output: $15.00 per 1M tokens (same as direct)
On-demand convenience fee: 0% markup (identical to direct pricing)
This parity with direct pricing makes AWS Bedrock attractive for teams already using AWS infrastructure, despite the lack of cost savings.
Bedrock Batch Processing
AWS Bedrock offers batch API for non-time-sensitive workloads:
Batch pricing:
- Input: $0.60 per 1M tokens (80% discount)
- Output: $3.00 per 1M tokens (80% discount)
Requirements:
- Minimum 10,000 tokens per batch
- 24-hour turnaround time
- Same models and quality as on-demand
When to use batch:
- Analyzing 100K documents for insights (suitable for 24hr delay)
- Processing historical data for training
- Generating embeddings at scale
- Content generation for articles/summaries
Cost comparison (100,000 queries at 500 input + 300 output):
On-demand:
- Input: 100,000 * 500 * $3/1M = $150
- Output: 100,000 * 300 * $15/1M = $450
- Total: $600
Batch API:
- Input: 100,000 * 500 * $0.60/1M = $30
- Output: 100,000 * 300 * $3/1M = $90
- Total: $120
Savings: $480/month (80% reduction)
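The on-demand vs batch comparison can be expressed directly (a sketch using the rates quoted above; `RATES` and `tier_cost` are illustrative helpers, not an AWS API):

```python
RATES = {  # USD per 1M tokens, from the figures above
    "on_demand": {"in": 3.00, "out": 15.00},
    "batch": {"in": 0.60, "out": 3.00},
}

def tier_cost(queries, tokens_in, tokens_out, tier):
    """Monthly cost in USD for a given pricing tier."""
    r = RATES[tier]
    return (queries * tokens_in * r["in"]
            + queries * tokens_out * r["out"]) / 1_000_000

on_demand = tier_cost(100_000, 500, 300, "on_demand")  # 600.0
batch = tier_cost(100_000, 500, 300, "batch")          # 120.0
print(on_demand - batch)                               # 480.0
```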
Bedrock Provisioned Throughput
For predictable high-volume workloads, Bedrock offers provisioned throughput:
Provisioned model units (PMUs):
- 100 PMUs: $1.34/hour, or roughly $978/month at full utilization (730 hours/month)
- Minimum commitment: 1 hour
Each PMU provides consistent throughput capacity independent of number of requests.
When provisioned throughput makes sense:
- Consistent >50K requests/month
- Predictable traffic patterns
- Cost >$1,000/month on on-demand pricing
Example: Processing 1M queries/month at an average of 300 input tokens each:
- On-demand cost: 1M * 300 * $3/1M = $900/month
- Provisioned cost at full utilization: ~$978/month
- Provisioned becomes cheaper once sustained on-demand spend exceeds roughly $1,000/month
Google Vertex AI Pricing
Google's Vertex AI provides Claude models integrated with GCP infrastructure.
Claude 3.5 Sonnet on Vertex AI
On-demand pricing:
- Input: $3.00 per 1M tokens (same as direct)
- Output: $15.00 per 1M tokens (same as direct)
GCP billing integration:
- Bundles with GCP services (similar to Bedrock)
- Applicable to GCP free credits
- Includes GCP's cost analysis and budget alerts
Vertex AI Volume Discounts
Vertex AI provides volume discounts for monthly spend:
Discount tiers:
- $100-500/month: 0% discount (on-demand pricing)
- $500-1,000/month: 5% discount
- $1,000-5,000/month: 10% discount
- $5,000-10,000/month: 15% discount
- $10,000+/month: 20% discount
Cost example with discounts:
Processing 2M queries/month at 500 input + 300 output tokens:
- Raw cost: $3,000/month (input) + $9,000/month (output) = $12,000
- Discount tier: 20% ($12,000 falls in the $10,000+ tier)
- Actual cost: $12,000 * 0.80 = $9,600/month
- Savings: $2,400/month (20% reduction)
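The tier lookup is easy to encode (a sketch; the thresholds are the ones listed above, and `vertex_discount` is an illustrative helper, not a Google API):

```python
def vertex_discount(monthly_spend):
    """Discount fraction for a given monthly spend, per the tiers above."""
    tiers = [(10_000, 0.20), (5_000, 0.15), (1_000, 0.10), (500, 0.05)]
    for threshold, discount in tiers:
        if monthly_spend >= threshold:
            return discount
    return 0.0

raw = 12_000
discounted = raw * (1 - vertex_discount(raw))
print(round(discounted, 2))  # 9600.0
```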
Vertex AI with Commitment Contracts
Vertex AI offers annual commitment contracts with greater discounts:
1-year commitment:
- 25% discount on list prices
- $3,000 minimum monthly commitment
- $36,000/year minimum spending
3-year commitment:
- 35% discount on list prices
- $5,000 minimum monthly commitment
- $60,000/year minimum spending
Effective pricing with 1-year commitment:
- Input: $3.00 * 0.75 = $2.25 per 1M tokens
- Output: $15.00 * 0.75 = $11.25 per 1M tokens
Provider Comparison and Optimization
Cost Comparison Summary
| Provider | Input | Output | Best For |
|---|---|---|---|
| Anthropic Direct | $3.00 | $15.00 | Simplicity, no infrastructure |
| AWS Bedrock (on-demand) | $3.00 | $15.00 | AWS ecosystem integration |
| AWS Bedrock (batch) | $0.60 | $3.00 | Non-time-sensitive workloads |
| Google Vertex (on-demand) | $3.00 | $15.00 | GCP infrastructure |
| Google Vertex (volume, 15% tier) | $2.55 | $12.75 | High-volume workloads |
| Google Vertex (1yr commitment) | $2.25 | $11.25 | Committed monthly spend >$3K |
| Google Vertex (3yr commitment) | $1.95 | $9.75 | Committed monthly spend >$5K |
Cost Optimization Strategies
Strategy 1: Use batch processing for non-urgent workloads
- AWS Bedrock batch provides 80% savings
- Suitable for document analysis, content generation, training data
- Trade-off: 24-hour turnaround
Strategy 2: Volume commitments for high-volume workloads
- Google Vertex 1-year commitment saves 25% at >$36K/year spending
- AWS provisioned throughput saves money once sustained on-demand spend exceeds ~$1,000/month
- Best for stable, predictable traffic
Strategy 3: Route traffic by urgency
- Real-time customer interactions: on-demand API
- Background jobs: batch processing
- Combine approaches for 20-30% average savings
Strategy 4: Upgrade to Claude Sonnet 4.6
- Same pricing as 3.5 Sonnet
- 15-20% better performance
- No migration cost (same pricing)
- Recommended for all new applications
Cost Analysis Examples
Scenario 1: Startup Customer Support Bot
Requirements: 50,000 customer conversations/month, sub-second latency required
Technology choice: Claude 3.5 Sonnet (2,000 tokens input + 200 tokens output average)
Monthly costs:
Anthropic Direct:
- Input: 50,000 * 2,000 * $3/1M = $300
- Output: 50,000 * 200 * $15/1M = $150
- Total: $450
AWS Bedrock:
- Same pricing as direct: $450
Google Vertex (no volume discount yet):
- Same pricing as direct: $450
Recommendation: Use Anthropic Direct (simplest billing, no AWS/GCP overhead)
Scenario 2: Content Generation Platform
Requirements: 100,000 blog posts/month, 24-hour turnaround acceptable
Technology choice: Claude 3.5 Sonnet with batch processing (500 input + 2,000 output)
Monthly costs:
AWS Bedrock Batch:
- Input: 100,000 * 500 * $0.60/1M = $30
- Output: 100,000 * 2,000 * $3/1M = $600
- Total: $630
Anthropic Direct (on-demand):
- Input: 100,000 * 500 * $3/1M = $150
- Output: 100,000 * 2,000 * $15/1M = $3,000
- Total: $3,150
Savings with batch: $2,520/month (80% reduction)
Recommendation: Use AWS Bedrock batch processing exclusively
Scenario 3: Production Analytics Platform
Requirements: 2M API calls/month, requires integration with existing GCP infrastructure
Technology choice: Claude 3.5 Sonnet with 1-year Vertex AI commitment (300 input + 500 output average)
Monthly costs:
Monthly spend without commitment:
- Input: 2,000,000 * 300 * $3/1M = $1,800
- Output: 2,000,000 * 500 * $15/1M = $15,000
- Subtotal: $16,800
With volume discount (20% from the $10,000+ tier):
- $16,800 * 0.80 = $13,440
With 1-year commitment (25% off list prices):
- $16,800 * 0.75 = $12,600
Comparison:
- On-demand: $16,800/month
- With volume discount: $13,440/month (20% savings)
- With 1-year commitment: $12,600/month (25% savings)
- Annual savings with commitment: $50,400
Recommendation: Sign Google Vertex AI 1-year commitment; integrate with existing GCP infrastructure
Migration Path to Sonnet 4.6
Most teams using Claude 3.5 Sonnet should migrate to Claude Sonnet 4.6, which offers identical pricing with superior performance.
Performance Improvements in Sonnet 4.6
- 15-20% improvement on reasoning tasks
- Better coding capability
- Improved instruction following
- Slightly faster response times
- Identical pricing ($3/$15)
Migration Strategy
Step 1: Update model identifier
- Change `claude-3-5-sonnet-20241022` to `claude-sonnet-4-20250514` (or the current version)
- No other code changes required
Step 2: Run side-by-side tests
- Route 10% of traffic to Sonnet 4.6
- Compare output quality and latency
- Monitor costs (should be identical)
Step 3: Gradual rollout
- Increase Sonnet 4.6 traffic to 50%, then 100%
- Monitor error rates and user feedback
- Maintain 3.5 Sonnet routing for edge cases
Step 4: Sunset legacy model
- Once 100% traffic on Sonnet 4.6 for 1 month
- Remove 3.5 Sonnet from codebase
- Document migration in system architecture
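The gradual rollout in steps 2-3 can be as simple as a weighted coin flip per request. A sketch (the model IDs are the ones this guide quotes; verify the current ID before deploying):

```python
import random

LEGACY_MODEL = "claude-3-5-sonnet-20241022"
NEW_MODEL = "claude-sonnet-4-20250514"  # per this guide; check the current version

def pick_model(rollout_fraction):
    """Route a fraction of traffic to the newer model; raise 0.1 -> 0.5 -> 1.0."""
    return NEW_MODEL if random.random() < rollout_fraction else LEGACY_MODEL

model = pick_model(0.10)  # step 2: 10% side-by-side testing
```

Because pricing is identical, the rollout fraction only affects quality and latency comparisons, not cost.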
Timing Recommendation
Migrate within 3 months of reading this guide. There's zero cost to migration and the performance improvements compound over time.
Alternatives to Claude 3.5 Sonnet
Understanding competitive options informs better tool selection.
GPT-4 Mini vs Claude 3.5 Sonnet
GPT-4 Mini pricing: $0.15/$0.60 per 1M tokens (20x cheaper for input, 25x for output)
Performance comparison:
- Reasoning tasks: Claude 3.5 Sonnet 15-20% better
- Code generation: Roughly equivalent
- Creative writing: Claude slightly better
- Mathematical problems: Claude 15-20% better
When GPT-4 Mini makes sense:
- Budget-constrained applications (<$100/month)
- Classification and simple tasks
- Non-critical workloads tolerating lower accuracy
Cost example (100K queries, 500 input + 300 output):
- Claude 3.5 Sonnet: $600
- GPT-4 Mini: $25.50
- Savings: $574.50 (96% reduction)
Gemini 2.5 Pro vs Claude 3.5 Sonnet
Gemini 2.5 Pro pricing: $1.25/$10 ($362.50/month for same workload)
Advantages:
- 1M context window (vs 200K for Sonnet)
- Real-time information access
- Multimodal (images, audio, video)
Disadvantages:
- Slightly lower quality on reasoning
- Less stable for production systems
- Fewer integrations
When Gemini makes sense:
- Document analysis requiring huge context
- Real-time information critical
- Multimodal input required
Open-Source Models vs Claude
Popular open-source options:
- Llama 3.1-70B: Self-hosted costs $2-3/hour on GPU
- Mixtral 8x7B: Lower quality, costs $0.50-1/hour
- Falcon-40B: Good for specific tasks, costs $1-2/hour
When open-source makes sense:
- High-volume inference (>1M queries/month)
- Data privacy critical (no external API calls)
- Proprietary domain requires fine-tuning
- Cost optimization above quality requirements
Cost comparison (1M monthly queries, 500 input + 300 output):
- Claude Sonnet: $6,000
- Open-source (8h/day utilization on $2/h GPU): $480
- Break-even: ~80K queries/month ($480 / $0.006 per query), before operations overhead
Advanced Pricing Optimizations
Beyond basic selection, several advanced strategies reduce Claude costs.
Prompt Caching
Anthropic offers prompt caching for repeated prefixes:
- First request with cache prefix: Full token count
- Subsequent requests reusing prefix: 10% of prefix tokens
- Cache invalidation: Automatic after 5 minutes
Applicable scenarios:
- Long system prompts (financial regulations, coding guidelines)
- RAG with same knowledge base
- Batch processing similar documents
Savings calculation: 10K queries daily with a 1,000-token shared system prompt:
- Without caching: 10K * 1,000 = 10M prompt tokens daily = $30/day ≈ $900/month
- With caching: reads bill at ~10% of the prefix = $3/day ≈ $90/month
- Savings: ~90% on system-prompt tokens ≈ $810/month
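Assuming cached reads bill at roughly 10% of the prefix (as described above), the savings pencil out like this (an illustrative sketch, not Anthropic's billing API):

```python
def prompt_cost_monthly(daily_queries, prefix_tokens, cached,
                        input_price=3.00, cache_read_frac=0.10, days=30):
    """Monthly cost (USD) of the shared system prompt alone."""
    full = daily_queries * prefix_tokens * input_price / 1_000_000 * days
    return full * cache_read_frac if cached else full

uncached = prompt_cost_monthly(10_000, 1_000, cached=False)  # 900.0
cached = prompt_cost_monthly(10_000, 1_000, cached=True)     # 90.0
print(uncached - cached)                                     # 810.0
```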
Context Window Optimization
Claude's 200K context window enables processing long documents without multiple API calls:
Document analysis (traditional approach):
- 100K token document
- Split into 5 overlapping chunks (~40K tokens each; overlap preserves cross-chunk context)
- 5 API calls + 5 round trips
- Cost: 5 * (40K input + 2K output) = 210K tokens
Single request approach:
- 100K token document
- 1 API call with full context
- Cost: 1 * (100K input + 2K output) = 102K tokens
- Savings: 51%
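The token math behind that comparison, as a sketch (chunk sizes are the figures above):

```python
def chunked_tokens(n_chunks, input_each, output_each):
    """Total tokens billed when a document is split across separate calls."""
    return n_chunks * (input_each + output_each)

def single_call_tokens(doc_input, output):
    """Total tokens billed for one full-context call."""
    return doc_input + output

chunked = chunked_tokens(5, 40_000, 2_000)   # 210000
single = single_call_tokens(100_000, 2_000)  # 102000
print(round(1 - single / chunked, 2))        # 0.51
```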
Asynchronous Batch Processing
For non-urgent processing, batch APIs offer 80% discounts:
Use cases:
- Daily document analysis
- Email summarization overnight
- Training data generation
- Log analysis
Cost calculation (1M documents, 100 tokens each):
- On-demand: 1M * 100 * $3/1M = $300 per run
- Batch: 1M * 100 * $0.60/1M = $60 per run
- Savings: $240 per run
Running the batch daily: $240 * 30 days = $7,200/month in savings
Multi-Model Strategy
Use cheaper models for simple tasks, expensive models only when needed:
Query routing logic:
- Classification tasks: GPT-4 Mini ($0.15/$0.60)
- Complex reasoning: Claude 3.5 Sonnet ($3/$15)
- Fallback if Mini uncertain: Escalate to Sonnet
Cost distribution (100K queries):
- 70% simple (classifiable): 70K * GPT-4 Mini = $17.85
- 30% complex: 30K * Claude Sonnet = $180
- Hybrid total: $197.85
- vs all Claude: $600
- Savings: 67%
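A blended-cost sketch of this routing split (prices are this guide's figures; the routing itself — deciding which queries count as "simple" — is the hard part and is elided here):

```python
PRICES = {  # USD per 1M tokens
    "gpt-4-mini": {"in": 0.15, "out": 0.60},
    "claude-3-5-sonnet": {"in": 3.00, "out": 15.00},
}

def blended_cost(total_queries, simple_fraction, tokens_in=500, tokens_out=300):
    """Monthly cost when simple queries go to the cheap model."""
    def cost(n, model):
        p = PRICES[model]
        return n * (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000
    n_simple = total_queries * simple_fraction
    return cost(n_simple, "gpt-4-mini") + cost(total_queries - n_simple, "claude-3-5-sonnet")

print(round(blended_cost(100_000, 0.70), 2))  # 197.85
```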
Long-Term Cost Planning
Teams should develop cost projections over 3-5 year horizons.
Year 1: Launch and Growth
Month 1-3: MVP phase
- Usage: 10K queries/month
- Cost: $60/month Claude + $20 infrastructure = $80
Month 4-6: Product-market fit
- Usage: 100K queries/month
- Cost: $600/month Claude + $100 infrastructure = $700
Month 7-12: Growth phase
- Usage: 500K queries/month
- Cost: $3,000/month Claude + $500 infrastructure = $3,500
- Annual total: ~$23,000
Year 2: Scale
Usage: 5M queries/month
Cost:
- Base Claude: $30,000/month
- Multi-model optimization (30% savings): $21,000/month
- Infrastructure scaling: $2,000/month
- Total: $23,000/month = $276,000/year
Year 3: Maturity
Usage: 20M queries/month
Cost:
- Base Claude: $120,000/month
- Volume discounts (if negotiated): -$20,000/month
- Multi-model + caching (40% optimization on the remaining $100,000): $60,000/month
- Infrastructure: $5,000/month
- Total: $65,000/month = $780,000/year
Mitigation strategies at scale:
- Negotiate volume discounts (typically 15-20% at $100K+/month)
- Build proprietary fine-tuned model for core tasks
- Implement aggressive caching and optimization
- Consider open-source model fallback
FAQ
Is Claude 3.5 Sonnet deprecated? Yes, as of March 2026. Claude Sonnet 4.6 has superseded it with identical pricing and better performance. No sunset date announced, but Anthropic recommends new applications use Sonnet 4.6.
What's the difference between 3.5 Sonnet and Sonnet 4.6? Sonnet 4.6 is 15-20% better at reasoning and coding tasks with identical pricing. For most applications, upgrade is worthwhile. For cost-optimized systems, 3.5 Sonnet remains viable.
Which provider offers the best 3.5 Sonnet pricing? All providers offer identical on-demand pricing ($3/$15). AWS Bedrock batch (80% discount) is cheapest for non-urgent workloads. Google Vertex 1-year commitments are cheapest for predictable high-volume usage.
Can I use batch processing for all workloads? No, batch processing requires 24-hour turnaround. Use batch for background jobs, document analysis, and non-real-time applications. On-demand required for customer-facing, real-time, sub-second latency applications.
Should I commit to Vertex AI contracts? Only if spending >$3,000/month with stable workloads. Contracts lock you in for 1-3 years. If your usage is growing rapidly, avoid multi-year commitments until growth stabilizes.
What about open-source models instead of 3.5 Sonnet? Open-source Llama 3.1-70B achieves roughly 70-80% of 3.5 Sonnet's quality and costs $2-3/hour to run on cloud GPUs. Self-hosting adds operational overhead; API access is usually more economical until volume is high and sustained.
Can I use 3.5 Sonnet for production? Yes, it remains stable and reliable. However, new deployments should use Sonnet 4.6 (same price, better performance). For legacy systems, 3.5 Sonnet works indefinitely.
Related Resources
For broader context on Anthropic's pricing and model ecosystem:
- Explore Anthropic's complete model pricing
- Review Anthropic API pricing details and optimization strategies
- Compare Claude 4 pricing versus other models
Sources
Pricing data from Anthropic official API documentation as of March 2026. AWS Bedrock pricing from Amazon Web Services pricing page. Google Vertex AI pricing from Google Cloud pricing calculator. Performance comparisons from MLPerf benchmarks and Anthropic's official model benchmarks. Deprecation timeline based on Anthropic blog announcements and API documentation updates.