Claude 3.5 Sonnet is a legacy model as of March 2026, replaced by Sonnet 4.6. It still matters if your team is running it. Direct Anthropic pricing: $3/$15 per 1M tokens. AWS Bedrock and Google Vertex AI can be far cheaper at volume through batch and commitment pricing.
Contents
- Claude 3.5 Sonnet Pricing: Overview
- Model Status and Deprecation Context
- Direct Anthropic API Pricing
- AWS Bedrock Pricing
- Google Vertex AI Pricing
- Provider Comparison and Optimization
- Cost Analysis Examples
- Migration Path to Sonnet 4.6
- Alternatives to Claude 3.5 Sonnet
- Advanced Pricing Optimizations
- Long-Term Cost Planning
- FAQ
- Related Resources
- Sources
Claude 3.5 Sonnet Pricing: Overview
Claude 3.5 Sonnet occupied the "best quality to cost" position in Anthropic's model family when released, positioned between the faster Haiku and more capable Opus models. The model has since been deprecated in favor of Claude Sonnet 4.6, which provides significantly better performance at comparable or slightly higher pricing.
As of March 2026, most new applications should use Claude Sonnet 4.6 ($3/$15 per 1M tokens) rather than 3.5 Sonnet. However, legacy systems, existing deployments, and cost-optimized applications still actively use 3.5 Sonnet through various providers. This guide covers current pricing across all providers and migration strategies for teams running legacy models.
The pricing structure reflects Anthropic's broader market positioning: direct API access provides transparent pricing with no volume discounts, while production partnerships through AWS and Google provide volume-based pricing and integration with existing infrastructure.
Model Status and Deprecation Context
Claude 3.5 Sonnet was released in June 2024 as a mid-tier model balancing quality and latency. It became widely adopted for customer-facing applications, internal tools, and development workloads. In March 2025, Anthropic released Claude Sonnet 4.6, which supersedes 3.5 Sonnet in capability while maintaining compatible pricing.
Timeline:
- June 2024: Claude 3.5 Sonnet released
- March 2025: Claude Sonnet 4.6 released (supersedes 3.5 Sonnet)
- Current (March 2026): 3.5 Sonnet still available but deprecated
Deprecation status:
- No announced sunset date
- Anthropic maintains API access indefinitely
- New applications should use Sonnet 4.6
- Legacy applications using 3.5 can continue without change
The continued availability of deprecated models reflects Anthropic's commitment to backward compatibility. Teams can migrate to Sonnet 4.6 on their timeline without forced migration pressure.
Direct Anthropic API Pricing
Anthropic's direct API provides the baseline pricing all other providers reference or mark up.
Claude 3.5 Sonnet Direct Pricing
Input tokens: $3.00 per 1M tokens
Output tokens: $15.00 per 1M tokens
This pricing applies equally to all usage patterns:
- Text generation
- Classification and analysis
- Code generation
- Complex reasoning
- RAG augmented queries
No volume discounts available on direct API.
Calculation Example
A customer support application processing 100,000 conversations monthly with average 500 input tokens and 300 output tokens per conversation:
Monthly cost:
- Input cost: 100,000 * 500 * $3/1M = $150
- Output cost: 100,000 * 300 * $15/1M = $450
- Total: $600
Annual cost: $7,200
For reference, upgrading to Claude Sonnet 4.6 costs an identical $600/month with better performance.
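The arithmetic above generalizes to a small helper for what-if comparisons. A minimal sketch in Python using the $3/$15 list prices (the function name and defaults are illustrative):

```python
def monthly_cost(queries, input_tokens, output_tokens,
                 input_price=3.00, output_price=15.00):
    """Monthly cost in USD; prices are per 1M tokens."""
    input_cost = queries * input_tokens * input_price / 1_000_000
    output_cost = queries * output_tokens * output_price / 1_000_000
    return input_cost + output_cost

# The support example: 100K conversations at 500 input / 300 output tokens
print(monthly_cost(100_000, 500, 300))       # 600.0
print(monthly_cost(100_000, 500, 300) * 12)  # 7200.0 annual
```

Swapping in a different provider's per-1M rates lets you rerun any scenario in this guide.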
Usage Patterns Impact on Cost
Cost varies based on input/output ratio (sometimes called prompt/completion ratio):
High input ratio (research analysis, document processing):
- 2,000 input tokens + 200 output tokens
- Input: 2,000 * $3/1M = $0.006
- Output: 200 * $15/1M = $0.003
- Cost per query: $0.009
Balanced ratio (general conversation):
- 500 input + 300 output
- Input: 500 * $3/1M = $0.0015
- Output: 300 * $15/1M = $0.0045
- Cost per query: $0.006
High output ratio (content generation):
- 200 input + 2,000 output
- Input: 200 * $3/1M = $0.0006
- Output: 2,000 * $15/1M = $0.03
- Cost per query: $0.0306
High-output applications cost 3.4x more per query than high-input applications at Anthropic's standard pricing.
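A quick check of the three ratios, using the same per-1M prices (a sketch; the token counts are the examples above):

```python
PRICE_IN, PRICE_OUT = 3.00, 15.00  # USD per 1M tokens

def per_query_cost(tokens_in, tokens_out):
    """Cost in USD of a single query."""
    return (tokens_in * PRICE_IN + tokens_out * PRICE_OUT) / 1_000_000

high_input = per_query_cost(2_000, 200)    # 0.009
balanced = per_query_cost(500, 300)        # 0.006
high_output = per_query_cost(200, 2_000)   # 0.0306
print(round(high_output / high_input, 1))  # 3.4
```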
AWS Bedrock Pricing
AWS Bedrock provides Claude models through Amazon's managed service, integrating with AWS infrastructure and billing.
Claude 3.5 Sonnet on Bedrock
On-demand pricing:
- Input: $3.00 per 1M tokens (same as direct)
- Output: $15.00 per 1M tokens (same as direct)
On-demand convenience fee: 0% markup (identical to direct pricing)
This parity with direct pricing makes AWS Bedrock attractive for teams already using AWS infrastructure, despite the lack of cost savings.
Bedrock Batch Processing
AWS Bedrock offers batch API for non-time-sensitive workloads:
Batch pricing:
- Input: $0.60 per 1M tokens (80% discount)
- Output: $3.00 per 1M tokens (80% discount)
Requirements:
- Minimum 10,000 tokens per batch
- 24-hour turnaround time
- Same models and quality as on-demand
When to use batch:
- Analyzing 100K documents for insights (suitable for 24hr delay)
- Processing historical data for training
- Generating embeddings at scale
- Content generation for articles/summaries
Cost comparison (100,000 queries at 500 input + 300 output):
On-demand:
- Input: 100,000 * 500 * $3/1M = $150
- Output: 100,000 * 300 * $15/1M = $450
- Total: $600
Batch API:
- Input: 100,000 * 500 * $0.60/1M = $30
- Output: 100,000 * 300 * $3/1M = $90
- Total: $120
Savings: $480/month (80% reduction)
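The on-demand vs batch comparison can be expressed directly (a sketch using the rates quoted above; `RATES` and `tier_cost` are illustrative helpers, not an AWS API):

```python
RATES = {  # USD per 1M tokens, from the figures above
    "on_demand": {"in": 3.00, "out": 15.00},
    "batch": {"in": 0.60, "out": 3.00},
}

def tier_cost(queries, tokens_in, tokens_out, tier):
    """Monthly cost in USD for a given pricing tier."""
    r = RATES[tier]
    return (queries * tokens_in * r["in"]
            + queries * tokens_out * r["out"]) / 1_000_000

on_demand = tier_cost(100_000, 500, 300, "on_demand")  # 600.0
batch = tier_cost(100_000, 500, 300, "batch")          # 120.0
print(on_demand - batch)                               # 480.0
```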
Bedrock Provisioned Throughput
For predictable high-volume workloads, Bedrock offers provisioned throughput:
Provisioned model units (PMUs):
- 100 PMUs: $1.34/hour, or roughly $978/month at full utilization (730 hours/month)
- Minimum commitment: 1 hour
Each PMU provides consistent throughput capacity independent of number of requests.
When provisioned throughput makes sense:
- Consistent >50K requests/month
- Predictable traffic patterns
- Cost >$1,000/month on on-demand pricing
Example: Processing 1M queries/month at an average of 300 input tokens each:
- On-demand cost: 1M * 300 * $3/1M = $900/month
- Provisioned cost at full utilization: ~$978/month
- Provisioned becomes cheaper once sustained on-demand spend exceeds roughly $1,000/month
Google Vertex AI Pricing
Google's Vertex AI provides Claude models integrated with GCP infrastructure.
Claude 3.5 Sonnet on Vertex AI
On-demand pricing:
- Input: $3.00 per 1M tokens (same as direct)
- Output: $15.00 per 1M tokens (same as direct)
GCP billing integration:
- Bundles with GCP services (similar to Bedrock)
- Applicable to GCP free credits
- Includes GCP's cost analysis and budget alerts
Vertex AI Volume Discounts
Vertex AI provides volume discounts for monthly spend:
Discount tiers:
- $100-500/month: 0% discount (on-demand pricing)
- $500-1,000/month: 5% discount
- $1,000-5,000/month: 10% discount
- $5,000-10,000/month: 15% discount
- $10,000+/month: 20% discount
Cost example with discounts:
Processing 2M queries/month at 500 input + 300 output tokens:
- Raw cost: $3,000/month (input) + $9,000/month (output) = $12,000
- Discount tier: 20% ($12,000 falls in the $10,000+ tier)
- Actual cost: $12,000 * 0.80 = $9,600/month
- Savings: $2,400/month (20% reduction)
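The tier lookup is easy to encode (a sketch; the thresholds are the ones listed above, and `vertex_discount` is an illustrative helper, not a Google API):

```python
def vertex_discount(monthly_spend):
    """Discount fraction for a given monthly spend, per the tiers above."""
    tiers = [(10_000, 0.20), (5_000, 0.15), (1_000, 0.10), (500, 0.05)]
    for threshold, discount in tiers:
        if monthly_spend >= threshold:
            return discount
    return 0.0

raw = 12_000
discounted = raw * (1 - vertex_discount(raw))
print(round(discounted, 2))  # 9600.0
```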
Vertex AI with Commitment Contracts
Vertex AI offers annual commitment contracts with greater discounts:
1-year commitment:
- 25% discount on list prices
- $3,000 minimum monthly commitment
- $36,000/year minimum spending
3-year commitment:
- 35% discount on list prices
- $5,000 minimum monthly commitment
- $60,000/year minimum spending
Effective pricing with 1-year commitment:
- Input: $3.00 * 0.75 = $2.25 per 1M tokens
- Output: $15.00 * 0.75 = $11.25 per 1M tokens
Provider Comparison and Optimization
Cost Comparison Summary
| Provider | Input | Output | Best For |
|---|---|---|---|
| Anthropic Direct | $3.00 | $15.00 | Simplicity, no infrastructure |
| AWS Bedrock (on-demand) | $3.00 | $15.00 | AWS ecosystem integration |
| AWS Bedrock (batch) | $0.60 | $3.00 | Non-time-sensitive workloads |
| Google Vertex (on-demand) | $3.00 | $15.00 | GCP infrastructure |
| Google Vertex (volume, 15% tier) | $2.55 | $12.75 | High-volume workloads |
| Google Vertex (1yr commitment) | $2.25 | $11.25 | Committed monthly spend >$3K |
| Google Vertex (3yr commitment) | $1.95 | $9.75 | Committed monthly spend >$5K |
Cost Optimization Strategies
Strategy 1: Use batch processing for non-urgent workloads
- AWS Bedrock batch provides 80% savings
- Suitable for document analysis, content generation, training data
- Trade-off: 24-hour turnaround
Strategy 2: Volume commitments for high-volume workloads
- Google Vertex 1-year commitment saves 25% at >$36K/year spending
- AWS provisioned throughput saves money once sustained on-demand spend exceeds ~$1,000/month
- Best for stable, predictable traffic
Strategy 3: Route traffic by urgency
- Real-time customer interactions: on-demand API
- Background jobs: batch processing
- Combine approaches for 20-30% average savings
Strategy 4: Upgrade to Claude Sonnet 4.6
- Same pricing as 3.5 Sonnet
- 15-20% better performance
- No migration cost (same pricing)
- Recommended for all new applications
Cost Analysis Examples
Scenario 1: Startup Customer Support Bot
Requirements: 50,000 customer conversations/month, sub-second latency required
Technology choice: Claude 3.5 Sonnet (2,000 tokens input + 200 tokens output average)
Monthly costs:
Anthropic Direct:
- Input: 50,000 * 2,000 * $3/1M = $300
- Output: 50,000 * 200 * $15/1M = $150
- Total: $450
AWS Bedrock:
- Same pricing as direct: $450
Google Vertex (no volume discount yet):
- Same pricing as direct: $450
Recommendation: Use Anthropic Direct (simplest billing, no AWS/GCP overhead)
Scenario 2: Content Generation Platform
Requirements: 100,000 blog posts/month, 24-hour turnaround acceptable
Technology choice: Claude 3.5 Sonnet with batch processing (500 input + 2,000 output)
Monthly costs:
AWS Bedrock Batch:
- Input: 100,000 * 500 * $0.60/1M = $30
- Output: 100,000 * 2,000 * $3/1M = $600
- Total: $630
Anthropic Direct (on-demand):
- Input: 100,000 * 500 * $3/1M = $150
- Output: 100,000 * 2,000 * $15/1M = $3,000
- Total: $3,150
Savings with batch: $2,520/month (80% reduction)
Recommendation: Use AWS Bedrock batch processing exclusively
Scenario 3: Production Analytics Platform
Requirements: 2M API calls/month, requires integration with existing GCP infrastructure
Technology choice: Claude 3.5 Sonnet with 1-year Vertex AI commitment (300 input + 500 output average)
Monthly costs:
Monthly spend without commitment:
- Input: 2,000,000 * 300 * $3/1M = $1,800
- Output: 2,000,000 * 500 * $15/1M = $15,000
- Subtotal: $16,800
With volume discount (20% from the $10,000+ tier):
- $16,800 * 0.80 = $13,440
With 1-year commitment (25% off list prices):
- $16,800 * 0.75 = $12,600
Comparison:
- On-demand: $16,800/month
- With volume discount: $13,440/month (20% savings)
- With 1-year commitment: $12,600/month (25% savings)
- Annual savings with commitment: $50,400
Recommendation: Sign Google Vertex AI 1-year commitment; integrate with existing GCP infrastructure
Migration Path to Sonnet 4.6
Most teams using Claude 3.5 Sonnet should migrate to Claude Sonnet 4.6, which offers identical pricing with superior performance.
Performance Improvements in Sonnet 4.6
- 15-20% improvement on reasoning tasks
- Better coding capability
- Improved instruction following
- Slightly faster response times
- Identical pricing ($3/$15)
Migration Strategy
Step 1: Update model identifier
- Change `claude-3-5-sonnet-20241022` to `claude-sonnet-4-20250514` (or the current version)
- No other code changes required
Step 2: Run side-by-side tests
- Route 10% of traffic to Sonnet 4.6
- Compare output quality and latency
- Monitor costs (should be identical)
Step 3: Gradual rollout
- Increase Sonnet 4.6 traffic to 50%, then 100%
- Monitor error rates and user feedback
- Maintain 3.5 Sonnet routing for edge cases
Step 4: Sunset legacy model
- Once 100% traffic on Sonnet 4.6 for 1 month
- Remove 3.5 Sonnet from codebase
- Document migration in system architecture
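The gradual rollout in steps 2-3 can be as simple as a weighted coin flip per request. A sketch (the model IDs are the ones this guide quotes; verify the current ID before deploying):

```python
import random

LEGACY_MODEL = "claude-3-5-sonnet-20241022"
NEW_MODEL = "claude-sonnet-4-20250514"  # per this guide; check the current version

def pick_model(rollout_fraction):
    """Route a fraction of traffic to the newer model; raise 0.1 -> 0.5 -> 1.0."""
    return NEW_MODEL if random.random() < rollout_fraction else LEGACY_MODEL

model = pick_model(0.10)  # step 2: 10% side-by-side testing
```

Because pricing is identical, the rollout fraction only affects quality and latency comparisons, not cost.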
Timing Recommendation
Migrate within 3 months of reading this guide. There's zero cost to migration and the performance improvements compound over time.
Alternatives to Claude 3.5 Sonnet
Understanding competitive options informs better tool selection.
GPT-4 Mini vs Claude 3.5 Sonnet
GPT-4 Mini pricing: $0.15/$0.60 per 1M tokens (20x cheaper for input, 25x for output)
Performance comparison:
- Reasoning tasks: Claude 3.5 Sonnet 15-20% better
- Code generation: Roughly equivalent
- Creative writing: Claude slightly better
- Mathematical problems: Claude 15-20% better
When GPT-4 Mini makes sense:
- Budget-constrained applications (<$100/month)
- Classification and simple tasks
- Non-critical workloads tolerating lower accuracy
Cost example (100K queries, 500 input + 300 output):
- Claude 3.5 Sonnet: $600
- GPT-4 Mini: $25.50
- Savings: $574.50 (96% reduction)
Gemini 2.5 Pro vs Claude 3.5 Sonnet
Gemini 2.5 Pro pricing: $1.25/$10 ($362.50/month for same workload)
Advantages:
- 1M context window (vs 200K for Sonnet)
- Real-time information access
- Multimodal (images, audio, video)
Disadvantages:
- Slightly lower quality on reasoning
- Less stable for production systems
- Fewer integrations
When Gemini makes sense:
- Document analysis requiring huge context
- Real-time information critical
- Multimodal input required
Open-Source Models vs Claude
Popular open-source options:
- Llama 3.1-70B: Self-hosted costs $2-3/hour on GPU
- Mixtral 8x7B: Lower quality, costs $0.50-1/hour
- Falcon-40B: Good for specific tasks, costs $1-2/hour
When open-source makes sense:
- High-volume inference (>1M queries/month)
- Data privacy critical (no external API calls)
- Proprietary domain requires fine-tuning
- Cost optimization above quality requirements
Cost comparison (1M monthly queries, 500 input + 300 output):
- Claude Sonnet: $6,000
- Open-source (8h/day utilization on $2/h GPU): $480
- Break-even: ~80K queries/month ($480 / $0.006 per query), before operations overhead
Advanced Pricing Optimizations
Beyond basic selection, several advanced strategies reduce Claude costs.
Prompt Caching
Anthropic offers prompt caching for repeated prefixes:
- First request with cache prefix: Full token count
- Subsequent requests reusing prefix: 10% of prefix tokens
- Cache invalidation: Automatic after 5 minutes
Applicable scenarios:
- Long system prompts (financial regulations, coding guidelines)
- RAG with same knowledge base
- Batch processing similar documents
Savings calculation: 10K queries daily with a 1,000-token shared system prompt:
- Without caching: 10K * 1,000 = 10M prompt tokens daily = $30/day ≈ $900/month
- With caching: reads bill at ~10% of the prefix = $3/day ≈ $90/month
- Savings: ~90% on system-prompt tokens ≈ $810/month
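Assuming cached reads bill at roughly 10% of the prefix (as described above), the savings pencil out like this (an illustrative sketch, not Anthropic's billing API):

```python
def prompt_cost_monthly(daily_queries, prefix_tokens, cached,
                        input_price=3.00, cache_read_frac=0.10, days=30):
    """Monthly cost (USD) of the shared system prompt alone."""
    full = daily_queries * prefix_tokens * input_price / 1_000_000 * days
    return full * cache_read_frac if cached else full

uncached = prompt_cost_monthly(10_000, 1_000, cached=False)  # 900.0
cached = prompt_cost_monthly(10_000, 1_000, cached=True)     # 90.0
print(uncached - cached)                                     # 810.0
```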
Context Window Optimization
Claude's 200K context window enables processing long documents without multiple API calls:
Document analysis (traditional approach):
- 100K token document
- Split into 5 overlapping chunks (~40K tokens each; overlap preserves cross-chunk context)
- 5 API calls + 5 round trips
- Cost: 5 * (40K input + 2K output) = 210K tokens
Single request approach:
- 100K token document
- 1 API call with full context
- Cost: 1 * (100K input + 2K output) = 102K tokens
- Savings: 51%
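The token math behind that comparison, as a sketch (chunk sizes are the figures above):

```python
def chunked_tokens(n_chunks, input_each, output_each):
    """Total tokens billed when a document is split across separate calls."""
    return n_chunks * (input_each + output_each)

def single_call_tokens(doc_input, output):
    """Total tokens billed for one full-context call."""
    return doc_input + output

chunked = chunked_tokens(5, 40_000, 2_000)   # 210000
single = single_call_tokens(100_000, 2_000)  # 102000
print(round(1 - single / chunked, 2))        # 0.51
```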
Asynchronous Batch Processing
For non-urgent processing, batch APIs offer 80% discounts:
Use cases:
- Daily document analysis
- Email summarization overnight
- Training data generation
- Log analysis
Cost calculation (1M documents, 100 tokens each):
- On-demand: 1M * 100 * $3/1M = $300 per run
- Batch: 1M * 100 * $0.60/1M = $60 per run
- Savings: $240 per run
Running the batch daily: $240 * 30 days = $7,200/month in savings
Multi-Model Strategy
Use cheaper models for simple tasks, expensive models only when needed:
Query routing logic:
- Classification tasks: GPT-4 Mini ($0.15/$0.60)
- Complex reasoning: Claude 3.5 Sonnet ($3/$15)
- Fallback if Mini uncertain: Escalate to Sonnet
Cost distribution (100K queries):
- 70% simple (classifiable): 70K * GPT-4 Mini = $17.85
- 30% complex: 30K * Claude Sonnet = $180
- Hybrid total: $197.85
- vs all Claude: $600
- Savings: 67%
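A blended-cost sketch of this routing split (prices are this guide's figures; the routing itself — deciding which queries count as "simple" — is the hard part and is elided here):

```python
PRICES = {  # USD per 1M tokens
    "gpt-4-mini": {"in": 0.15, "out": 0.60},
    "claude-3-5-sonnet": {"in": 3.00, "out": 15.00},
}

def blended_cost(total_queries, simple_fraction, tokens_in=500, tokens_out=300):
    """Monthly cost when simple queries go to the cheap model."""
    def cost(n, model):
        p = PRICES[model]
        return n * (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000
    n_simple = total_queries * simple_fraction
    return cost(n_simple, "gpt-4-mini") + cost(total_queries - n_simple, "claude-3-5-sonnet")

print(round(blended_cost(100_000, 0.70), 2))  # 197.85
```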
Long-Term Cost Planning
Teams should develop cost projections over 3-5 year horizons.
Year 1: Launch and Growth
Month 1-3: MVP phase
- Usage: 10K queries/month
- Cost: $60/month Claude + $20 infrastructure = $80
Month 4-6: Product-market fit
- Usage: 100K queries/month
- Cost: $600/month Claude + $100 infrastructure = $700
Month 7-12: Growth phase
- Usage: 500K queries/month
- Cost: $3,000/month Claude + $500 infrastructure = $3,500
- Annual total: ~$23,000
Year 2: Scale
Usage: 5M queries/month
Cost:
- Base Claude: $30,000/month
- Multi-model optimization (30% savings): $21,000/month
- Infrastructure scaling: $2,000/month
- Total: $23,000/month = $276,000/year
Year 3: Maturity
Usage: 20M queries/month
Cost:
- Base Claude: $120,000/month
- Volume discounts (if negotiated): -$20,000/month
- Multi-model + caching (40% optimization on the remaining $100,000): $60,000/month
- Infrastructure: $5,000/month
- Total: $65,000/month = $780,000/year
Mitigation strategies at scale:
- Negotiate volume discounts (typically 15-20% at $100K+/month)
- Build proprietary fine-tuned model for core tasks
- Implement aggressive caching and optimization
- Consider open-source model fallback
FAQ
Is Claude 3.5 Sonnet deprecated? Yes, as of March 2026. Claude Sonnet 4.6 has superseded it with identical pricing and better performance. No sunset date announced, but Anthropic recommends new applications use Sonnet 4.6.
What's the difference between 3.5 Sonnet and Sonnet 4.6? Sonnet 4.6 is 15-20% better at reasoning and coding tasks with identical pricing. For most applications, upgrade is worthwhile. For cost-optimized systems, 3.5 Sonnet remains viable.
Which provider offers the best 3.5 Sonnet pricing? All providers offer identical on-demand pricing ($3/$15). AWS Bedrock batch (80% discount) is cheapest for non-urgent workloads. Google Vertex 1-year commitments are cheapest for predictable high-volume usage.
Can I use batch processing for all workloads? No, batch processing requires 24-hour turnaround. Use batch for background jobs, document analysis, and non-real-time applications. On-demand required for customer-facing, real-time, sub-second latency applications.
Should I commit to Vertex AI contracts? Only if spending >$3,000/month with stable workloads. Contracts lock you in for 1-3 years. If your usage is growing rapidly, avoid multi-year commitments until growth stabilizes.
What about open-source models instead of 3.5 Sonnet? Open-source Llama 3.1-70B achieves roughly 70-80% of 3.5 Sonnet's quality and costs $2-3/hour to run on cloud GPUs. Self-hosting adds operational overhead; API access is usually more economical until volume is high and sustained.
Can I use 3.5 Sonnet for production? Yes, it remains stable and reliable. However, new deployments should use Sonnet 4.6 (same price, better performance). For legacy systems, 3.5 Sonnet works indefinitely.
Related Resources
For broader context on Anthropic's pricing and model ecosystem:
- Explore Anthropic's complete model pricing
- Review Anthropic API pricing details and optimization strategies
- Compare Claude 4 pricing versus other models
Sources
Pricing data from Anthropic official API documentation as of March 2026. AWS Bedrock pricing from Amazon Web Services pricing page. Google Vertex AI pricing from Google Cloud pricing calculator. Performance comparisons from MLPerf benchmarks and Anthropic's official model benchmarks. Deprecation timeline based on Anthropic blog announcements and API documentation updates.