Claude 3.5 Sonnet Pricing: Compare Costs Across All API Providers

Deploybase · January 23, 2026 · LLM Pricing

Claude 3.5 Sonnet is a legacy model as of March 2026 (superseded by Sonnet 4.6), but it still matters for teams running it in production. Direct Anthropic pricing: $3/$15 per 1M tokens. AWS Bedrock and Vertex AI can be substantially cheaper at volume.

Claude 3.5 Sonnet Pricing: Overview

Claude 3.5 Sonnet occupied the "best quality to cost" position in Anthropic's model family when released, positioned between the faster Haiku and more capable Opus models. The model has since been deprecated in favor of Claude Sonnet 4.6, which provides significantly better performance at comparable or slightly higher pricing.

As of March 2026, most new applications should use Claude Sonnet 4.6 ($3/$15 per 1M tokens) rather than 3.5 Sonnet. However, legacy systems, existing deployments, and cost-optimized applications still actively use 3.5 Sonnet through various providers. This guide covers current pricing across all providers and migration strategies for teams running legacy models.

The pricing structure reflects Anthropic's broader market positioning: direct API access provides transparent pricing with no volume discounts, while production partnerships through AWS and Google provide volume-based pricing and integration with existing infrastructure.

Model Status and Deprecation Context

Claude 3.5 Sonnet was released in June 2024 as a mid-tier model balancing quality and latency. It became widely adopted for customer-facing applications, internal tools, and development workloads. In March 2025, Anthropic released Claude Sonnet 4.6, which supersedes 3.5 Sonnet in capability while maintaining compatible pricing.

Timeline:

  • June 2024: Claude 3.5 Sonnet released
  • March 2025: Claude Sonnet 4.6 released (supersedes 3.5 Sonnet)
  • Current (March 2026): 3.5 Sonnet still available but deprecated

Deprecation status:

  • No announced sunset date
  • Anthropic maintains API access indefinitely
  • New applications should use Sonnet 4.6
  • Legacy applications using 3.5 can continue without change

The continued availability of deprecated models reflects Anthropic's commitment to backward compatibility. Teams can migrate to Sonnet 4.6 on their timeline without forced migration pressure.

Direct Anthropic API Pricing

Anthropic's direct API provides the baseline pricing all other providers reference or mark up.

Claude 3.5 Sonnet Direct Pricing

Input tokens: $3.00 per 1M tokens
Output tokens: $15.00 per 1M tokens

This pricing applies equally to all usage patterns:

  • Text generation
  • Classification and analysis
  • Code generation
  • Complex reasoning
  • RAG augmented queries

No volume discounts available on direct API.

Calculation Example

A customer support application processing 100,000 conversations monthly with average 500 input tokens and 300 output tokens per conversation:

Monthly cost:

  • Input cost: 100,000 * 500 * $3/1M = $150
  • Output cost: 100,000 * 300 * $15/1M = $450
  • Total: $600

Annual cost: $7,200
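The arithmetic above generalizes to any workload. Here is a minimal budgeting helper — a sketch using the list prices quoted in this guide, not values fetched from any API:

```python
# Claude 3.5 Sonnet list prices (USD per 1M tokens), as quoted above.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def monthly_cost(conversations, input_tokens, output_tokens,
                 input_rate=INPUT_RATE, output_rate=OUTPUT_RATE):
    """Estimate monthly spend for a fixed per-conversation token profile."""
    input_cost = conversations * input_tokens * input_rate / 1_000_000
    output_cost = conversations * output_tokens * output_rate / 1_000_000
    return input_cost + output_cost

# The support-bot example: 100K conversations at 500 input + 300 output tokens.
cost = monthly_cost(100_000, 500, 300)
print(f"${cost:,.2f}/month, ${cost * 12:,.2f}/year")  # $600.00/month, $7,200.00/year
```

Swapping in another provider's rates (for example, Bedrock batch pricing) is just a matter of passing different `input_rate`/`output_rate` arguments.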

For reference, upgrading to Claude Sonnet 4.6 costs an identical $600/month with better performance.

Usage Patterns Impact on Cost

Cost varies based on input/output ratio (sometimes called prompt/completion ratio):

High input ratio (research analysis, document processing):

  • 2,000 input tokens + 200 output tokens
  • Input: 2,000 * $3/1M = $0.006
  • Output: 200 * $15/1M = $0.003
  • Cost per query: $0.009

Balanced ratio (general conversation):

  • 500 input + 300 output
  • Input: 500 * $3/1M = $0.0015
  • Output: 300 * $15/1M = $0.0045
  • Cost per query: $0.006

High output ratio (content generation):

  • 200 input + 2,000 output
  • Input: 200 * $3/1M = $0.0006
  • Output: 2,000 * $15/1M = $0.03
  • Cost per query: $0.0306

High-output applications cost 3.4x more per query than high-input applications at Anthropic's standard pricing.

AWS Bedrock Pricing

AWS Bedrock provides Claude models through Amazon's managed service, integrating with AWS infrastructure and billing.

Claude 3.5 Sonnet on Bedrock

On-demand pricing:

  • Input: $3.00 per 1M tokens (same as direct)
  • Output: $15.00 per 1M tokens (same as direct)

On-demand markup over direct pricing: 0% (identical rates)

This parity with direct pricing makes AWS Bedrock attractive for teams already using AWS infrastructure, despite the lack of cost savings.

Bedrock Batch Processing

AWS Bedrock offers batch API for non-time-sensitive workloads:

Batch pricing:

  • Input: $0.60 per 1M tokens (80% discount)
  • Output: $3.00 per 1M tokens (80% discount)

Requirements:

  • Minimum 10,000 tokens per batch
  • 24-hour turnaround time
  • Same models and quality as on-demand

When to use batch:

  • Analyzing 100K documents for insights (suitable for 24hr delay)
  • Processing historical data for training
  • Generating embeddings at scale
  • Content generation for articles/summaries

Cost comparison (100,000 queries at 500 input + 300 output):

On-demand:

  • Input: 100,000 * 500 * $3/1M = $150
  • Output: 100,000 * 300 * $15/1M = $450
  • Total: $600

Batch API:

  • Input: 100,000 * 500 * $0.60/1M = $30
  • Output: 100,000 * 300 * $3/1M = $90
  • Total: $120

Savings: $480/month (80% reduction)
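The on-demand vs batch comparison is easy to script when deciding whether the 24-hour turnaround is acceptable — a sketch using the Bedrock rates quoted above:

```python
# Compare Bedrock on-demand vs batch pricing for the same workload.
ON_DEMAND = (3.00, 15.00)  # USD per 1M tokens (input, output)
BATCH = (0.60, 3.00)       # the 80%-discount batch tier quoted above

def workload_cost(queries, in_tok, out_tok, rates):
    """Monthly cost for `queries` requests at a fixed token profile."""
    in_rate, out_rate = rates
    return queries * (in_tok * in_rate + out_tok * out_rate) / 1_000_000

on_demand = workload_cost(100_000, 500, 300, ON_DEMAND)  # 600.0
batch = workload_cost(100_000, 500, 300, BATCH)          # 120.0
print(f"batch saves ${on_demand - batch:,.0f}/month "
      f"({1 - batch / on_demand:.0%})")                  # batch saves $480/month (80%)
```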

Bedrock Provisioned Throughput

For predictable high-volume workloads, Bedrock offers provisioned throughput:

Provisioned model units (PMUs):

  • 100 PMUs: $1.34/hour, about $32.16/day or roughly $978/month at full utilization (730 hours/month)
  • Minimum commitment: 1 hour

Each PMU provides consistent throughput capacity independent of number of requests.

When provisioned throughput makes sense:

  • Consistent >50K requests/month
  • Predictable traffic patterns
  • Cost >$1,000/month on on-demand pricing

Example: Processing 1M queries/month at an average of 300 input tokens each:

  • On-demand cost: 1,000,000 * 300 * $3/1M = $900/month
  • Provisioned throughput pays off only when sustained usage keeps the committed capacity busy; compare the PMU hourly commitment cost against this on-demand figure before committing

Google Vertex AI Pricing

Google's Vertex AI provides Claude models integrated with GCP infrastructure.

Claude 3.5 Sonnet on Vertex AI

On-demand pricing:

  • Input: $3.00 per 1M tokens (same as direct)
  • Output: $15.00 per 1M tokens (same as direct)

GCP billing integration:

  • Bundles with GCP services (similar to Bedrock)
  • Applicable to GCP free credits
  • Includes GCP's cost analysis and budget alerts

Vertex AI Volume Discounts

Vertex AI provides volume discounts for monthly spend:

Discount tiers:

  • $100-500/month: 0% discount (on-demand pricing)
  • $500-1,000/month: 5% discount
  • $1,000-5,000/month: 10% discount
  • $5,000-10,000/month: 15% discount
  • $10,000+/month: 20% discount

Cost example with discounts:

Processing 5M queries/month at 500 input + 300 output tokens:

  • Raw cost: $3,000/month (input) + $9,000/month (output) = $12,000
  • Discount tier: 20% ($12,000/month exceeds the $10,000 threshold)
  • Actual cost: $12,000 * 0.80 = $9,600/month
  • Savings: $2,400/month (20% reduction)
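The tier schedule above can be encoded as a simple lookup — a sketch; the tier boundaries are the figures quoted in this guide, not an official Google API:

```python
# Vertex AI volume-discount tiers as listed above: (minimum monthly spend, discount).
TIERS = [
    (10_000, 0.20),
    (5_000, 0.15),
    (1_000, 0.10),
    (500, 0.05),
    (0, 0.00),
]

def discounted_spend(raw_monthly_spend):
    """Return (discounted spend, discount rate) for a raw monthly bill."""
    for floor, discount in TIERS:
        if raw_monthly_spend >= floor:
            return raw_monthly_spend * (1 - discount), discount

spend, discount = discounted_spend(12_000)
print(f"${spend:,.0f}/month at {discount:.0%} off")  # $9,600/month at 20% off
```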

Vertex AI with Commitment Contracts

Vertex AI offers annual commitment contracts with greater discounts:

1-year commitment:

  • 25% discount on list prices
  • $3,000 minimum monthly commitment
  • $36,000/year minimum spending

3-year commitment:

  • 35% discount on list prices
  • $5,000 minimum monthly commitment
  • $60,000/year minimum spending

Effective pricing with 1-year commitment:

  • Input: $3.00 * 0.75 = $2.25 per 1M tokens
  • Output: $15.00 * 0.75 = $11.25 per 1M tokens

Provider Comparison and Optimization

Cost Comparison Summary

| Provider | Input ($/1M) | Output ($/1M) | Best For |
| --- | --- | --- | --- |
| Anthropic Direct | $3.00 | $15.00 | Simplicity, no infrastructure |
| AWS Bedrock (on-demand) | $3.00 | $15.00 | AWS ecosystem integration |
| AWS Bedrock (batch) | $0.60 | $3.00 | Non-time-sensitive workloads |
| Google Vertex (on-demand) | $3.00 | $15.00 | GCP infrastructure |
| Google Vertex (volume) | $2.55 | $12.75 | High-volume workloads |
| Google Vertex (1-year commitment) | $2.25 | $11.25 | Committed monthly spend >$3K |
| Google Vertex (3-year commitment) | $1.95 | $9.75 | Committed monthly spend >$5K |

Cost Optimization Strategies

Strategy 1: Use batch processing for non-urgent workloads

  • AWS Bedrock batch provides 80% savings
  • Suitable for document analysis, content generation, training data
  • Trade-off: 24-hour turnaround

Strategy 2: Volume commitments for high-volume workloads

  • Google Vertex 1-year commitment saves 25% at >$36K/year spending
  • AWS provisioned throughput can pay off at sustained spend above roughly $1,000/month
  • Best for stable, predictable traffic

Strategy 3: Route traffic by urgency

  • Real-time customer interactions: on-demand API
  • Background jobs: batch processing
  • Combine approaches for 20-30% average savings

Strategy 4: Upgrade to Claude Sonnet 4.6

  • Same pricing as 3.5 Sonnet
  • 15-20% better performance
  • No migration cost (same pricing)
  • Recommended for all new applications

Cost Analysis Examples

Scenario 1: Startup Customer Support Bot

Requirements: 50,000 customer conversations/month, sub-second latency required

Technology choice: Claude 3.5 Sonnet (2,000 tokens input + 200 tokens output average)

Monthly costs:

Anthropic Direct:

  • Input: 50,000 * 2,000 * $3/1M = $300
  • Output: 50,000 * 200 * $15/1M = $150
  • Total: $450

AWS Bedrock:

  • Same pricing as direct: $450

Google Vertex (no volume discount yet):

  • Same pricing as direct: $450

Recommendation: Use Anthropic Direct (simplest billing, no AWS/GCP overhead)

Scenario 2: Content Generation Platform

Requirements: 100,000 blog posts/month, 24-hour turnaround acceptable

Technology choice: Claude 3.5 Sonnet with batch processing (500 input + 2,000 output)

Monthly costs:

AWS Bedrock Batch:

  • Input: 100,000 * 500 * $0.60/1M = $30
  • Output: 100,000 * 2,000 * $3/1M = $600
  • Total: $630

Anthropic Direct (on-demand):

  • Input: 100,000 * 500 * $3/1M = $150
  • Output: 100,000 * 2,000 * $15/1M = $3,000
  • Total: $3,150

Savings with batch: $2,520/month (80% reduction)

Recommendation: Use AWS Bedrock batch processing exclusively

Scenario 3: Production Analytics Platform

Requirements: 2M API calls/month, requires integration with existing GCP infrastructure

Technology choice: Claude 3.5 Sonnet with 1-year Vertex AI commitment (300 input + 500 output average)

Monthly costs:

Monthly spend without commitment:

  • Input: 2,000,000 * 300 * $3/1M = $1,800
  • Output: 2,000,000 * 500 * $15/1M = $15,000
  • Subtotal: $16,800

With volume discount (20% from the $10,000+ tier):

  • $16,800 * 0.80 = $13,440

With 1-year commitment (25% off list prices):

  • $16,800 * 0.75 = $12,600

Comparison:

  • On-demand: $16,800/month
  • With volume discount: $13,440/month (20% savings)
  • With 1-year commitment: $12,600/month (25% savings)
  • Annual savings with commitment: $50,400

Recommendation: Sign Google Vertex AI 1-year commitment; integrate with existing GCP infrastructure

Migration Path to Sonnet 4.6

Most teams using Claude 3.5 Sonnet should migrate to Claude Sonnet 4.6, which offers identical pricing with superior performance.

Performance Improvements in Sonnet 4.6

  • 15-20% improvement on reasoning tasks
  • Better coding capability
  • Improved instruction following
  • Slightly faster response times
  • Identical pricing ($3/$15)

Migration Strategy

Step 1: Update model identifier

  • Change claude-3-5-sonnet-20241022 to claude-sonnet-4-20250514 (or current version)
  • No other code changes required

Step 2: Run side-by-side tests

  • Route 10% of traffic to Sonnet 4.6
  • Compare output quality and latency
  • Monitor costs (should be identical)

Step 3: Gradual rollout

  • Increase Sonnet 4.6 traffic to 50%, then 100%
  • Monitor error rates and user feedback
  • Maintain 3.5 Sonnet routing for edge cases

Step 4: Sunset legacy model

  • Once 100% traffic on Sonnet 4.6 for 1 month
  • Remove 3.5 Sonnet from codebase
  • Document migration in system architecture
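The percentage-based rollout in steps 2 and 3 can be as simple as a weighted coin flip per request. A sketch; the model identifier strings are illustrative, so check Anthropic's documentation for the current version strings:

```python
import random

# Gradual-rollout sketch: route a configurable fraction of traffic to the
# newer model. Identifiers below are illustrative placeholders.
LEGACY_MODEL = "claude-3-5-sonnet-20241022"
NEW_MODEL = "claude-sonnet-4-20250514"

def pick_model(rollout_fraction, rng=random.random):
    """Return the new model for `rollout_fraction` of requests (0.0 to 1.0)."""
    return NEW_MODEL if rng() < rollout_fraction else LEGACY_MODEL

# Step 2 of the plan above: send 10% of traffic to the new model.
counts = {LEGACY_MODEL: 0, NEW_MODEL: 0}
for _ in range(10_000):
    counts[pick_model(0.10)] += 1
print(counts)  # roughly 9,000 legacy / 1,000 new
```

Ramping to 50% and then 100% is just a change to `rollout_fraction`, which keeps the comparison window (quality, latency, error rates) easy to control.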

Timing Recommendation

Migrate within 3 months of reading this guide. There's zero cost to migration and the performance improvements compound over time.

Alternatives to Claude 3.5 Sonnet

Understanding competitive options informs better tool selection.

GPT-4 Mini vs Claude 3.5 Sonnet

GPT-4 Mini pricing: $0.15/$0.60 (20x cheaper for input, 25x for output)

Performance comparison:

  • Reasoning tasks: Claude 3.5 Sonnet 15-20% better
  • Code generation: Roughly equivalent
  • Creative writing: Claude slightly better
  • Mathematical problems: Claude 15-20% better

When GPT-4 Mini makes sense:

  • Budget-constrained applications (<$100/month)
  • Classification and simple tasks
  • Non-critical workloads tolerating lower accuracy

Cost example (100K queries, 500 input + 300 output):

  • Claude 3.5 Sonnet: $600
  • GPT-4 Mini: $25.50
  • Savings: $574.50 (96% reduction)

Gemini 2.5 Pro vs Claude 3.5 Sonnet

Gemini 2.5 Pro pricing: $1.25/$10 ($362.50/month for same workload)

Advantages:

  • 1M context window (vs 200K for Sonnet)
  • Real-time information access
  • Multimodal (images, audio, video)

Disadvantages:

  • Slightly lower quality on reasoning
  • Less stable for production systems
  • Fewer integrations

When Gemini makes sense:

  • Document analysis requiring huge context
  • Real-time information critical
  • Multimodal input required

Open-Source Models vs Claude

Popular open-source options:

  • Llama 3.1-70B: Self-hosted costs $2-3/hour on GPU
  • Mixtral 8x7B: Lower quality, costs $0.50-1/hour
  • Falcon-40B: Good for specific tasks, costs $1-2/hour

When open-source makes sense:

  • High-volume inference (>1M queries/month)
  • Data privacy critical (no external API calls)
  • Proprietary domain requires fine-tuning
  • Cost optimization above quality requirements

Cost comparison (1M monthly queries, 500 input + 300 output):

  • Claude Sonnet: $6,000
  • Open-source (8h/day utilization on $2/h GPU): $480
  • Break-even: raw GPU cost is matched near 80K queries/month at these figures; operational overhead pushes the practical threshold considerably higher

Advanced Pricing Optimizations

Beyond basic selection, several advanced strategies reduce Claude costs.

Prompt Caching

Anthropic offers prompt caching for repeated prefixes:

  • First request with cache prefix: Full token count
  • Subsequent requests reusing prefix: 10% of prefix tokens
  • Cache invalidation: Automatic after 5 minutes

Applicable scenarios:

  • Long system prompts (financial regulations, coding guidelines)
  • RAG with same knowledge base
  • Batch processing similar documents

Savings calculation: 10K queries daily sharing a 1,000-token system prompt:

  • Without caching: 10K * 1,000 = 10M prefix tokens daily = $30/day at $3/1M
  • With caching: reads billed at roughly 10% of the input rate = about $3/day
  • Savings: ~90% on the shared prefix, roughly $810/month (before cache-write surcharges)
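The caching arithmetic can be sketched as follows, assuming cache reads are billed at 10% of the input rate as described above (cache-write surcharges ignored for simplicity):

```python
# Prompt-cache savings estimate. The 10% read multiplier and $3/1M input
# rate are the figures quoted in this guide; verify against current docs.
INPUT_RATE = 3.00       # USD per 1M input tokens
CACHE_READ_MULT = 0.10  # cache reads assumed to cost 10% of the input rate

def daily_prefix_cost(queries, prefix_tokens, cached):
    """Daily cost of the shared prompt prefix, cached or not."""
    rate = INPUT_RATE * (CACHE_READ_MULT if cached else 1.0)
    return queries * prefix_tokens * rate / 1_000_000

uncached = daily_prefix_cost(10_000, 1_000, cached=False)  # $30/day
cached = daily_prefix_cost(10_000, 1_000, cached=True)     # $3/day
print(f"~${(uncached - cached) * 30:,.0f}/month saved on the shared prefix")
# ~$810/month saved on the shared prefix
```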

Context Window Optimization

Claude's 200K context window enables processing long documents without multiple API calls:

Document analysis (traditional chunked approach):

  • 100K token document
  • Split into 5 overlapping 40K-token chunks (each repeating shared context and instructions)
  • 5 API calls + 5 round trips
  • Cost: 5 * (40K input + 2K output) = 210K tokens

Single request approach:

  • 100K token document
  • 1 API call with full context
  • Cost: 1 * (100K input + 2K output) = 102K tokens
  • Savings: 51%
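The chunked vs single-request token totals above, as a quick check (the chunk sizes are this guide's example figures):

```python
# Total tokens billed across a set of API calls, per the example above.
def tokens_billed(calls):
    """Sum (input, output) token pairs across API calls."""
    return sum(i + o for i, o in calls)

chunked = tokens_billed([(40_000, 2_000)] * 5)  # 5 overlapping chunks: 210,000
single = tokens_billed([(100_000, 2_000)])      # one full-context call: 102,000
print(f"single request bills {1 - single / chunked:.0%} fewer tokens")  # 51%
```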

Asynchronous Batch Processing

For non-urgent processing, batch APIs offer 80% discounts:

Use cases:

  • Daily document analysis
  • Email summarization overnight
  • Training data generation
  • Log analysis

Cost calculation (1M documents, 100 tokens each):

  • On-demand: 1M * 100 * $3/1M = $300
  • Batch: 1M * 100 * $0.60/1M = $60
  • Savings: $240 per batch

Run daily: batch costs $60 * 30 days = $1,800/month versus $9,000 on-demand, saving $7,200/month

Multi-Model Strategy

Use cheaper models for simple tasks, expensive models only when needed:

Query routing logic:

  • Classification tasks: GPT-4 Mini ($0.15/$0.60)
  • Complex reasoning: Claude 3.5 Sonnet ($3/$15)
  • Fallback if Mini uncertain: Escalate to Sonnet

Cost distribution (100K queries):

  • 70% simple (classifiable): 70K * GPT-4 Mini = $17.85
  • 30% complex: 30K * Claude Sonnet = $180
  • Hybrid total: $197.85
  • vs all Claude: $600
  • Savings: 67%
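The hybrid routing math can be sketched as follows (list prices from this guide; the 70/30 split is the assumed traffic mix above):

```python
# Hybrid-routing cost estimate: a cheap model for simple queries, Claude for
# complex ones. Rates are the list prices quoted in this guide.
MINI = (0.15, 0.60)     # GPT-4 Mini (input, output) USD per 1M tokens
SONNET = (3.00, 15.00)  # Claude 3.5 Sonnet

def cost(queries, in_tok, out_tok, rates):
    in_rate, out_rate = rates
    return queries * (in_tok * in_rate + out_tok * out_rate) / 1_000_000

simple = cost(70_000, 500, 300, MINI)          # $17.85
complex_ = cost(30_000, 500, 300, SONNET)      # $180.00
all_sonnet = cost(100_000, 500, 300, SONNET)   # $600.00
hybrid = simple + complex_
print(f"hybrid ${hybrid:,.2f} vs ${all_sonnet:,.2f} "
      f"({1 - hybrid / all_sonnet:.0%} savings)")  # hybrid $197.85 vs $600.00 (67% savings)
```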

Long-Term Cost Planning

Teams should develop cost projections over a 3-5 year horizon.

Year 1: Launch and Growth

Month 1-3: MVP phase

  • Usage: 10K queries/month
  • Cost: $60/month Claude + $20 infrastructure = $80

Month 4-6: Product-market fit

  • Usage: 100K queries/month
  • Cost: $600/month Claude + $100 infrastructure = $700

Month 7-12: Growth phase

  • Usage: 500K queries/month
  • Cost: $3,000/month Claude + $500 infrastructure = $3,500
  • Year 1 total: roughly $23,300 ($240 + $2,100 + $21,000)

Year 2: Scale

Usage: 5M queries/month

Cost:

  • Base Claude: $30,000/month
  • Multi-model optimization (30% savings): $21,000/month
  • Infrastructure scaling: $2,000/month
  • Total: $23,000/month = $276,000/year

Year 3: Maturity

Usage: 20M queries/month

Cost:

  • Base Claude: $120,000/month
  • Negotiated volume discount: -$20,000/month (leaving $100,000)
  • Multi-model routing + caching (40% further reduction): $60,000/month
  • Infrastructure: $5,000/month
  • Total: $65,000/month = $780,000/year

Mitigation strategies at scale:

  • Negotiate volume discounts (typically 15-20% at $100K+/month)
  • Build proprietary fine-tuned model for core tasks
  • Implement aggressive caching and optimization
  • Consider open-source model fallback

FAQ

Is Claude 3.5 Sonnet deprecated? Yes, as of March 2026. Claude Sonnet 4.6 has superseded it with identical pricing and better performance. No sunset date announced, but Anthropic recommends new applications use Sonnet 4.6.

What's the difference between 3.5 Sonnet and Sonnet 4.6? Sonnet 4.6 is 15-20% better at reasoning and coding tasks with identical pricing. For most applications, upgrade is worthwhile. For cost-optimized systems, 3.5 Sonnet remains viable.

Which provider offers the best 3.5 Sonnet pricing? All providers offer identical on-demand pricing ($3/$15). AWS Bedrock batch (80% discount) is cheapest for non-urgent workloads. Google Vertex 1-year commitments are cheapest for predictable high-volume usage.

Can I use batch processing for all workloads? No, batch processing requires 24-hour turnaround. Use batch for background jobs, document analysis, and non-real-time applications. On-demand required for customer-facing, real-time, sub-second latency applications.

Should I commit to Vertex AI contracts? Only if spending >$3,000/month with stable workloads. Contracts lock you in for 1-3 years. If your usage is growing rapidly, avoid multi-year commitments until growth stabilizes.

What about open-source models instead of 3.5 Sonnet? Open-source Llama 3.1-70B achieves roughly 70-80% of 3.5 Sonnet's performance and costs $2-3/hour to run on cloud GPUs. Self-hosting adds operational overhead, so API access is usually more economical below roughly 1M monthly queries.

Can I use 3.5 Sonnet for production? Yes, it remains stable and reliable. However, new deployments should use Sonnet 4.6 (same price, better performance). For legacy systems, 3.5 Sonnet works indefinitely.

Sources

Pricing data from Anthropic official API documentation as of March 2026. AWS Bedrock pricing from Amazon Web Services pricing page. Google Vertex AI pricing from Google Cloud pricing calculator. Performance comparisons from MLPerf benchmarks and Anthropic's official model benchmarks. Deprecation timeline based on Anthropic blog announcements and API documentation updates.