Contents
- Historical Pricing Timeline
- Cost Per Token Trends
- Provider Competition Impact
- Current Pricing (March 2026)
- Future Projections
- FAQ
- Related Resources
- Sources
Historical Pricing Timeline
This guide traces cost-per-token trends. LLM API pricing has decreased dramatically since GPT-3's launch in 2020, and understanding that trajectory helps businesses evaluating AI investments project costs.
2020-2021: GPT-3 (Davinci) API pricing started at $0.06 per 1,000 tokens, with no separate prompt and completion rates. Teams considered API calls expensive compared to local inference.
2022: Pricing remained relatively stable through early 2022. Major cloud providers launched GPU infrastructure, creating direct competition with API vendors.
2023: First significant price reductions appeared. OpenAI introduced GPT-3.5-Turbo at $0.0015 per 1,000 prompt tokens, a roughly 97% reduction from GPT-3's launch rate. Meta released Llama 2, spurring open-source adoption.
2024: Cost-per-token continued declining. Anthropic's Claude models decreased 20% mid-year. Google released Gemini with competitive pricing.
2025-2026: Current pricing reflects commoditization of LLM inference. GPT-4o costs $2.50 per 1M prompt tokens, while newer frontier models like GPT-5 and Gemini 2.5 Pro have pushed input pricing to $1.25/M — an 80%+ reduction from early 2023 GPT-4 rates.
Cost Per Token Trends
Absolute Price Decreases
OpenAI's pricing evolution demonstrates the trend:
- GPT-3 (2020): $60.00 per 1M prompt tokens (equivalent; billed at $0.06/1K)
- GPT-3.5-Turbo (2023): $1.50 per 1M prompt tokens
- GPT-4 Turbo (early 2024): $10.00 per 1M prompt tokens
- GPT-4o (late 2024): $5.00 per 1M prompt tokens
- GPT-4o (March 2026): $2.50 per 1M prompt tokens
- GPT-5 (March 2026): $1.25 per 1M prompt tokens
This represents a ~98% price reduction for comparable capability over six years.
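The reduction can be checked with simple arithmetic; the 10M-token monthly workload below is illustrative:

```python
# USD per 1M prompt tokens, from the price points listed above.
PRICE_PER_M = {
    "GPT-3 (2020)": 60.00,
    "GPT-3.5-Turbo (2023)": 1.50,
    "GPT-4 Turbo (early 2024)": 10.00,
    "GPT-4o (late 2024)": 5.00,
    "GPT-4o (March 2026)": 2.50,
    "GPT-5 (March 2026)": 1.25,
}

def monthly_cost(price_per_million: float, tokens: int) -> float:
    """USD cost for `tokens` prompt tokens at the given per-million rate."""
    return price_per_million * tokens / 1_000_000

workload = 10_000_000  # illustrative: 10M prompt tokens per month
for model, price in PRICE_PER_M.items():
    print(f"{model}: ${monthly_cost(price, workload):,.2f}")

# Reduction from GPT-3 (2020) to GPT-5 (2026):
drop = 1 - 1.25 / 60.00  # ~0.979, i.e. ~98%
```

The same workload that cost $600/month on GPT-3 costs $12.50/month on GPT-5, confirming the ~98% figure.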
Competitive Market Impact
Pricing pressure increased substantially with open-source model availability. Llama 3 availability through API providers like Together AI and Replicate forced proprietary vendors to reduce costs.
As of March 2026, API pricing for equivalent model capabilities shows:
- Frontier proprietary models (GPT-5, Claude Sonnet, Gemini 2.5 Pro): $1.25–$3.00/M input tokens
- Mid-tier proprietary models (GPT-4o, Gemini Flash): $0.30–$2.50/M input tokens
- Open-source through commercial APIs (DeepSeek V3, Mistral, Llama): $0.14–$0.55/M input tokens
- Self-hosted inference: $0.001–$0.01/M input tokens equivalent via compute costs
The price-to-capability ratio has shifted dramatically in favor of end users.
Volume-Based Discounts
All major providers implemented aggressive volume discounts:
OpenAI: 10% discount at 100M monthly tokens, 20% at 1B monthly tokens
Anthropic: 15% discount at 500M monthly tokens
Google Cloud: 20% discount at 1B monthly tokens through commitments
High-volume users consuming over 1B tokens monthly benefit most from competitive pricing.
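A minimal sketch of how the OpenAI-style tiers above change effective rates. It assumes, for illustration, that the discount applies to the entire monthly volume rather than only the marginal tier (the tier structure above does not specify which):

```python
def discounted_rate(base_rate: float, monthly_tokens: int) -> float:
    """Apply the volume tiers described above: 10% off at 100M
    tokens/month, 20% off at 1B. Assumption: the discount applies
    to the whole volume, not just tokens above the threshold."""
    if monthly_tokens >= 1_000_000_000:
        return base_rate * 0.80
    if monthly_tokens >= 100_000_000:
        return base_rate * 0.90
    return base_rate

# GPT-5 input at $1.25/M with 1B tokens/month:
rate = discounted_rate(1.25, 1_000_000_000)      # $1.00 per 1M tokens
cost = rate * 1_000_000_000 / 1_000_000          # $1,000/month
```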
Provider Competition Impact
Multiple factors drove pricing reductions:
Open-Source Competition: Llama, Mistral, and other models eliminated artificial scarcity. Proprietary vendors could no longer command premium pricing.
Cloud Provider Entry: AWS Bedrock, Google Cloud Vertex AI, and Azure OpenAI Service added pricing pressure through volume-based discounts.
Inference Optimization: Quantization and knowledge distillation reduced computational requirements, lowering operating costs for providers.
Capital Availability: Competitive funding rounds encouraged aggressive pricing to capture market share.
Current Pricing (March 2026)
Model-specific pricing across major providers as of March 2026:
GPT-4o: $2.50 per 1M prompt tokens, $10.00 per 1M completion tokens
GPT-5: $1.25 per 1M prompt tokens, $10.00 per 1M completion tokens
Claude Sonnet 4.6: $3.00 per 1M prompt tokens, $15.00 per 1M completion tokens
Gemini 2.5 Pro: $1.25 per 1M prompt tokens, $10.00 per 1M completion tokens
DeepSeek V3: $0.27 per 1M prompt tokens, $1.10 per 1M completion tokens
Llama 3.1 (via API, e.g. Together AI): $0.18 per 1M prompt tokens, $0.90 per 1M completion tokens
Self-hosted inference through cloud GPUs offers lower absolute costs for high-volume users. Reference LLM API pricing for comprehensive provider comparison and current rates.
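Given the rates above, a blended workload can be compared across providers. The 50M prompt / 10M completion token split below is illustrative:

```python
# (prompt_$/M, completion_$/M) from the March 2026 table above.
PRICING = {
    "GPT-4o": (2.50, 10.00),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "DeepSeek V3": (0.27, 1.10),
    "Llama 3.1 (Together AI)": (0.18, 0.90),
}

def workload_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Monthly USD cost for a given prompt/completion token mix."""
    p, c = PRICING[model]
    return (p * prompt_tokens + c * completion_tokens) / 1_000_000

# Illustrative workload: 50M prompt + 10M completion tokens per month.
for model in PRICING:
    print(f"{model}: ${workload_cost(model, 50_000_000, 10_000_000):,.2f}")
```

On this mix, GPT-5 runs $162.50/month versus $24.50 for DeepSeek V3, reflecting the open-source price gap described earlier.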
Future Projections
Pricing will likely decrease 30-50% over the next 2-3 years based on current trajectories:
Hardware Innovation: New GPU architectures offering 2-3x throughput improvements reduce per-token serving costs.
Model Efficiency: Mixture-of-Experts architectures and sparse computation reduce computational intensity.
Market Saturation: Commoditization of LLM inference drives margin compression across providers.
Teams committing to long-term contracts secure discounted rates and price stability, though continued market declines can erode the advantage of fixed pricing over the contract term.
FAQ
Why has LLM API pricing dropped so dramatically? Increased competition from open-source models, cloud provider entry, and inference optimization techniques all contributed. Model capabilities now outpace pricing reductions, meaning tokens deliver more value despite lower costs.
Should we lock in pricing contracts now? If consuming over 500M tokens monthly, securing 12-24 month commitments ensures price stability. Smaller users benefit from monitoring quarterly pricing changes rather than long commitments.
Are self-hosted models cheaper than API calls? For workloads consuming over 1-2B tokens monthly, self-hosted inference becomes cost-competitive. For sporadic usage, APIs remain more economical due to no infrastructure overhead.
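The break-even can be sketched with assumed numbers. The $2/hour GPU rate and 2,500 tokens/second sustained throughput below are illustrative assumptions, not measured figures:

```python
def self_hosted_cost(tokens_m: float, gpu_hourly: float,
                     throughput_tok_s: float) -> float:
    """USD serving cost: GPU-hours = tokens / throughput / 3600,
    priced at gpu_hourly. Ignores engineering and idle-capacity
    overhead, which favors the self-hosted number."""
    gpu_hours = tokens_m * 1_000_000 / (throughput_tok_s * 3600)
    return gpu_hours * gpu_hourly

# Illustrative: 2B tokens/month.
api = 2_000 * 0.27                                # DeepSeek V3 input rate: $540
hosted = self_hosted_cost(2_000, 2.0, 2_500)      # ~$444
```

Under these assumptions self-hosting edges out the cheapest API rate around the 2B-token mark, consistent with the 1-2B break-even range above; real overheads push the break-even higher.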
Do different models have different cost trajectories? Yes. Open-source model pricing decreased faster than proprietary offerings. Proprietary vendors maintain 2-5x price premium for latest capabilities.
What's the cost difference between prompt and completion tokens? Completion tokens typically cost 4–5x prompt tokens due to the sequential autoregressive generation process. GPT-4o charges $2.50/$10 (4x), Claude Sonnet charges $3/$15 (5x). Earlier models (2022–2023) had a lower 2x ratio, but the gap has widened as providers optimized prompt processing.
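The blended effective rate follows directly from the ratio; the 20% completion share below is an illustrative assumption:

```python
def blended_rate(prompt_rate: float, completion_rate: float,
                 completion_fraction: float) -> float:
    """Effective $/M tokens when `completion_fraction` of all
    tokens are completions and the rest are prompt tokens."""
    return (prompt_rate * (1 - completion_fraction)
            + completion_rate * completion_fraction)

# GPT-4o ($2.50 / $10.00): a workload where 20% of tokens are completions
rate = blended_rate(2.50, 10.00, 0.20)  # $4.00 per 1M tokens
```

Because completions cost 4-5x more, long-output workloads (summarization, code generation) pay a much higher effective rate than retrieval-heavy ones with short answers.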
Related Resources
Review comprehensive pricing information in LLM API pricing guide. Explore cost optimization with inference optimization techniques. Learn about self-hosted alternatives and their economics using AI cost calculator.
Understand GPU pricing fundamentals through spot GPU pricing and GPU cloud cost comparison to evaluate self-hosting economics.
Sources
- OpenAI Pricing History: https://openai.com/pricing/
- Anthropic Claude Pricing: https://www.anthropic.com/pricing
- Google Cloud Vertex AI Pricing: https://cloud.google.com/vertex-ai/pricing