LLM API Price War: How Costs Dropped 90% in 18 Months

Deploybase · July 3, 2025 · Market Analysis

The 90% Price Drop in 18 Months

GPT-4 input pricing dropped from $0.03 per 1K tokens to $0.005. Claude fell from $0.015 to $0.003. Mistral followed a similar curve. The market is brutal.

Historical Pricing Timeline

January 2024:

  • OpenAI GPT-4: $0.03 input / $0.06 output per 1K tokens
  • Claude 2: $0.008 input / $0.024 output
  • Mistral Large: $0.024 input / $0.072 output

July 2024:

  • OpenAI GPT-4 Turbo: $0.01 input / $0.03 output (67% price cut)
  • Claude 3 Opus: $0.015 input / $0.075 output
  • Mistral Medium: $0.007 input / $0.021 output

January 2025:

  • OpenAI GPT-4 Turbo: $0.005 input / $0.015 output (50% additional cut)
  • Claude 3.5 Sonnet: $0.003 input / $0.015 output (80% price cut)
  • Mistral Small: $0.0014 input / $0.0042 output (80% price cut)

March 2026 (current):

  • OpenAI: GPT-4o at $0.0025 input / $0.010 output
  • Anthropic: Claude Sonnet 4.6 at $0.003 input / $0.015 output
  • Mistral: Mistral Large at $0.003 input / $0.009 output
  • DeepSeek: DeepSeek-V3.2 at $0.00028 input / $0.00042 output

Cumulative pricing change: 90% reduction over 18 months.

Pricing Comparison Matrix

Model             | Provider  | Input (per 1K) | Output (per 1K) | Effective cost
------------------|-----------|----------------|-----------------|----------------
GPT-4o            | OpenAI    | $0.0025        | $0.010          | $0.0075 avg
Claude Sonnet 4.6 | Anthropic | $0.003         | $0.015          | $0.011 avg
Mistral Large     | Mistral   | $0.003         | $0.009          | $0.007 avg
DeepSeek-V3.2     | DeepSeek  | $0.00028       | $0.00042        | $0.00037 avg

Effective cost is a blended per-1K average that weights output tokens twice as heavily as input (a 1:2 input-to-output token mix).

Combined cost per million tokens (1M input + 1M output):

  • GPT-4o: $12.50
  • Claude Sonnet 4.6: $18.00
  • Mistral Large: $12.00
  • DeepSeek-V3.2: $0.70
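The blended and per-million figures above follow directly from the listed rates. A minimal sketch that reproduces both (the 1:2 input-to-output token mix is an assumption inferred from the numbers, not a published provider convention):

```python
# Reproduce the "effective cost" and combined per-million figures
# from the listed input/output rates.
# Assumption: effective cost weights output tokens 2:1 over input,
# i.e. a 1:2 input-to-output token mix.

PRICES = {  # $ per 1K tokens: (input, output)
    "GPT-4o": (0.0025, 0.010),
    "Claude Sonnet 4.6": (0.003, 0.015),
    "Mistral Large": (0.003, 0.009),
    "DeepSeek-V3.2": (0.00028, 0.00042),
}

def effective_per_1k(inp: float, out: float) -> float:
    """Blended $/1K tokens, assuming 1 input token per 2 output tokens."""
    return (inp + 2 * out) / 3

def combined_per_million(inp: float, out: float) -> float:
    """$ for 1M input tokens plus 1M output tokens."""
    return (inp + out) * 1000

for model, (i, o) in PRICES.items():
    print(f"{model}: ${effective_per_1k(i, o):.5f}/1K blended, "
          f"${combined_per_million(i, o):.2f} per 1M in + 1M out")
```

Plugging GPT-4o's rates into `effective_per_1k` gives $0.0075 and `combined_per_million` gives $12.50, matching the table.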

Root Causes of Price Compression

Overcapacity: By late 2024, multiple models matched OpenAI quality. Excess capacity drove prices down as providers competed for market share.

Efficiency Improvements: Inference optimization reduced hardware costs. Quantized models achieved comparable quality at 75% lower compute cost.

Model Scaling Economics: Larger models became more efficient. Training cost per token improved 50-70% year-over-year.

Open-Source Competition: Open-weight models such as Mistral, Llama, and DeepSeek forced API prices lower and eroded moats built on quality differences.

Cost Transparency: As cost structures leaked, customers negotiated aggressively. Providers couldn't maintain 400% markups when alternatives existed.

Venture Capital: New entrants (DeepSeek, others) funded by venture money underprice to gain market share. VC models emphasize growth over profitability, pressuring incumbents.

Margin Analysis

January 2024: OpenAI GPT-4 likely cost around $0.005/1K tokens to serve. Selling input at $0.03 meant charging six times cost (a 600% price-to-cost ratio).

March 2026: OpenAI GPT-4o costs roughly $0.002/1K tokens to serve thanks to efficiency gains. A blended price of about $0.005 is 2.5x cost (250%).

Dramatic margin compression despite improved efficiency: prices fell faster than infrastructure costs did. Margins are now compressing toward the marginal-cost floor.

Future: Expect convergence toward 50-100% markup (true profit margin accounting for development costs).
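The margin arithmetic above can be made explicit. A sketch using the article's rough serving-cost estimates (these are estimates, not disclosed costs); "ratio" here means price as a percentage of cost, matching the 600% and 250% figures:

```python
def price_to_cost_ratio(price: float, cost: float) -> float:
    """Price as a percentage of estimated serving cost.

    600% means the price is six times the cost to serve.
    Inputs are $ per 1K tokens.
    """
    return price / cost * 100

# Article's rough estimates (assumptions, not disclosed numbers):
# Jan 2024: GPT-4 input sold at $0.03, served for ~$0.005 per 1K tokens.
print(f"Jan 2024: {price_to_cost_ratio(0.03, 0.005):.0f}%")   # ~600%
# Mar 2026: GPT-4o blended ~$0.005, served for ~$0.002 per 1K tokens.
print(f"Mar 2026: {price_to_cost_ratio(0.005, 0.002):.0f}%")  # ~250%
```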

Cost Structure Insights

Hardware: $0.0005-0.0015 per 1K tokens for inference (varies by model size and quantization)

Personnel: Significant development costs amortized across usage. As usage scales, per-token cost drops.

Infrastructure: Cooling, bandwidth, storage relatively small. Compute dominates.

Debt service: Venture funding requires profitability targets. API pricing structured to hit ROI timelines.

Competitive dynamics: Pricing set by competition, not cost. When DeepSeek underprices by 50%, incumbents must match to retain customers.

Profitability Concerns

March 2026: Most providers operate below 20% gross margins. Some models sold near marginal cost.

Sustainability questioned: Venture-backed companies burn cash to gain market share. Profitability secondary to growth.

OpenAI exception: Dominates market share, maintains reasonable margins. Can sustain price cuts through efficiency gains.

Smaller providers: API hosts serving open Llama-family models (Together AI, Replicate) offer competitive pricing but struggle with unit economics.

Future Pricing Predictions

2026-2027: Expect 20-30% additional price declines as:

  • Inference efficiency improves further
  • Hardware costs fall
  • Competition intensifies

Floor: Prices approach $0.0001 per 1K tokens for commodity models. Professional services (guaranteed SLA, priority throughput) maintain 2-3x premium.

Differentiation: Quality models maintain higher prices. DeepSeek-V3 at $0.0007 is far cheaper than GPT-4o at roughly $0.012, but customers pay the premium for predictability and quality.

Tiered pricing: Expect providers to offer ultra-cheap commodity models alongside premium offerings. GPT-4o quality stays at $0.005; GPT-4o mini falls to $0.0001.

Strategic Implications

Cost-aware developers should:

  1. Use the cheapest appropriate model. DeepSeek-V3, at a small fraction of Claude pricing, performs comparably for most tasks.

  2. Negotiate volume discounts. Usage above 100M tokens/month merits 20-50% discounts directly with providers.

  3. Consider self-hosting. Open-source LLM inference at $0.00005-0.0002 per 1K tokens on RunPod beats all cloud APIs for high-volume applications.

  4. Monitor pricing weekly. Prices change monthly. Historical prices from 3 months ago are outdated.

  5. Lock long-term contracts. Providers offer 30-50% discounts for annual commitments when budget certainty exists.
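Points 1-5 reduce to simple arithmetic once a discount schedule is assumed. A sketch with hypothetical tiers, loosely based on the 20-50% volume-discount range mentioned in point 2 (not any provider's actual schedule):

```python
def monthly_cost(tokens_per_month: float, price_per_1k: float) -> float:
    """Estimated monthly bill with illustrative volume discounts.

    Tiers are hypothetical examples, not a published schedule:
    >= 100M tokens/month -> 20% off; >= 1B tokens/month -> 50% off.
    """
    base = tokens_per_month / 1000 * price_per_1k
    if tokens_per_month >= 1_000_000_000:
        return base * 0.50
    if tokens_per_month >= 100_000_000:
        return base * 0.80
    return base

# 200M tokens/month at a $0.0075/1K blended rate: $1,500 list,
# $1,200 after the assumed 20% volume discount.
print(monthly_cost(200_000_000, 0.0075))
```

Rerunning this against current list prices each month is a cheap way to implement point 4's "monitor pricing weekly" advice.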

API versus Self-Hosting Economics

APIs: Simple, managed, but expensive at scale. Good for <1M tokens/day usage.

Self-hosting: Requires infrastructure expertise, but 10-100x cheaper at large scale. Good for >10M tokens/day usage.

Break-even: roughly 5-10M tokens/day, depending on model and provider. Worked example: Lambda's H100 SXM at $3.78/hour supports 2-3M tokens/day, costing $60-90/day at full utilization. Serving 5M tokens through an API priced around $0.0007 per 1K tokens (DeepSeek-class rates) costs $3.50, versus roughly $7.47 of equivalent H100 time. At that price point the API wins below about 3M tokens/day.
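The break-even logic generalizes to a one-liner. Assumptions: the $3.78/hour H100 rate from the text, 24-hour utilization, and no throughput ceiling (a real GPU caps out well below the computed volume for large models, which pushes the practical break-even lower):

```python
def breakeven_tokens_per_day(gpu_cost_per_hour: float,
                             hours_per_day: float,
                             api_price_per_1k: float) -> float:
    """Daily token volume at which a dedicated GPU's rental cost
    equals API spend. Ignores throughput limits and ops overhead."""
    daily_gpu_cost = gpu_cost_per_hour * hours_per_day
    return daily_gpu_cost / api_price_per_1k * 1000

# H100 at $3.78/hour, 24h/day, vs a $0.0075/1K blended API rate
# (GPT-4o-class pricing): break-even near 12M tokens/day, in the
# same ballpark as the 5-10M figure above.
print(f"{breakeven_tokens_per_day(3.78, 24, 0.0075):,.0f} tokens/day")
```

Against ultra-cheap APIs the break-even explodes: at DeepSeek-class rates the same GPU would need to serve over 100M tokens/day to pay for itself, which is why self-hosting mainly competes with premium-tier pricing.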

FAQ

Will prices stabilize or continue falling? Likely 20-30% decline through 2027. After that, stabilization expected as providers can't operate below marginal cost indefinitely. Market maturation should stabilize pricing by 2028.

Is OpenAI still worth the premium despite higher prices? For reliability and support, yes. For pure cost optimization, DeepSeek-V3 offers comparable quality at a fraction of the price. Choose based on reliability requirements, not cost alone.

Should I lock in long-term contracts now? Yes, if budget certainty is important. Lock in 30-50% discounts now. Prices may fall another 20%, but discount contracts protect against unexpected costs.

Which provider has best price-quality ratio? DeepSeek-V3 (best cost), Claude Sonnet 4.6 (best quality-to-cost), Mistral Large (good all-around). OpenAI GPT-4o best for critical applications.

Will API pricing ever compete with self-hosting? No. APIs will always cost 10-50x more than self-hosting at large scale due to managed overhead. APIs win on simplicity, self-hosting wins on cost.

What caused the price drops specifically? Overcapacity, efficiency improvements, open-source competition, and venture-backed competitors burning cash. All pressures point downward.

Are margins sustainable for providers? Questionable. Most operate below 20% gross margin. Unsustainable long-term. Expect consolidation as smaller providers shut down.

