Contents
- LLM Pricing History: 2023 Pricing Baseline
- 2024 Price Competition
- 2025-2026 Consolidation
- Cost Drivers Analysis
- Future Projections
- FAQ
- Related Resources
- Sources
LLM Pricing History: 2023 Pricing Baseline
This guide traces LLM pricing history from the 2023 baseline to today. In early 2023, LLM APIs were expensive: OpenAI commanded premium pricing because it was effectively the only game in town.
OpenAI GPT-3.5 Turbo (March 2023)
- Input: $0.0015 per 1K tokens
- Output: $0.002 per 1K tokens
- Typical request cost (a few thousand tokens): well under $0.01
GPT-4 (March 2023 release)
- Input: $0.03 per 1K tokens
- Output: $0.06 per 1K tokens
- Cost per typical request (a few thousand tokens): roughly $0.10-$0.30
GPT-4 carried a 20-30x price premium over GPT-3.5 Turbo. Adoption was limited to high-value applications (legal analysis, medical diagnosis, complex coding).
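The per-request arithmetic above can be reproduced with a small helper. The rates are the published per-1K prices; the token counts are illustrative assumptions:

```python
def request_cost(input_tokens, output_tokens, in_rate_per_1k, out_rate_per_1k):
    """Cost of one API call given per-1K-token input and output rates."""
    return (input_tokens / 1000) * in_rate_per_1k + (output_tokens / 1000) * out_rate_per_1k

# Illustrative request: 2K prompt tokens, 1K completion tokens
gpt35 = request_cost(2000, 1000, 0.0015, 0.002)   # $0.005
gpt4 = request_cost(2000, 1000, 0.03, 0.06)       # $0.12
print(f"GPT-3.5: ${gpt35:.4f}  GPT-4: ${gpt4:.2f}  premium: {gpt4 / gpt35:.0f}x")
```

For this request shape the premium works out to 24x, squarely inside the 20-30x range implied by the published rates.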
Anthropic Claude 1 (March 2023)
- Input: $0.01 per 1K tokens
- Output: $0.03 per 1K tokens
Positioned as a premium alternative to GPT-4, at one-third to one-half of GPT-4's rates. API availability was limited, with a waitlist of 100K+ users.
Market Conditions
Only three viable API providers existed. Barriers to entry were high:
- Massive inference infrastructure costs ($100M+)
- Limited trained LLM talent pool
- Compute hardware scarcity (H100 shortage acute)
- Network effects favoring first movers (OpenAI)
Result: oligopoly pricing. Limited competition kept prices well above marginal cost.
2024 Price Competition
Market entry accelerated through 2024. New providers used open-source models and improved efficiency.
Q1 2024: Gemini Launch
Google released Gemini API with aggressive pricing:
- Gemini Pro Input: $0.00025 per 1K tokens
- Gemini Pro Output: $0.0005 per 1K tokens
Gemini Pro input came in 120x below GPT-4's rate, putting immediate pricing pressure on OpenAI.
Q2 2024: Claude 2 Launch
Anthropic launched Claude 2 with improved capabilities but surprisingly lower pricing:
- Input: $0.008 per 1K tokens (20% reduction)
- Output: $0.024 per 1K tokens (20% reduction)
Signaled confidence in scale and efficiency gains.
Q3 2024: Open-Source Movement
Meta released Llama 2 for commercial use. Mistral released Mistral 7B and Mixtral 8x7B. Open-source models became viable for production workloads.
Pricing implications:
- RunPod H100 inference: $0.0005 per 1K tokens (cost-based pricing)
- Together AI Mixtral: $0.0002 per 1K tokens
- Groq Mixtral: $0.0001 per 1K tokens
Open-source inference showed pricing potential if efficiency improved.
Q4 2024: OpenAI Response
OpenAI launched GPT-4 Turbo with lower pricing to combat competition:
- Input: $0.01 per 1K tokens (67% reduction from GPT-4)
- Output: $0.03 per 1K tokens (50% reduction from GPT-4)
Still 40x more expensive than Gemini Pro, but closer to market rates. The pricing war had begun.
Year-End 2024 Summary
| Provider | 2023 Input Price (per 1K tokens) | Q4 2024 Input Price | Reduction |
|---|---|---|---|
| GPT-4 | $0.03 | $0.01 | 67% |
| Claude 1 | $0.01 | $0.008 | 20% |
| Gemini | N/A | $0.00025 | N/A |
| Llama 2 | N/A | $0.0005 | N/A |
Market fractured. Premium models (GPT-4, Claude) maintained higher prices. Commodity models (Llama, Mistral) compressed to near-marginal cost.
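The reduction column in the table above can be checked mechanically. A quick sketch, using the per-1K input rates quoted earlier:

```python
def pct_reduction(old_price, new_price):
    """Percent reduction from old_price to new_price, rounded to whole percent."""
    return round((old_price - new_price) / old_price * 100)

cuts_2024 = {
    "GPT-4": pct_reduction(0.03, 0.01),     # 67
    "Claude": pct_reduction(0.01, 0.008),   # 20
}
print(cuts_2024)
```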
2025-2026 Consolidation
Pricing consolidation accelerated through 2025 toward equilibrium pricing.
Q1 2025: Claude 3 Launch
Anthropic released Claude 3 family with category-based pricing:
Claude 3 Haiku:
- Input: $0.00025 per 1K tokens
- Output: $0.00125 per 1K tokens
Claude 3 Sonnet:
- Input: $0.003 per 1K tokens
- Output: $0.015 per 1K tokens
Claude 3 Opus:
- Input: $0.015 per 1K tokens
- Output: $0.075 per 1K tokens
Anthropic's tiered approach acknowledged the cost-capability tradeoff; the cheaper Haiku variant matched Gemini Pro's input pricing.
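The cost gap across the three tiers is easiest to see for a fixed workload. A sketch using the rates above; the monthly volume is an assumption for illustration:

```python
# Claude 3 per-1K-token rates (input, output) as listed above
TIERS = {
    "haiku":  (0.00025, 0.00125),
    "sonnet": (0.003,   0.015),
    "opus":   (0.015,   0.075),
}

def monthly_cost(input_tokens, output_tokens, tier):
    """API cost for one month of traffic on a given tier."""
    in_rate, out_rate = TIERS[tier]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# Assumed workload: 10M input + 2M output tokens per month
for tier in TIERS:
    print(f"{tier:>6}: ${monthly_cost(10_000_000, 2_000_000, tier):,.2f}")
```

The same workload runs $5 on Haiku, $60 on Sonnet, and $300 on Opus: a 60x spread, which is exactly the tradeoff the tiering makes explicit.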
Q2 2025: OpenAI Counter
OpenAI reduced GPT-3.5 Turbo pricing and released GPT-4 Mini:
GPT-3.5 Turbo:
- Input: $0.0005 per 1K tokens (67% reduction)
- Output: $0.0015 per 1K tokens (25% reduction)
GPT-4 Mini (new):
- Input: $0.00015 per 1K tokens
- Output: $0.0006 per 1K tokens
GPT-4 Mini positioned as Haiku competitor with lower pricing.
Q3 2025: Extended Ecosystem
Specialized providers emerged:
- Groq (speed-focused): $0.00001-$0.0001 per 1K tokens
- Fireworks AI (cost-focused): $0.000005-$0.00001 per 1K tokens
- LM Studio (open-source): $0.00 per token (self-hosted)
Competition focused on efficiency and specialization rather than general models.
Q4 2025 - Q1 2026: Current State
Prices stabilized at new equilibrium:
| Model | Input Price (per 1K tokens, March 2026) | YoY Change |
|---|---|---|
| GPT-4o | $0.0025 ($2.50/M) | -92% vs GPT-4 launch 2023 |
| GPT-5 | $0.00125 ($1.25/M) | -96% vs GPT-4 launch 2023 |
| Claude Sonnet 4.6 | $0.003 ($3.00/M) | -70% vs Claude 1 (2023) |
| Claude Haiku 4.5 | $0.001 ($1.00/M) | -97% vs GPT-4 launch 2023 |
| Gemini 2.5 Flash | $0.0003 ($0.30/M) | -99% vs GPT-4 launch 2023 |
| DeepSeek V3 | $0.00027 ($0.27/M) | -99% vs GPT-4 launch 2023 |
| Open-source via API | $0.0001-$0.001 ($0.10-$1.00/M) | N/A |
Effective pricing: 95–99% decline from 2023 GPT-4 rates, depending on model tier.
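The YoY column can be derived directly from GPT-4's 2023 input rate of $0.03 per 1K tokens. A quick check:

```python
GPT4_2023 = 0.03  # GPT-4 launch input price, per 1K tokens

def decline_vs_2023(price_per_1k):
    """Percent decline relative to GPT-4's 2023 input price."""
    return round((GPT4_2023 - price_per_1k) / GPT4_2023 * 100)

march_2026 = {"GPT-4o": 0.0025, "GPT-5": 0.00125,
              "Gemini 2.5 Flash": 0.0003, "DeepSeek V3": 0.00027}
for model, price in march_2026.items():
    print(f"{model}: -{decline_vs_2023(price)}%")
```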
Cost Drivers Analysis
Hardware Cost Reduction
NVIDIA H100 cost dropped from $40K (2023) to $15K (2026) through manufacturing scale. This 62% hardware cost reduction cascaded to API pricing.
NVIDIA H200 and B200 offered improved capabilities at better cost-per-token. Competition from AMD MI250/MI300 accelerators added further pricing pressure.
Model Efficiency
LLM efficiency doubled from 2023-2026:
- Quantization techniques (4-bit, INT8) reduce compute 50-75% with minimal quality loss
- Architecture improvements (grouped query attention, flash attention) improve throughput 40-60%
- Knowledge distillation into smaller models trades capability for cost and speed
A 7B model in 2026 outperforms a 13B model from 2023. Smaller models cost less to serve.
Competitive Saturation
2023: 3 viable API providers (OpenAI, Anthropic, Google)
2026: 20+ viable providers competing on price and capability
Lower barriers to entry through:
- Availability of open-source models (Llama, Mistral, Qwen)
- Accessible inference platforms (Together, Baseten, Modal)
- Commodity GPU cloud (RunPod, Lambda, Vast.AI)
Competition compresses margins toward marginal cost.
Inference Optimization
Inference stack maturation enabled 3-5x throughput improvements:
- vLLM batching: 50-60% utilization improvement
- KV cache optimization: 30% memory reduction
- Paged attention: 20% efficiency gain
- Speculative decoding: 2.5x token generation speedup
These gains stack multiplicatively, and combined with hardware and model-efficiency improvements they drove the roughly 99% cost reduction.
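Using rough midpoints of the figures above (an illustrative calculation, not measured data), the stacking works out like this:

```python
# Approximate midpoint gains from the inference-stack improvements above
factors = {
    "continuous batching": 1.55,    # ~50-60% utilization improvement
    "kv cache optimization": 1.30,  # ~30% memory reduction
    "paged attention": 1.20,        # ~20% efficiency gain
    "speculative decoding": 2.50,   # ~2.5x generation speedup
}

throughput_gain = 1.0
for gain in factors.values():
    throughput_gain *= gain

print(f"combined throughput gain: {throughput_gain:.1f}x "
      f"(~{(1 - 1 / throughput_gain) * 100:.0f}% cost reduction from the stack alone)")
```

The inference stack alone yields roughly a 6x throughput gain (~83% cost reduction); multiplying in hardware cost declines and model-efficiency gains pushes the combined reduction toward 99%.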
Scale Economics
Major providers achieved scale allowing cost reduction:
- OpenAI: 100M+ users by 2025
- Google: Gemini integrated into 2B+ Android devices
- Anthropic: 10M+ API users
At scale, fixed infrastructure costs amortized across volume. This funded research into efficiency innovations.
Future Projections
2026-2027 Outlook
Continued price pressure is expected, but at a declining rate. The current trajectory suggests:
- Premium models (Opus-class): $0.005-$0.01 per 1K input tokens
- Mid-tier models: $0.0002-$0.001 per 1K input tokens
- Commodity models: $0.00001-$0.0001 per 1K input tokens
Pricing floor unlikely to drop below marginal cost of roughly $0.00001 per 1K tokens (electricity + infrastructure).
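A compound-decline sketch of this trajectory; the 25% annual decline rate is an assumption within the 20-30% range discussed in the FAQ, and the starting price is illustrative:

```python
def projected_price(current_price, annual_decline, years):
    """Price after compounding a fixed annual percentage decline."""
    return current_price * (1 - annual_decline) ** years

# Assumed: premium-tier input price of $0.0025/1K declining 25% per year
for year in range(3):
    print(f"202{6 + year}: ${projected_price(0.0025, 0.25, year):.6f} per 1K tokens")
```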
Alternative Models
Expect shift from per-token pricing to alternative models:
- Flat-rate subscriptions ($50-500/month for unlimited usage)
- Seat-based licensing (teams/enterprises)
- Revenue sharing (keep 70% of value created)
- Usage commitments (reserved capacity)
Per-token pricing may become a minority model, especially for high-scale users.
Self-Hosted Renaissance
As open-source model quality converges with commercial models, self-hosting becomes increasingly attractive:
- Home/small business: consumer GPUs such as the $399 RTX 4060 make local inference affordable
- Enterprise: internal models save 85-95% on API costs
Expect 30-40% of enterprise workloads to migrate to self-hosted models by 2028.
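A break-even sketch for the self-hosting decision. All figures here (GPU server cost, API rate) are assumptions for illustration, and the model ignores ops overhead:

```python
def breakeven_tokens_per_month(gpu_monthly_cost, api_rate_per_1k):
    """Monthly token volume above which a dedicated GPU beats API pricing,
    assuming the GPU can absorb the full volume."""
    return gpu_monthly_cost / api_rate_per_1k * 1000

# Assumed: $1,500/month dedicated GPU server vs a $0.001/1K-token API
tokens = breakeven_tokens_per_month(1500, 0.001)
print(f"break-even: {tokens / 1e9:.1f}B tokens/month")
```

Below the break-even volume the API is cheaper; above it, the fixed GPU cost amortizes and self-hosting wins, which is why the savings case is strongest for high-volume enterprises.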
FAQ
Q: Why did LLM prices drop 99%? Hardware costs declined ~60%, model efficiency doubled, competition increased 10x, and the inference stack matured. These effects multiplied to a 99% price reduction. Most of the decline is attributable to efficiency gains, not margin compression.
Q: Is pricing at floor now? No. Current prices are still 100-1000x marginal cost. Expect continued 20-30% annual declines through 2027 as efficiency improves. The price floor sits around $0.00001 per 1K tokens (electricity cost).
Q: Which providers were hurt by price wars? Smaller providers (e.g., Replicate) either adopted lower pricing or exited the market. OpenAI maintained leadership through capability, not pricing. Margin compression hurt providers without efficiency innovations.
Q: Will API pricing converge to the same level? Partially. Commodity models (Llama, Mistral) will converge. Premium models (GPT-4, Claude 3 Opus) maintain higher pricing justified by capability. Expect 2-3 distinct pricing tiers (per 1K input tokens): commodity (~$0.0001), mid-tier (~$0.001), premium (~$0.01).
Q: Should I build on cheaper providers or wait for prices to drop further? Build on current infrastructure. Price volatility has stabilized. 2026 pricing reflects mature market. Waiting for further drops unlikely to yield meaningful savings. Cost savings better achieved through architectural optimization (caching, quantization, self-hosting).
Q: How did open-source models affect pricing? Open-source created pricing floor at marginal cost. Enabled new providers without R&D burden. Forced commercial providers to compete on cost and efficiency rather than capability alone. Accelerated price declines by 2-3 years.
Related Resources
- Complete LLM API Pricing Guide
- OpenAI API Pricing
- Anthropic Claude API Pricing
- Cohere API Pricing
- Groq API Pricing
- Small Open-Source LLMs on Consumer GPUs
- GPU Cloud Pricing Report
Sources
- OpenAI Pricing Archive (Wayback Machine, 2023-2026)
- Anthropic Pricing History
- Google Cloud Pricing Archives
- Together AI Pricing Documentation
- Industry analyst reports (McKinsey AI 2026)
- Historical GitHub pricing discussions