Cost Per Token Over Time: How LLM API Pricing Has Dropped

Deploybase · March 4, 2026 · Market Analysis


Historical Pricing Timeline

LLM API pricing has decreased dramatically since GPT-3's launch in 2020. Understanding this trajectory informs cost projections for businesses evaluating AI investments.

2020-2021: GPT-3 (Davinci) API pricing started at $0.06 per 1,000 tokens ($60 per 1M), with no separate prompt and completion rates. Teams considered API calls expensive compared to local inference.

2022: Pricing remained relatively stable through early 2022. Major cloud providers launched GPU infrastructure, creating direct competition with API vendors.

2023: First significant price reductions appeared. OpenAI introduced GPT-3.5-Turbo at $0.0015 per 1,000 prompt tokens, roughly 97% lower than GPT-3's launch rate. Meta released Llama 2, spurring open-source adoption.

2024: Cost-per-token continued declining. Anthropic's Claude models decreased 20% mid-year. Google released Gemini with competitive pricing.

2025-2026: Current pricing reflects commoditization of LLM inference. GPT-4o costs $2.50 per 1M prompt tokens, while newer frontier models like GPT-5 and Gemini 2.5 Pro have pushed input pricing to $1.25/M — an 80%+ reduction from early 2023 GPT-4 rates.

Absolute Price Decreases

OpenAI's pricing evolution demonstrates the trend:

  • GPT-3 (2020): $60.00 per 1M prompt tokens (equivalent; billed at $0.06/1K)
  • GPT-3.5-Turbo (2023): $1.50 per 1M prompt tokens
  • GPT-4 Turbo (early 2024): $10.00 per 1M prompt tokens
  • GPT-4o (late 2024): $5.00 per 1M prompt tokens
  • GPT-4o (March 2026): $2.50 per 1M prompt tokens
  • GPT-5 (March 2026): $1.25 per 1M prompt tokens

This represents a ~98% price reduction for comparable capability over six years.
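The overall reduction can be checked directly from the figures in the list above; a minimal sketch:

```python
# Prompt-token prices in USD per 1M tokens, taken from the list above.
prices = {
    "GPT-3 (2020)": 60.00,
    "GPT-3.5-Turbo (2023)": 1.50,
    "GPT-4 Turbo (early 2024)": 10.00,
    "GPT-4o (late 2024)": 5.00,
    "GPT-4o (March 2026)": 2.50,
    "GPT-5 (March 2026)": 1.25,
}

def reduction_pct(old: float, new: float) -> float:
    """Percentage drop from an old price to a new price."""
    return (old - new) / old * 100

overall = reduction_pct(prices["GPT-3 (2020)"], prices["GPT-5 (March 2026)"])
print(f"GPT-3 -> GPT-5 reduction: {overall:.1f}%")  # 97.9%
```

The $60 to $1.25 drop works out to a 97.9% reduction, consistent with the ~98% figure above.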

Competitive Market Impact

Pricing pressure increased substantially with open-source model availability. Llama 3 availability through API providers like Together AI and Replicate forced proprietary vendors to reduce costs.

As of March 2026, API pricing for equivalent model capabilities shows:

  • Frontier proprietary models (GPT-5, Claude Sonnet, Gemini 2.5 Pro): $1.25–$3.00/M input tokens
  • Mid-tier proprietary models (GPT-4o, Gemini Flash): $0.30–$2.50/M input tokens
  • Open-source through commercial APIs (DeepSeek V3, Mistral, Llama): $0.14–$0.55/M input tokens
  • Self-hosted inference: $0.001–$0.01/M input tokens in equivalent compute costs at high utilization

The price-to-capability ratio has shifted dramatically in favor of end users.
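To make the tiers above concrete, a short sketch comparing monthly input-token spend at each tier, using the midpoints of the quoted ranges and an assumed workload of 500M input tokens per month:

```python
# Midpoints of the per-tier price ranges quoted above (USD per 1M input tokens).
tiers_usd_per_million = {
    "frontier proprietary": (1.25 + 3.00) / 2,
    "mid-tier proprietary": (0.30 + 2.50) / 2,
    "open-source via API": (0.14 + 0.55) / 2,
    "self-hosted (compute-equivalent)": (0.001 + 0.01) / 2,
}

monthly_tokens_millions = 500  # assumed example workload

for tier, rate in tiers_usd_per_million.items():
    print(f"{tier}: ${rate * monthly_tokens_millions:,.2f}/month")
```

At this volume, the gap between frontier proprietary and open-source API pricing spans roughly an order of magnitude.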

Volume-Based Discounts

All major providers implemented aggressive volume discounts:

  • OpenAI: 10% discount at 100M monthly tokens, 20% at 1B monthly tokens
  • Anthropic: 15% discount at 500M monthly tokens
  • Google Cloud: 20% discount at 1B monthly tokens through commitments

High-volume users consuming over 1B tokens monthly benefit most from competitive pricing.
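The OpenAI tiers above can be sketched as a simple step function; note that the thresholds and rates are this article's figures, not an official rate card:

```python
def openai_discount(monthly_tokens: int) -> float:
    """Return the discount fraction for a given monthly token volume,
    using the tiers quoted above (10% at 100M, 20% at 1B)."""
    if monthly_tokens >= 1_000_000_000:
        return 0.20
    if monthly_tokens >= 100_000_000:
        return 0.10
    return 0.0

def discounted_cost(monthly_tokens: int, list_price_per_million: float) -> float:
    """Monthly cost after the volume discount is applied."""
    base = monthly_tokens / 1_000_000 * list_price_per_million
    return base * (1 - openai_discount(monthly_tokens))

# 1B input tokens of GPT-5 at $1.25/M with the 20% tier applied:
print(discounted_cost(1_000_000_000, 1.25))  # 1000.0
```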

Provider Competition Impact

Multiple factors drove pricing reductions:

Open-Source Competition: Llama, Mistral, and other models eliminated artificial scarcity. Proprietary vendors could no longer command premium pricing.

Cloud Provider Entry: AWS Bedrock, Google Cloud Vertex AI, and Azure OpenAI Service added pricing pressure through volume-based discounts.

Inference Optimization: Quantization, knowledge distillation, and other serving optimizations reduced computational requirements, lowering operating costs for providers.

Capital Availability: Competitive funding rounds encouraged aggressive pricing to capture market share.

Current Pricing (March 2026)

Model-specific pricing across major providers as of March 2026:

  • GPT-4o: $2.50 per 1M prompt tokens, $10.00 per 1M completion tokens
  • GPT-5: $1.25 per 1M prompt tokens, $10.00 per 1M completion tokens
  • Claude Sonnet 4.6: $3.00 per 1M prompt tokens, $15.00 per 1M completion tokens
  • Gemini 2.5 Pro: $1.25 per 1M prompt tokens, $10.00 per 1M completion tokens
  • DeepSeek V3: $0.27 per 1M prompt tokens, $1.10 per 1M completion tokens
  • Llama 3.1 (via API, e.g. Together AI): $0.18 per 1M prompt tokens, $0.90 per 1M completion tokens

Self-hosted inference through cloud GPUs offers lower absolute costs for high-volume users. Reference LLM API pricing for comprehensive provider comparison and current rates.
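Because prompt and completion rates differ, total workload cost depends on the token mix. A minimal sketch using the March 2026 prices listed above (model keys are illustrative labels, not API identifiers):

```python
# (prompt, completion) prices in USD per 1M tokens, from the list above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-5": (1.25, 10.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "deepseek-v3": (0.27, 1.10),
    "llama-3.1-together": (0.18, 0.90),
}

def workload_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Total USD cost for a workload with the given token counts."""
    p, c = PRICES[model]
    return prompt_tokens / 1e6 * p + completion_tokens / 1e6 * c

# 10M prompt + 2M completion tokens on GPT-5:
print(f"${workload_cost('gpt-5', 10_000_000, 2_000_000):.2f}")  # $32.50
```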

Future Projections

Pricing will likely decrease 30-50% over the next 2-3 years based on current trajectories:

Hardware Innovation: New GPU architectures offering 2-3x throughput improvements reduce per-token serving costs.

Model Efficiency: Mixture-of-Experts architectures and sparse computation reduce computational intensity.

Market Saturation: Commoditization of LLM inference drives margin compression across providers.
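The projected 30-50% decline over 2-3 years implies an annualized rate that can be derived with compound decay; a small sketch:

```python
def annualized_decline(total_decline: float, years: float) -> float:
    """Annual rate implied by a total price decline over a period."""
    return 1 - (1 - total_decline) ** (1 / years)

low = annualized_decline(0.30, 3)   # slow case: 30% over 3 years
high = annualized_decline(0.50, 2)  # fast case: 50% over 2 years
print(f"{low:.0%} to {high:.0%} per year")  # 11% to 29% per year
```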

High-volume teams negotiating long-term contracts now gain price stability, though these projected declines mean shorter commitments preserve flexibility to capture future reductions.

FAQ

Why has LLM API pricing dropped so dramatically? Increased competition from open-source models, cloud provider entry, and inference optimization techniques all contributed. Model capabilities improved even as prices fell, so each token now delivers more value at lower cost.

Should we lock in pricing contracts now? If consuming over 500M tokens monthly, securing 12-24 month commitments ensures price stability. Smaller users benefit from monitoring quarterly pricing changes rather than long commitments.

Are self-hosted models cheaper than API calls? For workloads consuming over 1-2B tokens monthly, self-hosted inference becomes cost-competitive. For sporadic usage, APIs remain more economical due to no infrastructure overhead.
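A rough break-even sketch for the self-hosting question above; the GPU hourly rate is an illustrative assumption, not a benchmark:

```python
def api_monthly_cost(tokens: float, price_per_million: float) -> float:
    """Monthly API spend for a given token volume."""
    return tokens / 1e6 * price_per_million

def self_hosted_monthly_cost(gpu_hourly_usd: float, hours: float = 730) -> float:
    """Fixed monthly cost of keeping one inference GPU running full-time."""
    return gpu_hourly_usd * hours

tokens = 1_500_000_000                   # 1.5B tokens/month, mid-range of the FAQ answer
api = api_monthly_cost(tokens, 0.27)     # DeepSeek V3 input rate from above
hosted = self_hosted_monthly_cost(0.50)  # assumed $0.50/hr GPU rate
print(f"API: ${api:.0f}/mo, self-hosted: ${hosted:.0f}/mo")
```

Under these assumptions the fixed GPU cost undercuts API spend around the 1-2B token range, consistent with the answer above; real break-evens depend heavily on utilization and throughput.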

Do different models have different cost trajectories? Yes. Open-source model pricing decreased faster than proprietary offerings. Proprietary vendors maintain 2-5x price premium for latest capabilities.

What's the cost difference between prompt and completion tokens? Completion tokens typically cost 4–5x prompt tokens due to the sequential autoregressive generation process. GPT-4o charges $2.50/$10 (4x), Claude Sonnet charges $3/$15 (5x). Earlier models (2022–2023) had a lower 2x ratio, but the gap has widened as providers optimized prompt processing.
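Because of this ratio, the effective blended price of a workload depends on what fraction of its tokens are completions. A minimal sketch using the GPT-4o rates cited above:

```python
def blended_rate(prompt_price: float, completion_price: float,
                 completion_share: float) -> float:
    """Weighted average price per 1M tokens for a given completion fraction."""
    return prompt_price * (1 - completion_share) + completion_price * completion_share

# GPT-4o ($2.50 / $10.00): a chat workload that is 30% completion tokens
print(blended_rate(2.50, 10.00, 0.30))  # ~4.75 per 1M tokens
```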

Review comprehensive pricing information in the LLM API pricing guide. Explore cost optimization through inference optimization techniques. Learn about self-hosted alternatives and their economics using the AI cost calculator.

Understand GPU pricing fundamentals through spot GPU pricing and GPU cloud cost comparison to evaluate self-hosting economics.
