LLM API Price Tracker: Weekly Update (Template)

Deploybase · July 1, 2025 · LLM Pricing

OpenAI API Pricing

OpenAI maintains tiered per-token pricing across its model lineup, from the flagship GPT-5 down to the budget GPT-4o mini. As of March 2026, rates remain stable, with minor updates each quarter.

Current rates:

  • GPT-5: $0.00125/1K input tokens, $0.01/1K output tokens
  • GPT-4.1: $0.002/1K input, $0.008/1K output
  • GPT-4o: $0.0025/1K input tokens, $0.01/1K output tokens
  • GPT-4o mini: $0.00015/1K input, $0.0006/1K output

Volume discounts apply above $100/month in spend: tiered OpenAI accounts receive 15-20% reductions, and production contracts can negotiate custom rates.

Fine-tuning costs differ from inference:

  • GPT-3.5-turbo fine-tuning: $0.008/1K tokens training
  • GPT-4 fine-tuning: $0.06/1K tokens training
  • Output from fine-tuned models: 1.5x base inference cost

Because output tokens cost several times more than input tokens, the largest savings come from constraining output length; summarizing or filtering content before the API call also trims input spend.
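The per-call arithmetic can be sketched as a small estimator. The rate table below mirrors the per-1K figures quoted in this section; treat it as a snapshot, not a source of truth, and verify against OpenAI's pricing page:

```python
# Per-call cost estimate from the per-1K-token rates quoted above.
# Rates change; this table is a snapshot, not a source of truth.
RATES = {  # model: (input USD/1K tokens, output USD/1K tokens)
    "gpt-5": (0.00125, 0.010),
    "gpt-4.1": (0.002, 0.008),
    "gpt-4o": (0.0025, 0.010),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one API call at the snapshot rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# A 2,000-token prompt with a 500-token completion on GPT-4o:
print(f"${call_cost('gpt-4o', 2000, 500):.4f}")  # $0.0100
```

Note that the 500 output tokens cost as much as the 2,000 input tokens here, which is why trimming completions pays off first.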

Claude API Pricing

Anthropic's direct Claude API pricing differs from cloud-provider-hosted versions of the same models.

Direct API rates:

  • Claude Opus 4.6: $0.005/1K input, $0.025/1K output
  • Claude Sonnet 4.6: $0.003/1K input, $0.015/1K output
  • Claude Haiku 4.5: $0.001/1K input, $0.005/1K output
  • Claude Haiku 3: $0.00025/1K input, $0.00125/1K output

Batch API (lower priority, 24-hour processing):

  • 50% discount on input tokens
  • Standard output pricing

Volume discounts:

  • $1000/month: 10% discount
  • $10,000/month: 20% discount
  • $100,000/month: Custom pricing
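Combining the batch input discount and volume tiers above gives a rough Sonnet spend estimate. How the discounts stack is an assumption here, as is applying the volume discount per request; this is an illustrative sketch, not Anthropic's billing formula:

```python
# Rough Claude Sonnet 4.6 cost using the batch input discount and
# volume tiers quoted above. Discount stacking is an assumption.
def sonnet_cost(input_tokens, output_tokens, batch=False, monthly_spend=0.0):
    in_rate, out_rate = 0.003, 0.015          # USD per 1K tokens
    cost = input_tokens / 1000 * in_rate * (0.5 if batch else 1.0)
    cost += output_tokens / 1000 * out_rate   # output billed at standard rate
    if monthly_spend >= 100_000:
        raise ValueError("custom pricing tier: contact sales")
    elif monthly_spend >= 10_000:
        cost *= 0.80                          # 20% volume discount
    elif monthly_spend >= 1_000:
        cost *= 0.90                          # 10% volume discount
    return cost

# 1M input / 200K output tokens via the Batch API:
print(round(sonnet_cost(1_000_000, 200_000, batch=True), 2))  # 4.5
```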

Context window size affects pricing on some models: extended 200K-token windows are billed at 1.5x standard rates by some providers.

Google Gemini Pricing

Google launched Gemini API pricing in 2024 and continues to revise it. Rates vary by model capability.

Current pricing:

  • Gemini 2.5 Pro: $1.25/M input, $10.00/M output
  • Gemini 2.5 Flash: $0.30/M input, $2.50/M output
  • Gemini 2.0 Flash: $0.10/M input, $0.40/M output
  • Embedding models: $0.02/M tokens

Free tier allowance:

  • 15 requests/minute
  • 50,000 requests/day
  • 1M requests/month limit

Cached content discounts:

  • First 5M tokens: Standard rate
  • Tokens beyond 5M: 90% discount (cached reads)

Multimodal pricing applies per media item: images cost $0.10 each and video costs $1.00 per minute.
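Putting the token and media prices above together, a per-request estimate for Gemini 2.5 Flash might look like the following sketch (flat per-image and per-minute charges are assumed; caching discounts are ignored):

```python
# Gemini 2.5 Flash request estimate from the per-million token rates
# and media prices above; flat media charges are an assumption.
def flash_request_cost(in_tokens, out_tokens, images=0, video_minutes=0.0):
    token_cost = (in_tokens * 0.30 + out_tokens * 2.50) / 1_000_000
    return token_cost + images * 0.10 + video_minutes * 1.00

# 100K input / 10K output tokens plus two images:
print(round(flash_request_cost(100_000, 10_000, images=2), 3))  # 0.255
```

In this example the two images ($0.20) cost nearly four times the tokens ($0.055), so media volume can dominate a multimodal bill.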

Open Source Model Hosting

Self-hosted models on GPU clouds remove API dependencies. Typical costs:

RunPod L4 instance:

  • GPU: $0.44/hour
  • Storage: $0.01/GB/month
  • Monthly cost (730 hours): $321.20 + storage
  • 7B model serving: ~100 tokens/sec

Inference optimization:

  • vLLM reduces overhead by 30-40%
  • Quantization (4-bit) cuts memory 75%
  • Batching improves throughput 5-10x
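The 4-bit quantization figure is easy to verify with back-of-envelope arithmetic: weights drop from 2 bytes per parameter (FP16) to roughly 0.5 bytes:

```python
# Memory footprint of 7B-model weights at different precisions
# (weights only; KV cache and activations add more on top).
params = 7e9
fp16_gb = params * 2.0 / 1e9   # 2 bytes/param   -> 14.0 GB
int4_gb = params * 0.5 / 1e9   # 0.5 bytes/param ->  3.5 GB
print(fp16_gb, int4_gb, 1 - int4_gb / fp16_gb)  # 14.0 3.5 0.75
```

That 75% cut is what lets a 7B model fit comfortably in the 24 GB of an L4.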

Cost per 1M tokens comparison:

  • OpenAI GPT-5: $1.25 input / $10.00 output
  • Claude Opus 4.6: $5.00 input / $25.00 output
  • Self-hosted Llama 3 70B: $0.50-2.00

Self-hosting typically becomes cost-effective above roughly 100M tokens of monthly usage.
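A quick break-even sketch using the L4 figures above; the $3/M blended API rate is an assumption for illustration, not a quoted price:

```python
# Break-even: fixed-cost L4 self-hosting vs per-token API billing.
# API_RATE_PER_M is an assumed blended rate, for illustration only.
GPU_MONTHLY = 321.20        # RunPod L4, 730 hours
TOKENS_PER_SEC = 100        # 7B-model serving rate quoted above
API_RATE_PER_M = 3.00       # assumed blended USD per 1M tokens

def self_host_cost_per_m(utilization):
    """USD per 1M tokens at a given fraction of GPU utilization."""
    monthly_tokens = TOKENS_PER_SEC * 3600 * 730 * utilization
    return GPU_MONTHLY / (monthly_tokens / 1e6)

# At 50% utilization the GPU serves ~131M tokens/month:
print(round(self_host_cost_per_m(0.5), 2))  # 2.44
```

Even at half utilization the per-token cost undercuts the assumed API rate, which is consistent with the ~100M tokens/month break-even claimed above; at low utilization the fixed GPU bill dominates and the API wins.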

Weekly Price Changes

March 15-22, 2026:

OpenAI maintained GPT-4o pricing. No input/output rate adjustments announced.

Claude pricing stable. Anthropic held rates constant from previous quarter.

Google increased Gemini context caching effectiveness by 5%, enhancing cost advantages for long documents.

Azure LLM deployments saw regional pricing adjustments (South Africa, Singapore data centers now online).

RunPod remained at $0.44 L4, $0.79 L40S throughout the week. No spot instance availability changes reported.

Lambda Labs announced H100 availability expansion (+15% capacity) with no rate changes.

Vast.AI market pricing for H100s fluctuated by $0.05-0.08/hour over the week, tracking supply.

FAQ

Which API offers the best value for production inference? Among hosted APIs, GPT-4o mini and Claude Haiku lead the cost-per-token comparisons. Self-hosting on RunPod wins at scale (100M+ tokens/month).

Do any providers offer monthly subscriptions instead of per-token billing? None of OpenAI, Anthropic, or Google does; all three bill per token. Self-hosting on GPU clouds is the main route to fixed monthly costs.

How do volume discounts work across providers? OpenAI and Google Gemini use spend tiers, while Claude discounts scale with usage; verify current terms with each provider.

Can I cache API responses to reduce costs? Yes. Some providers (Google Gemini) offer prompt caching. Others require application-level caching (Redis, database).

What is the cost difference between streaming and non-streaming API calls? No difference. Streaming responses cost identical per-token rates to batch requests.

Sources

  • OpenAI API Pricing Documentation
  • Anthropic Claude API Documentation
  • Google AI Studio & Gemini API Pricing
  • RunPod Pricing Documentation
  • Lambda Labs Pricing Page