LLM API Price Tracker: Weekly Update (Template)

Deploybase · July 1, 2025 · LLM Pricing

OpenAI API Pricing

OpenAI maintains tiered per-token pricing across its model lineup, from the flagship GPT-5 down to the budget GPT-4o mini. As of March 2026, rates remain stable, with minor updates each quarter.

Current rates:

  • GPT-5: $0.00125/1K input tokens, $0.01/1K output tokens
  • GPT-4.1: $0.002/1K input, $0.008/1K output
  • GPT-4o: $0.0025/1K input tokens, $0.01/1K output tokens
  • GPT-4o mini: $0.00015/1K input, $0.0006/1K output

Volume discounts apply above $100/month in spend: tiered OpenAI accounts receive 15-20% reductions, and production contracts can negotiate custom rates.

Fine-tuning costs differ from inference:

  • GPT-3.5-turbo fine-tuning: $0.008/1K tokens training
  • GPT-4 fine-tuning: $0.06/1K tokens training
  • Output from fine-tuned models: 1.5x base inference cost

Because output tokens cost several times more than input tokens, the largest savings come from constraining output length; summarizing or filtering content before the API call also trims input spend.
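The per-call arithmetic can be sketched as a small estimator. The rate table below mirrors the per-1K figures quoted in this section; treat it as a snapshot, not a source of truth, and verify against OpenAI's pricing page:

```python
# Per-call cost estimate from the per-1K-token rates quoted above.
# Rates change; this table is a snapshot, not a source of truth.
RATES = {  # model: (input USD/1K tokens, output USD/1K tokens)
    "gpt-5": (0.00125, 0.010),
    "gpt-4.1": (0.002, 0.008),
    "gpt-4o": (0.0025, 0.010),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one API call at the snapshot rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# A 2,000-token prompt with a 500-token completion on GPT-4o:
print(f"${call_cost('gpt-4o', 2000, 500):.4f}")  # $0.0100
```

Note that the 500 output tokens cost as much as the 2,000 input tokens here, which is why trimming completions pays off first.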

Claude API Pricing

Anthropic's direct Claude API pricing differs from cloud-provider-hosted versions of the same models.

Direct API rates:

  • Claude Opus 4.6: $0.005/1K input, $0.025/1K output
  • Claude Sonnet 4.6: $0.003/1K input, $0.015/1K output
  • Claude Haiku 4.5: $0.001/1K input, $0.005/1K output
  • Claude Haiku 3: $0.00025/1K input, $0.00125/1K output

Batch API (lower priority, 24-hour processing):

  • 50% discount on input tokens
  • Standard output pricing

Volume discounts:

  • $1000/month: 10% discount
  • $10,000/month: 20% discount
  • $100,000/month: Custom pricing
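Combining the batch input discount and volume tiers above gives a rough Sonnet spend estimate. How the discounts stack is an assumption here, as is applying the volume discount per request; this is an illustrative sketch, not Anthropic's billing formula:

```python
# Rough Claude Sonnet 4.6 cost using the batch input discount and
# volume tiers quoted above. Discount stacking is an assumption.
def sonnet_cost(input_tokens, output_tokens, batch=False, monthly_spend=0.0):
    in_rate, out_rate = 0.003, 0.015          # USD per 1K tokens
    cost = input_tokens / 1000 * in_rate * (0.5 if batch else 1.0)
    cost += output_tokens / 1000 * out_rate   # output billed at standard rate
    if monthly_spend >= 100_000:
        raise ValueError("custom pricing tier: contact sales")
    elif monthly_spend >= 10_000:
        cost *= 0.80                          # 20% volume discount
    elif monthly_spend >= 1_000:
        cost *= 0.90                          # 10% volume discount
    return cost

# 1M input / 200K output tokens via the Batch API:
print(round(sonnet_cost(1_000_000, 200_000, batch=True), 2))  # 4.5
```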

Context window size affects pricing on some models: extended 200K-token windows are billed at 1.5x standard rates by some providers.

Google Gemini Pricing

Google launched Gemini API pricing in 2024 and continues to revise it. Rates vary by model capability.

Current pricing:

  • Gemini 2.5 Pro: $1.25/M input, $10.00/M output
  • Gemini 2.5 Flash: $0.30/M input, $2.50/M output
  • Gemini 2.0 Flash: $0.10/M input, $0.40/M output
  • Embedding models: $0.02/M tokens

Free tier allowance:

  • 15 requests/minute
  • 50,000 requests/day
  • 1M requests/month limit

Cached content discounts:

  • First 5M tokens: Standard rate
  • Tokens beyond 5M: 90% discount (cached reads)

Multimodal pricing applies per media item: images cost $0.10 each and video costs $1.00 per minute.
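Putting the token and media prices above together, a per-request estimate for Gemini 2.5 Flash might look like the following sketch (flat per-image and per-minute charges are assumed; caching discounts are ignored):

```python
# Gemini 2.5 Flash request estimate from the per-million token rates
# and media prices above; flat media charges are an assumption.
def flash_request_cost(in_tokens, out_tokens, images=0, video_minutes=0.0):
    token_cost = (in_tokens * 0.30 + out_tokens * 2.50) / 1_000_000
    return token_cost + images * 0.10 + video_minutes * 1.00

# 100K input / 10K output tokens plus two images:
print(round(flash_request_cost(100_000, 10_000, images=2), 3))  # 0.255
```

In this example the two images ($0.20) cost nearly four times the tokens ($0.055), so media volume can dominate a multimodal bill.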

Open Source Model Hosting

Self-hosted models on GPU clouds remove API dependencies. Typical costs:

RunPod L4 instance:

  • GPU: $0.44/hour
  • Storage: $0.01/GB/month
  • Monthly cost (730 hours): $321.20 + storage
  • 7B model serving: ~100 tokens/sec

Inference optimization:

  • vLLM reduces overhead by 30-40%
  • Quantization (4-bit) cuts memory 75%
  • Batching improves throughput 5-10x
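The 4-bit quantization figure is easy to verify with back-of-envelope arithmetic: weights drop from 2 bytes per parameter (FP16) to roughly 0.5 bytes:

```python
# Memory footprint of 7B-model weights at different precisions
# (weights only; KV cache and activations add more on top).
params = 7e9
fp16_gb = params * 2.0 / 1e9   # 2 bytes/param   -> 14.0 GB
int4_gb = params * 0.5 / 1e9   # 0.5 bytes/param ->  3.5 GB
print(fp16_gb, int4_gb, 1 - int4_gb / fp16_gb)  # 14.0 3.5 0.75
```

That 75% cut is what lets a 7B model fit comfortably in the 24 GB of an L4.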

Cost per 1M tokens comparison:

  • OpenAI GPT-5: $1.25 input / $10.00 output
  • Claude Opus 4.6: $5.00 input / $25.00 output
  • Self-hosted Llama 3 70B: $0.50-2.00

Self-hosting typically becomes cost-effective above roughly 100M tokens of monthly usage.
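A quick break-even sketch using the L4 figures above; the $3/M blended API rate is an assumption for illustration, not a quoted price:

```python
# Break-even: fixed-cost L4 self-hosting vs per-token API billing.
# API_RATE_PER_M is an assumed blended rate, for illustration only.
GPU_MONTHLY = 321.20        # RunPod L4, 730 hours
TOKENS_PER_SEC = 100        # 7B-model serving rate quoted above
API_RATE_PER_M = 3.00       # assumed blended USD per 1M tokens

def self_host_cost_per_m(utilization):
    """USD per 1M tokens at a given fraction of GPU utilization."""
    monthly_tokens = TOKENS_PER_SEC * 3600 * 730 * utilization
    return GPU_MONTHLY / (monthly_tokens / 1e6)

# At 50% utilization the GPU serves ~131M tokens/month:
print(round(self_host_cost_per_m(0.5), 2))  # 2.44
```

Even at half utilization the per-token cost undercuts the assumed API rate, which is consistent with the ~100M tokens/month break-even claimed above; at low utilization the fixed GPU bill dominates and the API wins.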

Weekly Price Changes

March 15-22, 2026:

OpenAI maintained GPT-4o pricing. No input/output rate adjustments announced.

Claude pricing stable. Anthropic held rates constant from previous quarter.

Google increased Gemini context caching effectiveness by 5%, enhancing cost advantages for long documents.

Azure LLM deployments saw regional pricing adjustments (South Africa, Singapore data centers now online).

RunPod remained at $0.44 L4, $0.79 L40S throughout the week. No spot instance availability changes reported.

Lambda Labs announced H100 availability expansion (+15% capacity) with no rate changes.

Vast.AI market pricing for H100s fluctuated by $0.05-0.08/hour over the week, tracking supply.

FAQ

Which API offers the best value for production inference? Among hosted APIs, GPT-4o mini and Claude Haiku lead the cost-per-token comparisons. Self-hosting on RunPod wins at scale (100M+ tokens/month).

Do any providers offer monthly subscriptions instead of per-token billing? None of OpenAI, Anthropic, or Google does; all three bill per token. Self-hosting on GPU clouds is the main route to fixed monthly costs.

How do volume discounts work across providers? OpenAI and Google Gemini use spend tiers, while Claude discounts scale with usage; verify current terms with each provider.

Can I cache API responses to reduce costs? Yes. Some providers (Google Gemini) offer prompt caching. Others require application-level caching (Redis, database).

What is the cost difference between streaming and non-streaming API calls? No difference. Streaming responses cost identical per-token rates to batch requests.

Sources

  • OpenAI API Pricing Documentation
  • Anthropic Claude API Documentation
  • Google AI Studio & Gemini API Pricing
  • RunPod Pricing Documentation
  • Lambda Labs Pricing Page