Contents
- Overview
- Gemini 2.5 Model Lineup and Pricing
- Free Tier Limits and Quotas
- Gemini 2.5 Pro Costs
- Gemini 2.5 Flash Costs
- Batch API and Discounts
- Context Caching Pricing
- Multi-Modal Token Accounting
- Cost Projections by Workload
- Comparison to Competitors
- Hidden Fees and Gotchas
- FAQ
- Related Resources
- Sources
Overview
Gemini 2.5 Pro: $1.25 input, $10 output per 1M tokens.
Gemini 2.5 Flash: $0.30 input, $2.50 output.
Free tier: 2M input tokens/month via AI Studio. Good for prototyping.
This guide breaks down Gemini pricing, free tier quotas, batch processing, and context caching discounts.
Gemini 2.5 Model Lineup and Pricing
Google's Gemini 2.5 family includes three models optimized for different performance-cost trade-offs.
Gemini 2.5 Pro
Gemini 2.5 Pro is the flagship model, optimized for maximum capability and reasoning. It features a 1M token context window and excels at complex reasoning, code generation, and multimodal analysis.
Pricing (March 2026):
- Input tokens: $1.25 per 1 million tokens
- Output tokens: $10 per 1 million tokens
- Input/output ratio: 8:1 (output is 8x more expensive)
Example cost: processing a 50K-token document and generating a 2K-token summary:
- Input cost: (50,000 / 1,000,000) × $1.25 = $0.0625
- Output cost: (2,000 / 1,000,000) × $10 = $0.02
- Total: $0.0825
Multiply by 1,000 daily requests: $82.50/day or $30,112.50/year for this workload.
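The per-request arithmetic above can be wrapped in a small helper. The rates are the March 2026 figures quoted in this guide; pass different rates if pricing changes:

```python
# Per-1M-token rates from this guide (March 2026); adjust if Google updates pricing.
PRO_INPUT_RATE = 1.25    # USD per 1M input tokens
PRO_OUTPUT_RATE = 10.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = PRO_INPUT_RATE,
                 output_rate: float = PRO_OUTPUT_RATE) -> float:
    """Cost in USD for a single request at the given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

cost = request_cost(50_000, 2_000)
print(f"Per request: ${cost:.4f}")                          # $0.0825
print(f"1,000/day for a year: ${cost * 1_000 * 365:,.2f}")  # $30,112.50
```

The same function prices Flash workloads by passing $0.30 and $2.50 as the rates.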
Gemini 2.5 Flash
Gemini 2.5 Flash is the efficient model, optimized for speed and cost. It has a 1M token context window (same as Pro) but with lower latency and significantly lower pricing. It's suitable for classification, extraction, and routine processing tasks.
Pricing (March 2026):
- Input tokens: $0.30 per 1 million tokens
- Output tokens: $2.50 per 1 million tokens
- Input/output ratio: 8:1 (output is ~8x input)
Flash costs roughly a quarter of Pro's rates: 24% of Pro's input price and exactly 25% of its output price.
Same document analysis on Flash:
- Input cost: (50,000 / 1,000,000) × $0.30 = $0.015
- Output cost: (2,000 / 1,000,000) × $2.50 = $0.005
- Total: $0.020
Multiply by 1,000 daily requests: $20/day or $7,300/year.
Flash is 76% cheaper than Pro for this workload. The trade-off: Flash is less capable on complex reasoning tasks. For commodity tasks where reasoning quality plateaus early, Flash is superior.
Gemini 1.5 Pro (Legacy)
Google maintains backward compatibility with Gemini 1.5 Pro (released mid-2024):
Pricing (March 2026):
- Input tokens: $0.075 per 1 million tokens
- Output tokens: $0.30 per 1 million tokens
Gemini 1.5 Pro sits in the low-cost tier alongside Gemini 2.5 Flash, but it is older and less capable than both 2.5 Pro and 2.5 Flash. It exists only for backward compatibility with existing deployments; avoid it for new work.
Free Tier Limits and Quotas
Google's free tier is generous but requires understanding the specific limits.
AI Studio Free Tier
Access via google.ai/studio:
- Monthly limit: 2 million input tokens
- No output token limit
- Model access: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 1.5 Pro
- Cost: $0
- Rate limit: 2 requests per minute (very restrictive)
- No SLA or uptime guarantee
The "no output token limit" is misleading; developers are still capped by the 2M input-token allowance. Processing 2M input tokens exhausts the free quota, regardless of how many output tokens were generated along the way.
Effective Free Tier Capacity
With a 2-request-per-minute rate limit:
- Per hour: 120 requests
- Per day: 2,880 requests
- Per month: 86,400 requests
A typical request uses 100-1,000 input tokens. Assuming 500 tokens per request:
- Monthly token consumption: 86,400 × 500 = 43.2M tokens
- Free tier allocation: 2M tokens
- Utilization: 4.6% of potential requests
The 2M token limit is the bottleneck. At 500 tokens per request, developers can make only 4,000 requests monthly before exhausting the quota.
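To see how far the allowance stretches at other request sizes:

```python
FREE_TIER_INPUT_TOKENS = 2_000_000  # monthly AI Studio allowance, per this guide

def free_tier_requests(avg_input_tokens: int) -> int:
    """Number of requests the free tier covers before the input quota runs out."""
    return FREE_TIER_INPUT_TOKENS // avg_input_tokens

print(free_tier_requests(500))    # 4,000 requests/month
print(free_tier_requests(2_000))  # 1,000 requests/month
```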
Practical Use Cases for Free Tier
The free tier accommodates:
- Prototyping (limited scope, small datasets)
- Proof-of-concepts (100-200 requests)
- Learning and experimentation (low-volume)
- Development environment (testing, not production)
Not suitable for:
- Production deployments
- Any application handling real user traffic
- Scaled experimentation (>1,000 requests/month)
Graduating from Free Tier
Transitioning to paid requires adding a billing method (credit card or Google Cloud account). Billing starts immediately upon upgrade: there is no trial period, and charges accrue the moment usage exceeds the free quota.
Gemini 2.5 Pro Costs
Per-Request Cost Variance
Costs vary based on input and output token counts. Modeling several common tasks:
Task A: Single-turn chat (1K input, 300 output)
- Input: $0.00125
- Output: $0.003
- Total: $0.00425 per request
Task B: Code review (10K input, 1K output)
- Input: $0.0125
- Output: $0.01
- Total: $0.0225 per request
Task C: Document analysis (100K input, 2K output)
- Input: $0.125
- Output: $0.02
- Total: $0.145 per request
Task D: Long context (500K input, 5K output)
- Input: $0.625
- Output: $0.05
- Total: $0.675 per request
Input token cost dominates for large documents. Context size is the primary cost driver.
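The four task profiles above, tabulated with the same per-1M-token arithmetic:

```python
PRO_INPUT, PRO_OUTPUT = 1.25, 10.00  # USD per 1M tokens (this guide's Pro rates)

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-request USD cost at Pro rates."""
    return input_tokens / 1e6 * PRO_INPUT + output_tokens / 1e6 * PRO_OUTPUT

tasks = {
    "A: single-turn chat": (1_000, 300),
    "B: code review": (10_000, 1_000),
    "C: document analysis": (100_000, 2_000),
    "D: long context": (500_000, 5_000),
}
for name, (inp, out) in tasks.items():
    print(f"{name}: ${task_cost(inp, out):.5f}")
```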
Monthly Cost Projections
A customer service chat application:
- 10,000 conversations per month
- Average 2K input tokens per conversation
- Average 400 output tokens per conversation
Monthly cost:
- Input: (10,000 × 2,000 / 1,000,000) × $1.25 = $25
- Output: (10,000 × 400 / 1,000,000) × $10 = $40
- Total: $65/month
Scale to 100K conversations:
- Input: $250
- Output: $400
- Total: $650/month
A large-scale deployment with 1M conversations monthly:
- Input: $2,500
- Output: $4,000
- Total: $6,500/month
Pro pricing scales linearly with token volume.
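That linear scaling in one helper, defaulting to the Pro rates used above:

```python
def monthly_cost(conversations: int, in_tokens: int, out_tokens: int,
                 in_rate: float = 1.25, out_rate: float = 10.0) -> float:
    """Monthly USD cost at per-1M-token rates (Pro rates by default)."""
    return conversations * (in_tokens * in_rate + out_tokens * out_rate) / 1e6

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} conversations: ${monthly_cost(n, 2_000, 400):,.2f}")
```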
When Pro Pricing Makes Sense
Use Gemini 2.5 Pro when:
- Reasoning quality matters (mathematical proofs, complex logic)
- Code generation accuracy is critical
- Context size exceeds 100K tokens regularly
- Quality-sensitive applications where Pro's higher cost and latency are justified
For commodity tasks (classification, basic extraction), Flash is more economical.
Gemini 2.5 Flash Costs
Per-Request Economics
Same tasks on Flash:
Task A: Single-turn chat (1K input, 300 output)
- Input: $0.0003
- Output: $0.00075
- Total: $0.00105 per request
Task B: Code review (10K input, 1K output)
- Input: $0.003
- Output: $0.0025
- Total: $0.0055 per request
Task C: Document analysis (100K input, 2K output)
- Input: $0.030
- Output: $0.005
- Total: $0.035 per request
Task D: Long context (500K input, 5K output)
- Input: $0.150
- Output: $0.0125
- Total: $0.1625 per request
Flash is substantially cheaper. The same Task C costs $0.035 on Flash vs. $0.145 on Pro. That's 76% savings.
High-Volume Deployment
Same customer service example scaled to 1M conversations:
- Input: (1,000,000 × 2,000 / 1,000,000) × $0.30 = $600
- Output: (1,000,000 × 400 / 1,000,000) × $2.50 = $1,000
- Total: $1,600/month
Compared to Pro ($6,500/month), Flash is 75% cheaper. This is the primary advantage for cost-sensitive deployments.
Flash Capability Trade-offs
Flash is optimized for speed and cost, not maximum capability. Testing on reasoning-heavy tasks:
- Arithmetic with multi-step reasoning: Flash 78%, Pro 91%
- Complex logic puzzles: Flash 72%, Pro 82%
- Code generation (simple): Flash 88%, Pro 92%
- Code generation (complex): Flash 76%, Pro 89%
For commodity tasks (classification, extraction, moderation), the accuracy difference is minimal. For reasoning-heavy tasks, Pro's advantage is significant.
Choosing Between Flash and Pro
Use Flash for:
- Text classification (sentiment, intent, category assignment)
- Information extraction (structured data from text)
- Content moderation (toxic content detection)
- Routine Q&A (FAQ-style responses)
- High-volume, time-sensitive processing
- Cost-optimized applications
Use Pro for:
- Complex reasoning (proofs, troubleshooting, planning)
- Code analysis and generation
- Creative writing (where quality matters)
- Multimodal analysis (better visual reasoning)
- Large context handling (better performance at 500K+ tokens)
Batch API and Discounts
Google's batch API allows asynchronous processing with 50% cost reduction.
Batch Pricing
Standard Gemini 2.5 Pro pricing:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
Batch Gemini 2.5 Pro pricing:
- Input: $0.625 per 1M tokens (50% discount)
- Output: $5 per 1M tokens (50% discount)
The trade-off: batch processing is asynchronous. Typical latency is 1-24 hours (depends on queue depth).
Batch API Economics
A summarization job processing 10M tokens overnight (1,000 documents, 10K tokens each):
- Output: 500 tokens per summary = 500K total output
Standard API cost:
- Input: (10M / 1M) × $1.25 = $12.50
- Output: (500K / 1M) × $10 = $5
- Total: $17.50
Batch API cost:
- Input: (10M / 1M) × $0.625 = $6.25
- Output: (500K / 1M) × $5 = $2.50
- Total: $8.75
- Savings: $8.75 (50%)
For one-time batch jobs, the 50% discount justifies the latency trade-off. For interactive applications, batch is not viable.
Batch Request Format
Batch requests are submitted via a JSONL file containing multiple API requests. Each request is processed independently and results are aggregated into a results file.
Example batch job:
{"custom_id": "1", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
{"custom_id": "2", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
...
Submitting a batch of 1,000 requests at half the token cost is usually worthwhile: the operational overhead (formatting JSONL, polling for results) is justified for cost-sensitive bulk processing.
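As a sketch, a JSONL file in the format shown above could be generated like this; the field names (`custom_id`, `params`) follow this section's example and should be verified against Google's current batch documentation:

```python
import json

def build_batch_file(prompts: list[str], path: str,
                     model: str = "gemini-2.5-pro") -> None:
    """Write one JSONL line per request, matching the format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts, start=1):
            request = {
                "custom_id": str(i),
                "params": {
                    "model": model,
                    "contents": [{"role": "user",
                                  "parts": [{"text": prompt}]}],
                },
            }
            f.write(json.dumps(request) + "\n")

build_batch_file(["Summarize document 1", "Summarize document 2"], "batch.jsonl")
```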
Context Caching Pricing
Gemini 2.5 Pro supports context caching, where cached context tokens are billed at a 90% discount.
How Context Caching Works
If developers repeatedly query the same document (or set of documents), Google can cache the processed context:
First request with 500K cached tokens:
- Input tokens (new): 5K
- Cached tokens: 500K (billed at 10% of input price)
- Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.00625 + $0.0625 = $0.06875
- Output cost: (2K / 1M) × $10 = $0.02
- Total: $0.08875
Subsequent requests (cache hit, 500K cached tokens + 5K new tokens):
- Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.06875
- Output cost: (2K / 1M) × $10 = $0.02
- Total: $0.08875
The cached portion costs $0.0625 on every request, first and subsequent alike, versus $0.625 if the full 500K tokens were billed at the standard rate. That is a 90% reduction on the cached context for every request.
Cache Economics
A system repeatedly analyzing the same 500K-token document (e.g., "answer questions about our company handbook"):
100 queries without caching:
- Input per query: 505K tokens (500K doc + 5K query)
- Total input: 50.5M tokens
- Cost: (50.5M / 1M) × $1.25 = $63.13
100 queries with caching:
- First query: input cost $0.06875
- 99 subsequent queries: input cost $0.06875 each
- Total input cost: $6.88
- Savings: $56.25 (89% reduction)
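The caching break-even above, generalized to any query count (assuming, as this section does, that cached tokens are billed at 10% of the input rate on every request):

```python
PRO_INPUT = 1.25       # USD per 1M input tokens
CACHE_DISCOUNT = 0.10  # cached tokens billed at 10% of the input rate (per this guide)

def input_cost(queries: int, doc_tokens: int, query_tokens: int,
               cached: bool) -> float:
    """Total input cost in USD for repeated queries over the same document."""
    rate = PRO_INPUT / 1e6
    if cached:
        per_query = query_tokens * rate + doc_tokens * rate * CACHE_DISCOUNT
    else:
        per_query = (doc_tokens + query_tokens) * rate
    return queries * per_query

uncached = input_cost(100, 500_000, 5_000, cached=False)  # about $63.13
cached = input_cost(100, 500_000, 5_000, cached=True)     # about $6.88
print(f"Savings: ${uncached - cached:.2f}")               # $56.25
```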
Context caching is transformative for retrieval systems where the same documents are queried repeatedly.
Cache Invalidation and Limits
Caches persist for 1 hour after last use. If developers don't query within 1 hour, the cache is dropped and rebuilt on the next request.
Cache size limits: Google doesn't publicize exact limits, but testing suggests 2M tokens can be cached per session.
For small document bases (under 1M tokens), caching provides massive savings. For continuously updated documents, caching provides less value (cache invalidates frequently).
Multi-Modal Token Accounting
Gemini 2.5 Pro processes images, videos, and text. Understanding token costs for multi-modal inputs is essential.
Image Token Consumption
Image tokens depend on image size and quality:
Thumbnail image (100×100 pixels):
- Token consumption: 258 tokens
Small image (480×480 pixels):
- Token consumption: 258 tokens + extra detail tokens
Standard image (1024×1024 pixels):
- Token consumption: 258 + ~100-200 additional tokens = 358-458 tokens
High-resolution image (2048×2048 pixels):
- Token consumption: 258 + ~300-400 additional tokens = 558-658 tokens
Baseline: every image costs at least 258 tokens. Additional detail tokens depend on resolution and complexity.
Practical cost: 10 standard images plus 5K text tokens comes to roughly 9K tokens total (about 4K for the images, 5K for the text).
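A rough estimator under this section's assumptions: a 258-token baseline plus detail tokens taken as midpoints of the approximate ranges above. This is an illustration, not an official formula:

```python
def estimate_image_tokens(width: int, height: int) -> int:
    """Rough per-image token estimate: 258-token baseline plus detail tokens,
    using midpoints of the approximate ranges quoted in this section."""
    BASE = 258
    pixels = width * height
    if pixels <= 480 * 480:
        return BASE          # small images: baseline only
    if pixels <= 1024 * 1024:
        return BASE + 150    # midpoint of the ~100-200 range above
    return BASE + 350        # midpoint of the ~300-400 range above

total = 10 * estimate_image_tokens(1024, 1024) + 5_000  # 10 images + 5K text
print(total)  # 9080 tokens
```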
Video Token Consumption
Video is processed by extracting key frames:
Short video (10 seconds at 24fps = 240 frames, ~10 key frames extracted):
- Token consumption: 10 frames × 400 tokens/frame = 4,000 tokens
Medium video (60 seconds, 6 key frames):
- Token consumption: 2,400 tokens
Long video (10 minutes, 10 key frames):
- Token consumption: 4,000 tokens
Video token costs are dominated by the number of extracted frames, not duration.
Audio Token Consumption
Gemini 2.5 Pro does not directly process audio. Audio must be transcribed first (using a separate speech-to-text API), then passed as text.
Cost Projections by Workload
Scenario A: Customer Support Chatbot
Setup:
- 1,000 chats per day
- Average 2K input tokens (customer messages + context)
- Average 300 output tokens (bot responses)
- Using Gemini 2.5 Flash (cost-optimized)
Monthly cost (30 days):
- Input: (1,000 × 30 × 2,000 / 1,000,000) × $0.30 = $18
- Output: (1,000 × 30 × 300 / 1,000,000) × $2.50 = $22.50
- Total: $40.50/month
Annual cost: $486
This is very cost-effective. A single developer salary ($60K+) dwarfs API costs.
Scenario B: Document Analysis Platform
Setup:
- 100 documents per month
- Average 50K tokens per document
- Average 2K output tokens per analysis
- Using Gemini 2.5 Pro (reasoning-heavy)
- Batch API for cost optimization
Monthly cost:
- Input: (100 × 50,000 / 1,000,000) × $0.625 = $3.13 (batch discount)
- Output: (100 × 2,000 / 1,000,000) × $5 = $1.00 (batch discount)
- Total: $4.13/month
Annual cost: ~$49.50
Batch processing reduces costs dramatically. The trade-off: 1-24 hour latency.
Scenario C: Large Codebase Analysis
Setup:
- 10 analyses per month
- Average 300K tokens per analysis (large repositories)
- Average 5K output tokens per analysis
- Using Gemini 2.5 Pro (large context handling)
Monthly cost:
- Input: (10 × 300,000 / 1,000,000) × $1.25 = $3.75
- Output: (10 × 5,000 / 1,000,000) × $10 = $0.50
- Total: $4.25/month
Annual cost: $51
Even large context windows are inexpensive.
Scenario D: Video Analysis Service
Setup:
- 100 videos per month
- Average 5 key frames per video = 5 × 400 tokens = 2,000 image tokens
- Average 5K text tokens per analysis
- Using Gemini 2.5 Pro
Cost per video:
- Image tokens: (2,000 / 1,000,000) × $1.25 = $0.0025
- Text input tokens: (5,000 / 1,000,000) × $1.25 = $0.00625
- Output tokens: (2,000 / 1,000,000) × $10 = $0.02
- Total per video: $0.02875
100 videos monthly:
- Total: $2.88/month
Annual cost: $34.50
Video analysis is cheap because extracted frames consume relatively few tokens.
Comparison to Competitors
How does Gemini 2.5 pricing compare to other LLM providers?
Gemini 2.5 Pro vs. OpenAI GPT-5
Gemini 2.5 Pro:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
OpenAI GPT-5:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
Identical pricing. Choice depends on capability (reasoning vs. multimodal + context).
Gemini 2.5 Flash vs. Anthropic Claude Sonnet 4.6
Gemini 2.5 Flash:
- Input: $0.30 per 1M tokens
- Output: $2.50 per 1M tokens
Anthropic Claude Sonnet 4.6:
- Input: $3 per 1M tokens
- Output: $15 per 1M tokens
Flash is 10x cheaper on input, 6x cheaper on output. For commodity tasks, Gemini Flash dominates on cost.
Gemini 2.5 Flash vs. Cohere Command R
Gemini 2.5 Flash:
- Input: $0.30 per 1M tokens
- Output: $2.50 per 1M tokens
Cohere Command R:
- Input: $0.15 per 1M tokens
- Output: $0.60 per 1M tokens
Gemini 2.5 Flash is more capable on complex tasks. For pure cost optimization, Cohere wins. For balanced capability + cost, Flash is superior.
Summary Pricing Table
| Model | Input | Output | Best For |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10 | Reasoning, multimodal |
| Gemini 2.5 Flash | $0.30 | $2.50 | Cost-optimized, commodity |
| GPT-5 | $1.25 | $10 | Reasoning, code |
| Claude Sonnet 4.6 | $3 | $15 | General capability |
| Cohere Command R | $0.15 | $0.60 | Commodity tasks |
| Cohere Command R+ | $2.50 | $10.00 | Complex reasoning |
Hidden Fees and Gotchas
Rate Limit Penalties
Exceeding the rate limit doesn't incur extra charges; requests simply fail with HTTP 429. However, retry logic may cause duplicate charges if not implemented carefully.
Implement exponential backoff with random jitter to avoid thundering herd when limits are reached.
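A minimal retry sketch; `call_api` is a placeholder for your actual client call, and `RateLimitError` stands in for however your SDK surfaces HTTP 429:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for an HTTP 429 response from the API client."""

def call_with_backoff(call_api, max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0):
    """Retry call_api() on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids herds
```

Because each request is retried (not re-submitted as new), this pattern also avoids the duplicate-charge risk mentioned above: a failed 429 call was never billed.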
Cache Eviction
If a cached context is evicted (1-hour timeout or cache full), the next request rebuilds the cache. Developers are charged the full per-token rate for re-processing, even if the same tokens were cached before.
For frequently re-used documents, cache invalidation may be expensive. Budget for rebuild costs.
Image Processing Overhead
All images incur a minimum 258-token cost. Sending 1,000 tiny images (each 258 tokens) costs 258K tokens, even if the images are 10×10 pixels.
For high-volume image processing, pre-filter low-value images to avoid unnecessary token consumption.
Rate Limit Escalation Delays
The default rate limit is 2 requests/minute (free tier) or 60 requests/minute (paid tier). Requesting escalation to 10,000+ requests/minute may take 24-48 hours. During peak growth, this can delay scaling.
Plan rate limit requests 1-2 weeks in advance of anticipated growth.
Context Window Truncation
If the input exceeds 1M tokens, it's truncated. Developers are billed for the entire input, including the truncated portion, even though only part of it was processed. Unlike some providers, Google doesn't fail or warn; the truncation is silent.
Implement token counting on the client side to avoid accidental truncation.
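A pre-flight guard along these lines avoids silent truncation; the 4-characters-per-token heuristic is a rough assumption, so prefer your SDK's token-counting endpoint in production:

```python
CONTEXT_LIMIT = 1_000_000  # Gemini 2.5 context window, per this guide

def check_fits(text: str, reserved_output: int = 8_000,
               chars_per_token: float = 4.0) -> bool:
    """Heuristic pre-flight check: estimate tokens from character count and
    refuse inputs that would exceed the context window."""
    estimated = len(text) / chars_per_token
    return estimated + reserved_output <= CONTEXT_LIMIT

print(check_fits("hello " * 1_000))      # True: well under the limit
print(check_fits("x" * 10_000_000))      # False: ~2.5M tokens would be truncated
```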
Output Token Billing for Errors
If a request fails mid-generation (e.g., due to timeout or provider error), developers may still be billed for partial output tokens. Error handling should account for this.
Treat API errors as potential billing events; log all interactions for reconciliation.
FAQ
Is the free tier suitable for production?
No. The 2M monthly token limit and 2 requests/minute rate limit are only suitable for prototyping. Production applications require paid tier.
What's the difference between Gemini 2.5 Pro and Flash?
Pro is more capable on reasoning and complex tasks. Flash is roughly 4x cheaper and suitable for commodity tasks. Pro has better multimodal performance. Choose based on task requirements.
Does context caching reduce output token costs?
No, only input tokens are cached. Output tokens are always billed at full rate.
Can I use batch API and context caching together?
No, they're mutually exclusive. Batch API is for async processing with 50% discount. Context caching is for sync requests with 90% discount on cached input.
What happens if I exceed my rate limit?
Requests fail with HTTP 429. There's no automatic queue or billing overage. You must reduce request rate or request limit escalation from Google.
Is there a monthly minimum charge?
No, pure pay-as-you-go. No commitments or minimums.
Can I pre-purchase credits for discount?
Google Cloud offers committed use discounts on some services but not on Gemini API usage (as of March 2026). Pricing is per-token at published rates.
How do I estimate my monthly bill?
Multiply your monthly input tokens by $1.25/1M (or $0.30/1M for Flash) and output tokens by $10/1M (or $2.50/1M for Flash). Use context caching if applicable (90% discount on cached input). Use batch API if applicable (50% discount overall).
Is there a production tier with volume discounts?
Contact Google Cloud sales for potential volume discounts. No public volume tiers exist (as of March 2026).
Which Gemini 2.5 model should I start with?
If unsure, start with Flash. It's roughly 75% cheaper and handles most tasks. Upgrade to Pro only if accuracy on reasoning or complex tasks is insufficient.
Related Resources
- Gemini 2.5 Pro vs ChatGPT 5 Comparison
- OpenAI Pricing Guide
- Anthropic Pricing Guide
- Google Gemini API Documentation
- Gemini API Reference
Sources
- Google. "Gemini Pricing." Accessed March 2026. Retrieved from AI.google.dev/pricing.
- Google. "Gemini 2.5 Model Announcement." March 2026. Retrieved from google.ai/gemini.
- Google. "Batch Processing Guide." Retrieved from AI.google.dev/docs/batch.
- Google. "Context Caching Guide." Retrieved from AI.google.dev/docs/caching.
- DeployBase. "LLM Pricing Database." March 2026. Internal research dataset.