Contents
- Gemini API Pricing 2026: Overview
- Gemini API Tiers
- Gemini 2.5 Pro Pricing
- Gemini 2.5 Flash Pricing
- Free Tier & Rate Limits
- Context-Dependent Pricing
- Image & Audio Handling Costs
- Comparison to OpenAI & Anthropic
- Cost Optimization Tips
- Real-World Cost Examples
- FAQ
- Related Resources
- Sources
Gemini API Pricing 2026: Overview
This guide covers Gemini API pricing for 2026. Gemini 2.5 Pro: $1.25 per 1M input tokens for prompts up to 200K tokens, $2.50 above 200K. Output: $10 per 1M tokens at or under 200K context, $15 above.
Gemini 2.5 Flash: $0.30 per 1M input tokens, $2.50 per 1M output tokens.
Free tier: 1M tokens/day via Google AI Studio.
Context caching cuts the input cost of reused context by 90%. Pick Flash for speed and cost, Pro for reasoning.
Gemini API Tiers
Google offers Gemini through multiple access paths:
Free Tier (Google AI Studio)
Access: google.ai/studio
Models:
- Gemini 2.5 Flash
- Gemini 1.5 Flash
- Gemini 1.5 Pro (limited)
Rate limits:
- 60 requests/minute (RPM)
- 1M tokens per day (TPD)
- Enforced over a sliding window; no burst allowance
Use cases: Prototyping, small-scale experimentation, education.
Cost: Free (no credit card required).
Paid Tier (Google Cloud APIs)
Access: Google Cloud Console, Vertex AI API
Models:
- Gemini 2.5 Pro (latest)
- Gemini 2.5 Flash (latest)
- Gemini 1.5 Pro, Flash (legacy)
Rate limits: Per-project quotas (customizable)
Pricing: Per 1M input tokens and per 1M output tokens
Cost: Pay-as-you-go, no minimum; new accounts can apply $300 in free trial credits.
Gemini 2.5 Pro Pricing
Gemini 2.5 Pro is Google's flagship reasoning-focused model, competing directly with GPT-4.1 and Claude Opus.
Token Pricing (as of March 2026)
| Metric | Cost (≤200K context) | Cost (>200K context) |
|---|---|---|
| Input tokens | $1.25 per 1M | $2.50 per 1M |
| Output tokens | $10.00 per 1M | $15.00 per 1M |
| Images (small) | $2.50 per 100 images | $2.50 per 100 images |
| Images (large) | $7.50 per 100 images | $7.50 per 100 images |
| Audio input | $0.0006 per minute | $0.0006 per minute |
Example: 1,000 input tokens + 500 output tokens (under 200K context):
- Input cost: (1,000 / 1,000,000) × $1.25 = $0.00125
- Output cost: (500 / 1,000,000) × $10.00 = $0.005
- Total: $0.00625
Example: 100k input + 10k output (medium-sized request, under 200K context):
- Input: (100,000 / 1,000,000) × $1.25 = $0.125
- Output: (10,000 / 1,000,000) × $10.00 = $0.10
- Total: $0.225
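The arithmetic above can be wrapped in a small helper. A minimal sketch with the table's March 2026 rates hard-coded; the assumption that a prompt over 200K tokens is billed entirely at the higher tier (rather than marginally) is ours, since the table doesn't specify:

```python
def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Gemini 2.5 Pro cost in USD at the rates in the table above.

    Assumption: a prompt over 200K tokens is billed entirely at the
    higher tier; the pricing table does not say whether billing is marginal.
    """
    long_context = input_tokens > 200_000
    input_rate = 2.50 if long_context else 1.25     # $ per 1M input tokens
    output_rate = 15.00 if long_context else 10.00  # $ per 1M output tokens
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

small = gemini_pro_cost(1_000, 500)        # first worked example, ≈ $0.00625
medium = gemini_pro_cost(100_000, 10_000)  # second worked example, ≈ $0.225
```

For an input over the threshold (say 300K tokens), the same call switches to the $2.50/$15.00 rates.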
Context Window
Gemini 2.5 Pro supports 1,000,000 tokens (1M context window).
Implication for pricing: large document processing is feasible. A 1M-token policy document exceeds the 200K threshold, so it costs $2.50 in input tokens to process (at the >200K rate).
Batch Processing (Coming 2026)
Google announced batch processing for Gemini Pro (lower cost for non-urgent queries). Pricing not yet finalized, but expected:
- 50% discount for batch jobs submitted with <24-hour SLA
- No guaranteed latency
Gemini 2.5 Flash Pricing
Gemini 2.5 Flash is Google's efficiency-focused model, optimized for speed and cost. It is broadly comparable in capability to Claude Sonnet 4.6 and significantly faster than Pro.
Token Pricing (as of March 2026)
| Metric | Cost |
|---|---|
| Input tokens | $0.30 per 1M |
| Output tokens | $2.50 per 1M |
| Images (small) | $1.00 per 100 images |
| Images (large) | $2.50 per 100 images |
| Audio input | $0.00006 per minute (10x cheaper than Pro) |
Cost comparison for 100k input + 10k output:
- Input: (100,000 / 1,000,000) × $0.30 = $0.03
- Output: (10,000 / 1,000,000) × $2.50 = $0.025
- Total: $0.055
vs. Gemini Pro for same tokens:
- Pro total: $0.225
- Flash total: $0.055
- Savings: 76%
Speed & Performance
Gemini 2.5 Flash trades some reasoning capability for speed:
- First token latency: 100-200ms (vs. Pro's 500ms)
- Throughput: 10k tokens/sec (vs. Pro's 2k)
- Suitable for: Chatbots, real-time inference, summarization
- Not suitable for: Complex multi-step reasoning, competitive exams, novel problem-solving
Free Tier & Rate Limits
Google AI Studio (Gemini 2.5 Flash)
Daily limits:
- 1M tokens per day
- 60 requests per minute (RPM)
- 10 requests per second (RPS)
- Daily token quota resets every 24 hours (UTC); per-minute and per-second limits are enforced over a sliding window
Practical impact:
- A single 1M-token request exhausts daily limit
- Moderate use cases (10-100 requests/day) stay within limits
- Scale-out to production requires paid tier
No billing required: Google AI Studio is free, no credit card needed.
Paid Tier Free Trial
New Google Cloud accounts receive:
- $300 free credits (expires after 90 days)
- Full API access to all Gemini models
- Same rate limits as paid tier (customizable)
Cost calculation:
- $300 ÷ $0.30 per 1M input (Flash) = 1B input tokens free
- Typical use: 2-3 months for small-to-medium applications
Context-Dependent Pricing
Gemini pricing has nuances related to context window and cache behavior.
Context Cache (Upcoming)
Google announced "Prompt Cache" for Gemini API (March 2026 beta). Mechanism:
- Store frequently-accessed long contexts (docs, codebase) in cache
- Reuse cached context across multiple API calls
- Cache hit costs: 10% of input token price
- Cache miss costs: 100% of input token price
- 5M token cache per project
Example:
- Upload 100k-token codebase (100k tokens @ $0.30 per 1M = $0.03)
- First call: Pay 100% for context = $0.03
- Next 1,000 calls reuse cache: Each pays 10% = $0.003 per call
- Savings over 1,000 calls: $3 vs. $30 = 90% reduction
Context cache pricing is still experimental; volumes may adjust during 2026.
Dynamic Token Pricing
Gemini pricing is fixed per token (unlike some competitors with variable rates). No surge pricing, no demand-based adjustments.
This simplifies budgeting, though it also means prices only fall when Google publishes a new rate card, not automatically as serving costs drop.
Image & Audio Handling Costs
Gemini 2.5 Flash and Pro support multimodal inputs. Pricing varies by content type.
Image Pricing
Gemini 2.5 Flash:
- Small images (<= 256×256px): $1.00 per 100 images
- Large images (> 256×256px): $2.50 per 100 images
Gemini 2.5 Pro:
- Small images: $2.50 per 100 images
- Large images: $7.50 per 100 images
Example: 100 large images through Gemini 2.5 Flash = $2.50 total. Plus any text tokens in the request.
Comparison to OpenAI:
- GPT-4V: $0.0025 per image (variable, low-res) or $0.0075 (high-res)
- Gemini Flash: $0.025 per large image, roughly 3x GPT-4V's high-res rate; small Flash images ($0.01 each) are closer in price
Audio Input Pricing
Gemini 2.5 Flash:
- $0.00006 per minute of audio
Gemini 2.5 Pro:
- $0.0006 per minute of audio
Example: 60-minute audio = 60 × $0.00006 = $0.0036 (Flash) or $0.036 (Pro)
Audio pricing is extremely cheap, making Gemini suitable for transcription + summarization workflows.
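Image and audio charges can be folded into the same kind of estimate. A sketch using the per-100-image and per-minute rates above (the dictionary keys are our labels, and text tokens in the same request are excluded):

```python
# Per-unit multimodal rates (USD), derived from the tables above.
FLASH = {"small_image": 1.00 / 100, "large_image": 2.50 / 100, "audio_min": 0.00006}
PRO   = {"small_image": 2.50 / 100, "large_image": 7.50 / 100, "audio_min": 0.0006}

def media_cost(rates: dict, small_images: int = 0,
               large_images: int = 0, audio_minutes: float = 0.0) -> float:
    """Multimodal cost for one request, excluding any text tokens."""
    return (small_images * rates["small_image"]
            + large_images * rates["large_image"]
            + audio_minutes * rates["audio_min"])

batch_images = media_cost(FLASH, large_images=100)  # ≈ $2.50
hour_audio = media_cost(FLASH, audio_minutes=60)    # ≈ $0.0036
```

The same call with `PRO` reproduces the $0.036 figure for an hour of Pro audio.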
Video Input
Gemini APIs do not charge separately for video; videos are processed as sequences of images. Frame-by-frame costs apply (image pricing).
Workaround: Extract key frames and send only those, so you pay image pricing per selected frame rather than for every frame of the video.
Comparison to OpenAI & Anthropic
Input Token Pricing
| Model | Input Cost |
|---|---|
| Gemini 2.5 Flash | $0.30 per 1M |
| OpenAI GPT-5 Mini | $0.25 per 1M |
| Claude Sonnet 4.6 | $3.00 per 1M |
| OpenAI GPT-4.1 | $2.00 per 1M |
| Gemini 2.5 Pro | $1.25 per 1M |
| Claude Opus 4.6 | $5.00 per 1M |
| OpenAI GPT-5.4 | $2.50 per 1M |
GPT-5 Mini is cheapest on input tokens at $0.25/1M, with Gemini 2.5 Flash close behind at $0.30/1M. Both OpenAI flagships also undercut their Claude counterparts on input price.
Output Token Pricing
| Model | Output Cost |
|---|---|
| Gemini 2.5 Flash | $2.50 per 1M |
| OpenAI GPT-5 Mini | $2.00 per 1M |
| Claude Sonnet 4.6 | $15.00 per 1M |
| OpenAI GPT-4.1 | $8.00 per 1M |
| Gemini 2.5 Pro | $10.00 per 1M |
| Claude Opus 4.6 | $25.00 per 1M |
| OpenAI GPT-5.4 | $15.00 per 1M |
Claude Opus is most expensive; GPT-5 Mini is cheapest on output.
Cost-Effectiveness for Common Tasks
Chatbot (short responses, high volume):
- Gemini 2.5 Flash: $0.30 + $2.50 = $2.80 per 1M tokens (blended)
- Winner: OpenAI GPT-5 Mini ($0.25 + $2.00 = $2.25 blended) for raw cost; Gemini Flash competitive
Document Summarization (long inputs, medium outputs):
- Gemini 2.5 Flash: Input-heavy, very cheap
- Winner: Gemini 2.5 Flash (context cache provides additional 90% savings on reused docs)
Code Generation (long outputs):
- OpenAI GPT-5: $1.25 + $10 = $11.25 per 1M tokens (blended)
- Gemini 2.5 Pro: $1.25 + $10 = $11.25 per 1M tokens (blended, under 200K context)
- Winner: Comparable; Gemini 2.5 Pro has 1M context advantage
Complex Reasoning (competitive exams, novel problems):
- Claude Opus 4.6: $5 + $25 = $30 per 1M tokens (blended)
- OpenAI o3: $2 + $8 = $10 per 1M tokens (blended)
- Winner: OpenAI o3 (specialized reasoning)
Cost Optimization Tips
1. Choose the Right Model Tier
Use Gemini 2.5 Flash if:
- High volume of requests (chatbots, support automation)
- Output length is moderate
- Reasoning complexity is low-to-moderate
- Budget is primary constraint
Use Gemini 2.5 Pro if:
- Output quality is non-negotiable
- Longer reasoning required
- Complex multi-step problems
- Acceptable if budget is higher
2. Use Context Cache
Store frequently-accessed documents (policies, code, docs) in Prompt Cache. Reuse across 1,000+ API calls to achieve 90% savings on context tokens.
Implementation:
- Identify stable, reused contexts (company handbook, codebase)
- Load once into cache
- Append query tokens for each request
3. Batch Processing (Coming 2026)
Submit non-urgent requests (analysis, reports) to Gemini batch API for 50% discount. Trade latency (up to 24 hours) for cost savings.
4. Compress Input Tokens
Use prompt compression techniques:
- Remove redundant instructions
- Use examples instead of lengthy explanations
- Summarize long documents before sending to API
Example: 100k-token document summary = 10k tokens to API, preserving 90% of information. Cost reduction: 90%.
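Summarizing first is not free: the model must read the full document once and emit the summary. A sketch comparing the two strategies at the Flash rates above (the token counts and the 500-token answer size are illustrative assumptions):

```python
IN_RATE, OUT_RATE = 0.30, 2.50  # Flash $ per 1M input / output tokens

def direct_cost(doc_tokens: int, queries: int, answer_tokens: int = 500) -> float:
    """Send the full document with every query."""
    per_query = (doc_tokens / 1e6) * IN_RATE + (answer_tokens / 1e6) * OUT_RATE
    return queries * per_query

def summarize_then_query_cost(doc_tokens: int, summary_tokens: int,
                              queries: int, answer_tokens: int = 500) -> float:
    """Pay once to summarize, then query against the much smaller summary."""
    summarize = (doc_tokens / 1e6) * IN_RATE + (summary_tokens / 1e6) * OUT_RATE
    per_query = (summary_tokens / 1e6) * IN_RATE + (answer_tokens / 1e6) * OUT_RATE
    return summarize + queries * per_query

# 100 queries against a 100k-token document, compressed to a 10k summary:
direct = direct_cost(100_000, 100)                            # ≈ $3.13
compressed = summarize_then_query_cost(100_000, 10_000, 100)  # ≈ $0.48
```

Note the break-even: with a single query, summarizing costs more (≈ $0.059 vs ≈ $0.031); the saving appears only when the summary is reused.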
5. Filter Outputs
Request only necessary data:
- JSON-structured responses (remove verbose explanations)
- Bullet points instead of paragraphs
- Summaries instead of full text
Example: "Return 3 bullet points" vs. "Write an essay" can reduce output tokens by 50-80%.
6. Use Streaming APIs
Gemini API supports streaming responses. Track token usage as the stream arrives and stop early once you have enough data.
Benefit: Stop after receiving 1,000 tokens instead of waiting for full 5,000-token response.
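Early-stopping a stream can be sketched without any SDK: treat the response as an iterator of token chunks and break once a budget is met. The `fake_stream` generator below is a stand-in for a real streaming response, not the Gemini SDK, and whether cancelling a stream stops billing immediately depends on the provider:

```python
def take_until(stream, max_tokens: int) -> list:
    """Consume a token stream, stopping once max_tokens have been collected.

    With a real streaming API, breaking out of the loop typically closes
    the connection, so generation (and billing) stops early. Verify this
    behavior with your provider.
    """
    collected = []
    for token in stream:
        collected.append(token)
        if len(collected) >= max_tokens:
            break
    return collected

def fake_stream(n: int):  # stand-in for a streaming response of n tokens
    for i in range(n):
        yield f"tok{i}"

# Stop after 1,000 tokens instead of consuming a 5,000-token response.
tokens = take_until(fake_stream(5_000), 1_000)
```

At Flash output rates, stopping 4,000 tokens early on every request saves $0.01 per request, which compounds quickly at high volume.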
Real-World Cost Examples
Example 1: Customer Support Chatbot
Workload: 10,000 customer conversations per month. Average 200 input tokens, 150 output tokens per conversation.
Using Gemini 2.5 Flash:
- Input: (10,000 × 200) / 1,000,000 × $0.30 = $0.60
- Output: (10,000 × 150) / 1,000,000 × $2.50 = $3.75
- Monthly cost: $4.35
Comparison:
- OpenAI GPT-5 Mini: (10,000 × 200 × $0.25 + 10,000 × 150 × $2.00) / 1M = $3.50/month
- Claude Sonnet 4.6: (10,000 × 200 × $3 + 10,000 × 150 × $15) / 1M = $28.50/month
Winner: GPT-5 Mini at $3.50/month is marginally cheaper; Gemini 2.5 Flash at $4.35/month is also very cost-effective. Both are far cheaper than Claude Sonnet.
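Example 1's comparison can be made reusable. A sketch with the rates taken from the comparison tables above; the model names are keys only:

```python
RATES = {  # $ per 1M (input, output), from the comparison tables above
    "Gemini 2.5 Flash":  (0.30, 2.50),
    "GPT-5 Mini":        (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def monthly_cost(model: str, conversations: int,
                 in_tokens: int, out_tokens: int) -> float:
    """Monthly cost for a fixed per-conversation token profile."""
    in_rate, out_rate = RATES[model]
    return ((conversations * in_tokens / 1e6) * in_rate
            + (conversations * out_tokens / 1e6) * out_rate)

# The chatbot workload above: 10k conversations, 200 in / 150 out tokens each.
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 10_000, 200, 150):.2f}/month")
```

Swapping in different token profiles (longer inputs, shorter outputs) changes the ranking, which is why the later examples favor different models.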
Example 2: Legal Document Summarization
Workload: Summarize 100 contracts per month. Average 200k tokens per contract (input), 5k tokens per summary (output).
Using Gemini 2.5 Flash with Context Cache:
- First contract: (200k / 1M) × $0.30 + (5k / 1M) × $2.50 = $0.06 + $0.0125 ≈ $0.07
- Cache setup: the first contract's full-price input doubles as the cache write (no separate fee assumed)
- Subsequent contracts (reusing structure, cached tokens at 10%): (200k / 1M) × $0.03 + (5k / 1M) × $2.50 = $0.006 + $0.0125 = $0.0185 each
- Monthly cost: $0.0725 + 99 × $0.0185 ≈ $1.90
Without cache:
- Monthly cost: 100 × $0.0725 = $7.25
Savings from context cache: 74%
Example 3: Real-Time Coding Assistance
Workload: 1,000 developer sessions per month. Average 50k token codebase (context), 500 input tokens (question), 1,000 output tokens (code suggestion).
Using Gemini 2.5 Flash with Context Cache:
- Codebase loaded once at full input price: (50k / 1M) × $0.30 = $0.015
- Per request: (500 / 1M × $0.30) + (50k / 1M × $0.03 cached rate) + (1,000 / 1M × $2.50) = $0.00015 + $0.0015 + $0.0025 = $0.00415
- Monthly: $0.015 + 1,000 × $0.00415 ≈ $4.17
Using Gemini 2.5 Pro (without cache optimization):
- Per request: (50k / 1M) × $1.25 + (500 / 1M) × $1.25 + (1,000 / 1M) × $10 = $0.0625 + $0.000625 + $0.01 ≈ $0.073
- Monthly: 1,000 × $0.073 ≈ $73.13
Cost difference: Flash with cache (~$4) vs. Pro without cache (~$73) = roughly 17x cheaper
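The per-request figure in this example decomposes into fresh input, cached input, and output. A sketch (the 10% cache fraction is from the announced Prompt Cache pricing; rate defaults are Flash's):

```python
def request_cost(fresh_in: int, cached_in: int, out: int,
                 in_rate: float = 0.30, out_rate: float = 2.50,
                 cache_fraction: float = 0.10) -> float:
    """Per-request cost (USD) when part of the context is served from cache."""
    return ((fresh_in / 1e6) * in_rate
            + (cached_in / 1e6) * in_rate * cache_fraction
            + (out / 1e6) * out_rate)

# 500 fresh input tokens + 50k cached codebase tokens + 1,000 output tokens
per_request = request_cost(500, 50_000, 1_000)  # ≈ $0.00415
```

Setting `cache_fraction=1.0` shows what the same request costs with no cache hit, which is how the uncached comparison above is built.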
Example 4: Batch Data Analysis
Workload: Analyze 10,000 CSV rows per month. Average 2k input tokens per row, 500 output tokens per row. Non-urgent (24-hour SLA acceptable).
Using Gemini batch processing (estimated 50% discount):
- Input: (10,000 × 2,000) / 1M × $0.30 × 0.5 = $3.00
- Output: (10,000 × 500) / 1M × $2.50 × 0.5 = $6.25
- Monthly: $9.25
Using on-demand pricing:
- Monthly: $18.50
Savings from batch processing: 50%
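The batch math generalizes to any row-shaped workload. A sketch; the 50% figure is Google's estimated, not finalized, batch discount:

```python
def batch_cost(rows: int, in_tokens: int, out_tokens: int,
               in_rate: float = 0.30, out_rate: float = 2.50,
               discount: float = 0.50) -> float:
    """Monthly cost for a batch workload at an estimated batch discount.

    Defaults are Flash rates; discount=0.0 gives the on-demand price.
    """
    on_demand = ((rows * in_tokens / 1e6) * in_rate
                 + (rows * out_tokens / 1e6) * out_rate)
    return on_demand * (1 - discount)

batched = batch_cost(10_000, 2_000, 500)                # ≈ $9.25
on_demand = batch_cost(10_000, 2_000, 500, discount=0)  # ≈ $18.50
```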
FAQ
Q: Does Gemini API have a free tier? Yes. Google AI Studio (AI.google.com) offers free access to Gemini 2.5 Flash with limits: 1M tokens/day, 60 RPM. Suitable for exploration, not production.
Q: What's the difference between Gemini and Gemini Pro? Gemini 2.5 is the latest version. Gemini 1.5 is older. Pro and Flash are size variants; Pro is larger and more capable, Flash is faster and cheaper.
Q: Can I use cached contexts indefinitely? No. Prompt Cache expires after 5 minutes of inactivity per cache. Reusing within 5 minutes costs 10% of normal price. After 5 minutes, reload context (costs 100%).
Q: Does output token count include the user's prompt? No. Output tokens are only the model's response. Input tokens include your prompt and all context.
Q: Is there a monthly bill minimum? No. Google Cloud billing has no minimum. If you use $0.50 in a month, you're charged $0.50 (after free credits).
Q: How does Gemini compare to Claude for coding tasks? Gemini 2.5 Flash is competitive for simple coding (bug fixes, boilerplate). Claude Opus is better for complex refactoring and architecture design. For cost, Gemini 2.5 Flash wins (10-17x cheaper per token at the rates above). For quality, Claude Opus wins.
Q: Does Gemini API work in my region? Gemini API is available in 150+ countries. Check Google Cloud regional availability. No geo-restrictions on API access itself.
Q: Can I use Gemini API for fine-tuning? Not yet. Google does not offer fine-tuning on Gemini API as of March 2026. Use base models only.
Q: How do I estimate my monthly bill? Estimate input tokens × input cost per 1M + output tokens × output cost per 1M. Multiply by expected volume per month. Use Google Cloud cost calculator for accuracy.
Related Resources
- OpenAI Pricing Guide 2026
- Anthropic Claude Pricing Guide
- DeepSeek API Pricing
- LLM Cost Comparison Matrix
- Prompt Optimization for Cost Savings
- Context Caching Best Practices
Sources
- Google AI Studio. google.ai/studio/ (March 2026)
- Google Cloud Vertex AI Pricing. cloud.google.com/vertex-ai/pricing (March 2026)
- Gemini API Documentation. AI.google.dev/docs (March 2026)
- Gemini 2.5 Announcement. blog.google/technology/ai/google-gemini-2-5/ (December 2024)
- Google Cloud Prompt Caching. cloud.google.com/docs/generative-ai/caching (Beta, March 2026)
- OpenAI Pricing. openai.com/api/pricing (March 2026)
- Anthropic Pricing. anthropic.com/pricing (March 2026)