Contents
- Gemini API Pricing 2026: Overview
- Gemini API Tiers
- Gemini 2.5 Pro Pricing
- Gemini 2.5 Flash Pricing
- Free Tier & Rate Limits
- Context-Dependent Pricing
- Image & Audio Handling Costs
- Comparison to OpenAI & Anthropic
- Cost Optimization Tips
- Real-World Cost Examples
- FAQ
- Related Resources
- Sources
Gemini API Pricing 2026: Overview
This guide covers Gemini API pricing for 2026. Gemini 2.5 Pro: $1.25 per 1M input tokens for prompts up to 200K tokens, $2.50 above 200K. Output: $10 per 1M tokens at or under 200K context, $15 above.
Gemini 2.5 Flash: $0.30 per 1M input tokens, $2.50 per 1M output tokens.
Free tier: 1M tokens/day via Google AI Studio.
Context caching cuts the input cost of reused context by 90%. Pick Flash for speed and cost, Pro for reasoning.
Gemini API Tiers
Google offers Gemini through multiple access paths:
Free Tier (Google AI Studio)
Access: google.ai/studio
Models:
- Gemini 2.5 Flash
- Gemini 1.5 Flash
- Gemini 1.5 Pro (limited)
Rate limits:
- 60 requests/minute (RPM)
- 1M tokens per day (TPD)
- Enforced over a sliding window; no burst allowance
Use cases: Prototyping, small-scale experimentation, education.
Cost: Free (no credit card required).
Paid Tier (Google Cloud APIs)
Access: Google Cloud Console, Vertex AI API
Models:
- Gemini 2.5 Pro (latest)
- Gemini 2.5 Flash (latest)
- Gemini 1.5 Pro, Flash (legacy)
Rate limits: Per-project quotas (customizable)
Pricing: Per 1M input tokens and per 1M output tokens
Cost: Pay-as-you-go, no minimum; new accounts can apply $300 in free trial credits.
Gemini 2.5 Pro Pricing
Gemini 2.5 Pro is Google's flagship reasoning-focused model, competing directly with GPT-4.1 and Claude Opus.
Token Pricing (as of March 2026)
| Metric | Cost (≤200K context) | Cost (>200K context) |
|---|---|---|
| Input tokens | $1.25 per 1M | $2.50 per 1M |
| Output tokens | $10.00 per 1M | $15.00 per 1M |
| Images (small) | $2.50 per 100 images | $2.50 per 100 images |
| Images (large) | $7.50 per 100 images | $7.50 per 100 images |
| Audio input | $0.0006 per minute | $0.0006 per minute |
Example: 1,000 input tokens + 500 output tokens (under 200K context):
- Input cost: (1,000 / 1,000,000) × $1.25 = $0.00125
- Output cost: (500 / 1,000,000) × $10.00 = $0.005
- Total: $0.00625
Example: 100k input + 10k output (medium-sized request, under 200K context):
- Input: (100,000 / 1,000,000) × $1.25 = $0.125
- Output: (10,000 / 1,000,000) × $10.00 = $0.10
- Total: $0.225
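The arithmetic above can be wrapped in a small helper. A minimal sketch with the table's March 2026 rates hard-coded; the assumption that a prompt over 200K tokens is billed entirely at the higher tier (rather than marginally) is ours, since the table doesn't specify:

```python
def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Gemini 2.5 Pro cost in USD at the rates in the table above.

    Assumption: a prompt over 200K tokens is billed entirely at the
    higher tier; the pricing table does not say whether billing is marginal.
    """
    long_context = input_tokens > 200_000
    input_rate = 2.50 if long_context else 1.25     # $ per 1M input tokens
    output_rate = 15.00 if long_context else 10.00  # $ per 1M output tokens
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

small = gemini_pro_cost(1_000, 500)        # first worked example, ≈ $0.00625
medium = gemini_pro_cost(100_000, 10_000)  # second worked example, ≈ $0.225
```

For an input over the threshold (say 300K tokens), the same call switches to the $2.50/$15.00 rates.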
Context Window
Gemini 2.5 Pro supports 1,000,000 tokens (1M context window).
Implication for pricing: large document processing is feasible. A 1M-token policy document exceeds the 200K threshold, so it costs $2.50 in input tokens to process (at the >200K rate).
Batch Processing (Coming 2026)
Google announced batch processing for Gemini Pro (lower cost for non-urgent queries). Pricing not yet finalized, but expected:
- 50% discount for batch jobs submitted with <24-hour SLA
- No guaranteed latency
Gemini 2.5 Flash Pricing
Gemini 2.5 Flash is Google's efficiency-focused model, optimized for speed and cost. It is broadly comparable in capability to Claude Sonnet 4.6 and significantly faster than Pro.
Token Pricing (as of March 2026)
| Metric | Cost |
|---|---|
| Input tokens | $0.30 per 1M |
| Output tokens | $2.50 per 1M |
| Images (small) | $1.00 per 100 images |
| Images (large) | $2.50 per 100 images |
| Audio input | $0.00006 per minute (10x cheaper than Pro) |
Cost comparison for 100k input + 10k output:
- Input: (100,000 / 1,000,000) × $0.30 = $0.03
- Output: (10,000 / 1,000,000) × $2.50 = $0.025
- Total: $0.055
vs. Gemini Pro for same tokens:
- Pro total: $0.225
- Flash total: $0.055
- Savings: 76%
Speed & Performance
Gemini 2.5 Flash trades some reasoning capability for speed:
- First token latency: 100-200ms (vs. Pro's 500ms)
- Throughput: 10k tokens/sec (vs. Pro's 2k)
- Suitable for: Chatbots, real-time inference, summarization
- Not suitable for: Complex multi-step reasoning, competitive exams, novel problem-solving
Free Tier & Rate Limits
Google AI Studio (Gemini 2.5 Flash)
Daily limits:
- 1M tokens per day
- 60 requests per minute (RPM)
- 10 requests per second (RPS)
- Daily token quota resets every 24 hours (UTC); per-minute and per-second limits are enforced over a sliding window
Practical impact:
- A single 1M-token request exhausts daily limit
- Moderate use cases (10-100 requests/day) stay within limits
- Scale-out to production requires paid tier
No billing required: Google AI Studio is free, no credit card needed.
Paid Tier Free Trial
New Google Cloud accounts receive:
- $300 free credits (expires after 90 days)
- Full API access to all Gemini models
- Same rate limits as paid tier (customizable)
Cost calculation:
- $300 ÷ $0.30 per 1M input (Flash) = 1B input tokens free
- Typical use: 2-3 months for small-to-medium applications
Context-Dependent Pricing
Gemini pricing has nuances related to context window and cache behavior.
Context Cache (Upcoming)
Google announced "Prompt Cache" for Gemini API (March 2026 beta). Mechanism:
- Store frequently-accessed long contexts (docs, codebase) in cache
- Reuse cached context across multiple API calls
- Cache hit costs: 10% of input token price
- Cache miss costs: 100% of input token price
- 5M token cache per project
Example:
- Upload 100k-token codebase (100k tokens @ $0.30 per 1M = $0.03)
- First call: Pay 100% for context = $0.03
- Next 1,000 calls reuse cache: Each pays 10% = $0.003 per call
- Savings over 1,000 calls: $3 vs. $30 = 90% reduction
Context cache pricing is still experimental; volumes may adjust during 2026.
Dynamic Token Pricing
Gemini pricing is fixed per token (unlike some competitors with variable rates). No surge pricing, no demand-based adjustments.
This simplifies budgeting, though it also means prices only fall when Google publishes a new rate card, not automatically as serving costs drop.
Image & Audio Handling Costs
Gemini 2.5 Flash and Pro support multimodal inputs. Pricing varies by content type.
Image Pricing
Gemini 2.5 Flash:
- Small images (<= 256×256px): $1.00 per 100 images
- Large images (> 256×256px): $2.50 per 100 images
Gemini 2.5 Pro:
- Small images: $2.50 per 100 images
- Large images: $7.50 per 100 images
Example: 100 large images through Gemini 2.5 Flash = $2.50 total. Plus any text tokens in the request.
Comparison to OpenAI:
- GPT-4V: $0.0025 per image (variable, low-res) or $0.0075 (high-res)
- Gemini Flash: $0.025 per large image, roughly 3x GPT-4V's high-res rate; small Flash images ($0.01 each) are closer in price
Audio Input Pricing
Gemini 2.5 Flash:
- $0.00006 per minute of audio
Gemini 2.5 Pro:
- $0.0006 per minute of audio
Example: 60-minute audio = 60 × $0.00006 = $0.0036 (Flash) or $0.036 (Pro)
Audio pricing is extremely cheap, making Gemini suitable for transcription + summarization workflows.
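Image and audio charges can be folded into the same kind of estimate. A sketch using the per-100-image and per-minute rates above (the dictionary keys are our labels, and text tokens in the same request are excluded):

```python
# Per-unit multimodal rates (USD), derived from the tables above.
FLASH = {"small_image": 1.00 / 100, "large_image": 2.50 / 100, "audio_min": 0.00006}
PRO   = {"small_image": 2.50 / 100, "large_image": 7.50 / 100, "audio_min": 0.0006}

def media_cost(rates: dict, small_images: int = 0,
               large_images: int = 0, audio_minutes: float = 0.0) -> float:
    """Multimodal cost for one request, excluding any text tokens."""
    return (small_images * rates["small_image"]
            + large_images * rates["large_image"]
            + audio_minutes * rates["audio_min"])

batch_images = media_cost(FLASH, large_images=100)  # ≈ $2.50
hour_audio = media_cost(FLASH, audio_minutes=60)    # ≈ $0.0036
```

The same call with `PRO` reproduces the $0.036 figure for an hour of Pro audio.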
Video Input
Gemini APIs do not charge separately for video; videos are processed as sequences of images. Frame-by-frame costs apply (image pricing).
Workaround: Extract key frames and send only those, so you pay image pricing per selected frame rather than for every frame of the video.
Comparison to OpenAI & Anthropic
Input Token Pricing
| Model | Input Cost |
|---|---|
| Gemini 2.5 Flash | $0.30 per 1M |
| OpenAI GPT-5 Mini | $0.25 per 1M |
| Claude Sonnet 4.6 | $3.00 per 1M |
| OpenAI GPT-4.1 | $2.00 per 1M |
| Gemini 2.5 Pro | $1.25 per 1M |
| Claude Opus 4.6 | $5.00 per 1M |
| OpenAI GPT-5.4 | $2.50 per 1M |
GPT-5 Mini is cheapest on input tokens at $0.25/1M, with Gemini 2.5 Flash close behind at $0.30/1M. Both OpenAI flagships also undercut their Claude counterparts on input price.
Output Token Pricing
| Model | Output Cost |
|---|---|
| Gemini 2.5 Flash | $2.50 per 1M |
| OpenAI GPT-5 Mini | $2.00 per 1M |
| Claude Sonnet 4.6 | $15.00 per 1M |
| OpenAI GPT-4.1 | $8.00 per 1M |
| Gemini 2.5 Pro | $10.00 per 1M |
| Claude Opus 4.6 | $25.00 per 1M |
| OpenAI GPT-5.4 | $15.00 per 1M |
Claude Opus is most expensive; GPT-5 Mini is cheapest on output.
Cost-Effectiveness for Common Tasks
Chatbot (short responses, high volume):
- Gemini 2.5 Flash: $0.30 + $2.50 = $2.80 per 1M tokens (blended)
- Winner: OpenAI GPT-5 Mini ($0.25 + $2.00 = $2.25 blended) for raw cost; Gemini Flash competitive
Document Summarization (long inputs, medium outputs):
- Gemini 2.5 Flash: Input-heavy, very cheap
- Winner: Gemini 2.5 Flash (context cache provides additional 90% savings on reused docs)
Code Generation (long outputs):
- OpenAI GPT-5: $1.25 + $10 = $11.25 per 1M tokens (blended)
- Gemini 2.5 Pro: $1.25 + $10 = $11.25 per 1M tokens (blended, under 200K context)
- Winner: Comparable; Gemini 2.5 Pro has 1M context advantage
Complex Reasoning (competitive exams, novel problems):
- Claude Opus 4.6: $5 + $25 = $30 per 1M tokens (blended)
- OpenAI o3: $2 + $8 = $10 per 1M tokens (blended)
- Winner: OpenAI o3 (specialized reasoning)
Cost Optimization Tips
1. Choose the Right Model Tier
Use Gemini 2.5 Flash if:
- High volume of requests (chatbots, support automation)
- Output length is moderate
- Reasoning complexity is low-to-moderate
- Budget is primary constraint
Use Gemini 2.5 Pro if:
- Output quality is non-negotiable
- Longer reasoning required
- Complex multi-step problems
- Acceptable if budget is higher
2. Use Context Cache
Store frequently-accessed documents (policies, code, docs) in Prompt Cache. Reuse across 1,000+ API calls to achieve 90% savings on context tokens.
Implementation:
- Identify stable, reused contexts (company handbook, codebase)
- Load once into cache
- Append query tokens for each request
3. Batch Processing (Coming 2026)
Submit non-urgent requests (analysis, reports) to Gemini batch API for 50% discount. Trade latency (up to 24 hours) for cost savings.
4. Compress Input Tokens
Use prompt compression techniques:
- Remove redundant instructions
- Use examples instead of lengthy explanations
- Summarize long documents before sending to API
Example: 100k-token document summary = 10k tokens to API, preserving 90% of information. Cost reduction: 90%.
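Summarizing first is not free: the model must read the full document once and emit the summary. A sketch comparing the two strategies at the Flash rates above (the token counts and the 500-token answer size are illustrative assumptions):

```python
IN_RATE, OUT_RATE = 0.30, 2.50  # Flash $ per 1M input / output tokens

def direct_cost(doc_tokens: int, queries: int, answer_tokens: int = 500) -> float:
    """Send the full document with every query."""
    per_query = (doc_tokens / 1e6) * IN_RATE + (answer_tokens / 1e6) * OUT_RATE
    return queries * per_query

def summarize_then_query_cost(doc_tokens: int, summary_tokens: int,
                              queries: int, answer_tokens: int = 500) -> float:
    """Pay once to summarize, then query against the much smaller summary."""
    summarize = (doc_tokens / 1e6) * IN_RATE + (summary_tokens / 1e6) * OUT_RATE
    per_query = (summary_tokens / 1e6) * IN_RATE + (answer_tokens / 1e6) * OUT_RATE
    return summarize + queries * per_query

# 100 queries against a 100k-token document, compressed to a 10k summary:
direct = direct_cost(100_000, 100)                            # ≈ $3.13
compressed = summarize_then_query_cost(100_000, 10_000, 100)  # ≈ $0.48
```

Note the break-even: with a single query, summarizing costs more (≈ $0.059 vs ≈ $0.031); the saving appears only when the summary is reused.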
5. Filter Outputs
Request only necessary data:
- JSON-structured responses (remove verbose explanations)
- Bullet points instead of paragraphs
- Summaries instead of full text
Example: "Return 3 bullet points" vs. "Write an essay" can reduce output tokens by 50-80%.
6. Use Streaming APIs
Gemini API supports streaming responses. Track token usage as the stream arrives and stop early once you have enough data.
Benefit: Stop after receiving 1,000 tokens instead of waiting for full 5,000-token response.
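Early-stopping a stream can be sketched without any SDK: treat the response as an iterator of token chunks and break once a budget is met. The `fake_stream` generator below is a stand-in for a real streaming response, not the Gemini SDK, and whether cancelling a stream stops billing immediately depends on the provider:

```python
def take_until(stream, max_tokens: int) -> list:
    """Consume a token stream, stopping once max_tokens have been collected.

    With a real streaming API, breaking out of the loop typically closes
    the connection, so generation (and billing) stops early. Verify this
    behavior with your provider.
    """
    collected = []
    for token in stream:
        collected.append(token)
        if len(collected) >= max_tokens:
            break
    return collected

def fake_stream(n: int):  # stand-in for a streaming response of n tokens
    for i in range(n):
        yield f"tok{i}"

# Stop after 1,000 tokens instead of consuming a 5,000-token response.
tokens = take_until(fake_stream(5_000), 1_000)
```

At Flash output rates, stopping 4,000 tokens early on every request saves $0.01 per request, which compounds quickly at high volume.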
Real-World Cost Examples
Example 1: Customer Support Chatbot
Workload: 10,000 customer conversations per month. Average 200 input tokens, 150 output tokens per conversation.
Using Gemini 2.5 Flash:
- Input: (10,000 × 200) / 1,000,000 × $0.30 = $0.60
- Output: (10,000 × 150) / 1,000,000 × $2.50 = $3.75
- Monthly cost: $4.35
Comparison:
- OpenAI GPT-5 Mini: (10,000 × 200 × $0.25 + 10,000 × 150 × $2.00) / 1M = $3.50/month
- Claude Sonnet 4.6: (10,000 × 200 × $3 + 10,000 × 150 × $15) / 1M = $28.50/month
Winner: GPT-5 Mini at $3.50/month is marginally cheaper; Gemini 2.5 Flash at $4.35/month is also very cost-effective. Both are far cheaper than Claude Sonnet.
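Example 1's comparison can be made reusable. A sketch with the rates taken from the comparison tables above; the model names are keys only:

```python
RATES = {  # $ per 1M (input, output), from the comparison tables above
    "Gemini 2.5 Flash":  (0.30, 2.50),
    "GPT-5 Mini":        (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def monthly_cost(model: str, conversations: int,
                 in_tokens: int, out_tokens: int) -> float:
    """Monthly cost for a fixed per-conversation token profile."""
    in_rate, out_rate = RATES[model]
    return ((conversations * in_tokens / 1e6) * in_rate
            + (conversations * out_tokens / 1e6) * out_rate)

# The chatbot workload above: 10k conversations, 200 in / 150 out tokens each.
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 10_000, 200, 150):.2f}/month")
```

Swapping in different token profiles (longer inputs, shorter outputs) changes the ranking, which is why the later examples favor different models.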
Example 2: Legal Document Summarization
Workload: Summarize 100 contracts per month. Average 200k tokens per contract (input), 5k tokens per summary (output).
Using Gemini 2.5 Flash with Context Cache:
- First contract: (200k / 1M) × $0.30 + (5k / 1M) × $2.50 = $0.06 + $0.0125 ≈ $0.07
- Cache setup: the first contract's full-price input doubles as the cache write (no separate fee assumed)
- Subsequent contracts (reusing structure, cached tokens at 10%): (200k / 1M) × $0.03 + (5k / 1M) × $2.50 = $0.006 + $0.0125 = $0.0185 each
- Monthly cost: $0.0725 + 99 × $0.0185 ≈ $1.90
Without cache:
- Monthly cost: 100 × $0.0725 = $7.25
Savings from context cache: 74%
Example 3: Real-Time Coding Assistance
Workload: 1,000 developer sessions per month. Average 50k token codebase (context), 500 input tokens (question), 1,000 output tokens (code suggestion).
Using Gemini 2.5 Flash with Context Cache:
- Codebase loaded once at full input price: (50k / 1M) × $0.30 = $0.015
- Per request: (500 / 1M × $0.30) + (50k / 1M × $0.03 cached rate) + (1,000 / 1M × $2.50) = $0.00015 + $0.0015 + $0.0025 = $0.00415
- Monthly: $0.015 + 1,000 × $0.00415 ≈ $4.17
Using Gemini 2.5 Pro (without cache optimization):
- Per request: (50k / 1M) × $1.25 + (500 / 1M) × $1.25 + (1,000 / 1M) × $10 = $0.0625 + $0.000625 + $0.01 ≈ $0.073
- Monthly: 1,000 × $0.073 ≈ $73.13
Cost difference: Flash with cache (~$4) vs. Pro without cache (~$73) = roughly 17x cheaper
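The per-request figure in this example decomposes into fresh input, cached input, and output. A sketch (the 10% cache fraction is from the announced Prompt Cache pricing; rate defaults are Flash's):

```python
def request_cost(fresh_in: int, cached_in: int, out: int,
                 in_rate: float = 0.30, out_rate: float = 2.50,
                 cache_fraction: float = 0.10) -> float:
    """Per-request cost (USD) when part of the context is served from cache."""
    return ((fresh_in / 1e6) * in_rate
            + (cached_in / 1e6) * in_rate * cache_fraction
            + (out / 1e6) * out_rate)

# 500 fresh input tokens + 50k cached codebase tokens + 1,000 output tokens
per_request = request_cost(500, 50_000, 1_000)  # ≈ $0.00415
```

Setting `cache_fraction=1.0` shows what the same request costs with no cache hit, which is how the uncached comparison above is built.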
Example 4: Batch Data Analysis
Workload: Analyze 10,000 CSV rows per month. Average 2k input tokens per row, 500 output tokens per row. Non-urgent (24-hour SLA acceptable).
Using Gemini batch processing (estimated 50% discount):
- Input: (10,000 × 2,000) / 1M × $0.30 × 0.5 = $3.00
- Output: (10,000 × 500) / 1M × $2.50 × 0.5 = $6.25
- Monthly: $9.25
Using on-demand pricing:
- Monthly: $18.50
Savings from batch processing: 50%
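The batch math generalizes to any row-shaped workload. A sketch; the 50% figure is Google's estimated, not finalized, batch discount:

```python
def batch_cost(rows: int, in_tokens: int, out_tokens: int,
               in_rate: float = 0.30, out_rate: float = 2.50,
               discount: float = 0.50) -> float:
    """Monthly cost for a batch workload at an estimated batch discount.

    Defaults are Flash rates; discount=0.0 gives the on-demand price.
    """
    on_demand = ((rows * in_tokens / 1e6) * in_rate
                 + (rows * out_tokens / 1e6) * out_rate)
    return on_demand * (1 - discount)

batched = batch_cost(10_000, 2_000, 500)                # ≈ $9.25
on_demand = batch_cost(10_000, 2_000, 500, discount=0)  # ≈ $18.50
```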
FAQ
Q: Does Gemini API have a free tier? Yes. Google AI Studio (AI.google.com) offers free access to Gemini 2.5 Flash with limits: 1M tokens/day, 60 RPM. Suitable for exploration, not production.
Q: What's the difference between Gemini and Gemini Pro? Gemini 2.5 is the latest version. Gemini 1.5 is older. Pro and Flash are size variants; Pro is larger and more capable, Flash is faster and cheaper.
Q: Can I use cached contexts indefinitely? No. Prompt Cache expires after 5 minutes of inactivity per cache. Reusing within 5 minutes costs 10% of normal price. After 5 minutes, reload context (costs 100%).
Q: Does output token count include the user's prompt? No. Output tokens are only the model's response. Input tokens include your prompt and all context.
Q: Is there a monthly bill minimum? No. Google Cloud billing has no minimum. If you use $0.50 in a month, you're charged $0.50 (after free credits).
Q: How does Gemini compare to Claude for coding tasks? Gemini 2.5 Flash is competitive for simple coding (bug fixes, boilerplate). Claude Opus is better for complex refactoring and architecture design. For cost, Gemini 2.5 Flash wins (10-17x cheaper per token at the rates above). For quality, Claude Opus wins.
Q: Does Gemini API work in my region? Gemini API is available in 150+ countries. Check Google Cloud regional availability. No geo-restrictions on API access itself.
Q: Can I use Gemini API for fine-tuning? Not yet. Google does not offer fine-tuning on Gemini API as of March 2026. Use base models only.
Q: How do I estimate my monthly bill? Estimate input tokens × input cost per 1M + output tokens × output cost per 1M. Multiply by expected volume per month. Use Google Cloud cost calculator for accuracy.
Related Resources
- OpenAI Pricing Guide 2026
- Anthropic Claude Pricing Guide
- DeepSeek API Pricing
- LLM Cost Comparison Matrix
- Prompt Optimization for Cost Savings
- Context Caching Best Practices
Sources
- Google AI Studio. google.ai/studio/ (March 2026)
- Google Cloud Vertex AI Pricing. cloud.google.com/vertex-ai/pricing (March 2026)
- Gemini API Documentation. AI.google.dev/docs (March 2026)
- Gemini 2.5 Announcement. blog.google/technology/ai/google-gemini-2-5/ (December 2024)
- Google Cloud Prompt Caching. cloud.google.com/docs/generative-ai/caching (Beta, March 2026)
- OpenAI Pricing. openai.com/api/pricing (March 2026)
- Anthropic Pricing. anthropic.com/pricing (March 2026)