Contents
- Overview
- Gemini 2.5 Model Lineup and Pricing
- Free Tier Limits and Quotas
- Gemini 2.5 Pro Costs
- Gemini 2.5 Flash Costs
- Batch API and Discounts
- Context Caching Pricing
- Multi-Modal Token Accounting
- Cost Projections by Workload
- Comparison to Competitors
- Hidden Fees and Gotchas
- FAQ
- Related Resources
- Sources
Overview
Gemini 2.5 Pro: $1.25 input, $10 output per 1M tokens.
Gemini 2.5 Flash: $0.30 input, $2.50 output.
Free tier: 2M input tokens/month via AI Studio. Good for prototyping.
This guide breaks down Gemini pricing, free tier quotas, batch processing, and context caching discounts.
Gemini 2.5 Model Lineup and Pricing
Google's Gemini 2.5 family includes three models optimized for different performance-cost trade-offs.
Gemini 2.5 Pro
Gemini 2.5 Pro is the flagship model, optimized for maximum capability and reasoning. It features a 1M token context window and excels at complex reasoning, code generation, and multimodal analysis.
Pricing (March 2026):
- Input tokens: $1.25 per 1 million tokens
- Output tokens: $10 per 1 million tokens
- Input/output ratio: 8:1 (output is 8x more expensive)
Example cost: processing a 50K-token document and generating a 2K-token summary:
- Input cost: (50,000 / 1,000,000) × $1.25 = $0.0625
- Output cost: (2,000 / 1,000,000) × $10 = $0.02
- Total: $0.0825
Multiply by 1,000 daily requests: $82.50/day or $30,112.50/year for this workload.
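The per-request arithmetic above can be wrapped in a small helper. The rates are the March 2026 figures quoted in this guide; pass different rates if pricing changes:

```python
# Per-1M-token rates from this guide (March 2026); adjust if Google updates pricing.
PRO_INPUT_RATE = 1.25    # USD per 1M input tokens
PRO_OUTPUT_RATE = 10.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = PRO_INPUT_RATE,
                 output_rate: float = PRO_OUTPUT_RATE) -> float:
    """Cost in USD for a single request at the given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

cost = request_cost(50_000, 2_000)
print(f"Per request: ${cost:.4f}")                          # $0.0825
print(f"1,000/day for a year: ${cost * 1_000 * 365:,.2f}")  # $30,112.50
```

The same function prices Flash workloads by passing $0.30 and $2.50 as the rates.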
Gemini 2.5 Flash
Gemini 2.5 Flash is the efficient model, optimized for speed and cost. It has a 1M token context window (same as Pro) but with lower latency and significantly lower pricing. It's suitable for classification, extraction, and routine processing tasks.
Pricing (March 2026):
- Input tokens: $0.30 per 1 million tokens
- Output tokens: $2.50 per 1 million tokens
- Input/output ratio: 8:1 (output is ~8x input)
Flash costs roughly a quarter of Pro's rates: 24% of Pro's input price and exactly 25% of its output price.
Same document analysis on Flash:
- Input cost: (50,000 / 1,000,000) × $0.30 = $0.015
- Output cost: (2,000 / 1,000,000) × $2.50 = $0.005
- Total: $0.020
Multiply by 1,000 daily requests: $20/day or $7,300/year.
Flash is 76% cheaper than Pro for this workload. The trade-off: Flash is less capable on complex reasoning tasks. For commodity tasks where reasoning quality plateaus early, Flash is superior.
Gemini 1.5 Pro (Legacy)
Google maintains backward compatibility with Gemini 1.5 Pro (released mid-2024):
Pricing (March 2026):
- Input tokens: $0.075 per 1 million tokens
- Output tokens: $0.30 per 1 million tokens
Gemini 1.5 Pro sits in the low-cost tier alongside Gemini 2.5 Flash, but it is older and less capable than both 2.5 Pro and 2.5 Flash. It exists only for backward compatibility with existing deployments; avoid it for new work.
Free Tier Limits and Quotas
Google's free tier is generous but requires understanding the specific limits.
AI Studio Free Tier
Access via google.ai/studio:
- Monthly limit: 2 million input tokens
- No output token limit
- Model access: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 1.5 Pro
- Cost: $0
- Rate limit: 2 requests per minute (very restrictive)
- No SLA or uptime guarantee
The "no output token limit" is misleading; developers are still capped by the 2M input-token allowance. Processing 2M input tokens exhausts the free quota, regardless of how many output tokens were generated along the way.
Effective Free Tier Capacity
With a 2-request-per-minute rate limit:
- Per hour: 120 requests
- Per day: 2,880 requests
- Per month: 86,400 requests
A typical request uses 100-1,000 input tokens. Assuming 500 tokens per request:
- Monthly token consumption: 86,400 × 500 = 43.2M tokens
- Free tier allocation: 2M tokens
- Utilization: 4.6% of potential requests
The 2M token limit is the bottleneck. At 500 tokens per request, developers can make only 4,000 requests monthly before exhausting the quota.
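To see how far the allowance stretches at other request sizes:

```python
FREE_TIER_INPUT_TOKENS = 2_000_000  # monthly AI Studio allowance, per this guide

def free_tier_requests(avg_input_tokens: int) -> int:
    """Number of requests the free tier covers before the input quota runs out."""
    return FREE_TIER_INPUT_TOKENS // avg_input_tokens

print(free_tier_requests(500))    # 4,000 requests/month
print(free_tier_requests(2_000))  # 1,000 requests/month
```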
Practical Use Cases for Free Tier
The free tier accommodates:
- Prototyping (limited scope, small datasets)
- Proof-of-concepts (100-200 requests)
- Learning and experimentation (low-volume)
- Development environment (testing, not production)
Not suitable for:
- Production deployments
- Any application handling real user traffic
- Scaled experimentation (>1,000 requests/month)
Graduating from Free Tier
Transitioning to paid requires adding a billing method (credit card or Google Cloud account). Billing starts immediately upon upgrade: there is no trial period, and charges accrue the moment usage exceeds the free quota.
Gemini 2.5 Pro Costs
Per-Request Cost Variance
Costs vary based on input and output token counts. Modeling several common tasks:
Task A: Single-turn chat (1K input, 300 output)
- Input: $0.00125
- Output: $0.003
- Total: $0.00425 per request
Task B: Code review (10K input, 1K output)
- Input: $0.0125
- Output: $0.01
- Total: $0.0225 per request
Task C: Document analysis (100K input, 2K output)
- Input: $0.125
- Output: $0.02
- Total: $0.145 per request
Task D: Long context (500K input, 5K output)
- Input: $0.625
- Output: $0.05
- Total: $0.675 per request
Input token cost dominates for large documents. Context size is the primary cost driver.
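The four task profiles above, tabulated with the same per-1M-token arithmetic:

```python
PRO_INPUT, PRO_OUTPUT = 1.25, 10.00  # USD per 1M tokens (this guide's Pro rates)

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-request USD cost at Pro rates."""
    return input_tokens / 1e6 * PRO_INPUT + output_tokens / 1e6 * PRO_OUTPUT

tasks = {
    "A: single-turn chat": (1_000, 300),
    "B: code review": (10_000, 1_000),
    "C: document analysis": (100_000, 2_000),
    "D: long context": (500_000, 5_000),
}
for name, (inp, out) in tasks.items():
    print(f"{name}: ${task_cost(inp, out):.5f}")
```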
Monthly Cost Projections
A customer service chat application:
- 10,000 conversations per month
- Average 2K input tokens per conversation
- Average 400 output tokens per conversation
Monthly cost:
- Input: (10,000 × 2,000 / 1,000,000) × $1.25 = $25
- Output: (10,000 × 400 / 1,000,000) × $10 = $40
- Total: $65/month
Scale to 100K conversations:
- Input: $250
- Output: $400
- Total: $650/month
A large-scale deployment with 1M conversations monthly:
- Input: $2,500
- Output: $4,000
- Total: $6,500/month
Pro pricing scales linearly with token volume.
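That linear scaling in one helper, defaulting to the Pro rates used above:

```python
def monthly_cost(conversations: int, in_tokens: int, out_tokens: int,
                 in_rate: float = 1.25, out_rate: float = 10.0) -> float:
    """Monthly USD cost at per-1M-token rates (Pro rates by default)."""
    return conversations * (in_tokens * in_rate + out_tokens * out_rate) / 1e6

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} conversations: ${monthly_cost(n, 2_000, 400):,.2f}")
```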
When Pro Pricing Makes Sense
Use Gemini 2.5 Pro when:
- Reasoning quality matters (mathematical proofs, complex logic)
- Code generation accuracy is critical
- Context size exceeds 100K tokens regularly
- Quality-sensitive applications where Pro's higher cost and latency are justified
For commodity tasks (classification, basic extraction), Flash is more economical.
Gemini 2.5 Flash Costs
Per-Request Economics
Same tasks on Flash:
Task A: Single-turn chat (1K input, 300 output)
- Input: $0.0003
- Output: $0.00075
- Total: $0.00105 per request
Task B: Code review (10K input, 1K output)
- Input: $0.003
- Output: $0.0025
- Total: $0.0055 per request
Task C: Document analysis (100K input, 2K output)
- Input: $0.030
- Output: $0.005
- Total: $0.035 per request
Task D: Long context (500K input, 5K output)
- Input: $0.150
- Output: $0.0125
- Total: $0.1625 per request
Flash is substantially cheaper. The same Task C costs $0.035 on Flash vs. $0.145 on Pro. That's 76% savings.
High-Volume Deployment
Same customer service example scaled to 1M conversations:
- Input: (1,000,000 × 2,000 / 1,000,000) × $0.30 = $600
- Output: (1,000,000 × 400 / 1,000,000) × $2.50 = $1,000
- Total: $1,600/month
Compared to Pro ($6,500/month), Flash is 75% cheaper. This is the primary advantage for cost-sensitive deployments.
Flash Capability Trade-offs
Flash is optimized for speed and cost, not maximum capability. Testing on reasoning-heavy tasks:
- Arithmetic with multi-step reasoning: Flash 78%, Pro 91%
- Complex logic puzzles: Flash 72%, Pro 82%
- Code generation (simple): Flash 88%, Pro 92%
- Code generation (complex): Flash 76%, Pro 89%
For commodity tasks (classification, extraction, moderation), the accuracy difference is minimal. For reasoning-heavy tasks, Pro's advantage is significant.
Choosing Between Flash and Pro
Use Flash for:
- Text classification (sentiment, intent, category assignment)
- Information extraction (structured data from text)
- Content moderation (toxic content detection)
- Routine Q&A (FAQ-style responses)
- High-volume, time-sensitive processing
- Cost-optimized applications
Use Pro for:
- Complex reasoning (proofs, troubleshooting, planning)
- Code analysis and generation
- Creative writing (where quality matters)
- Multimodal analysis (better visual reasoning)
- Large context handling (better performance at 500K+ tokens)
Batch API and Discounts
Google's batch API allows asynchronous processing with 50% cost reduction.
Batch Pricing
Standard Gemini 2.5 Pro pricing:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
Batch Gemini 2.5 Pro pricing:
- Input: $0.625 per 1M tokens (50% discount)
- Output: $5 per 1M tokens (50% discount)
The trade-off: batch processing is asynchronous. Typical latency is 1-24 hours (depends on queue depth).
Batch API Economics
A summarization job processing 10M tokens overnight (1,000 documents, 10K tokens each):
- Output: 500 tokens per summary = 500K total output
Standard API cost:
- Input: (10M / 1M) × $1.25 = $12.50
- Output: (500K / 1M) × $10 = $5
- Total: $17.50
Batch API cost:
- Input: (10M / 1M) × $0.625 = $6.25
- Output: (500K / 1M) × $5 = $2.50
- Total: $8.75
- Savings: $8.75 (50%)
For one-time batch jobs, the 50% discount justifies the latency trade-off. For interactive applications, batch is not viable.
Batch Request Format
Batch requests are submitted via a JSONL file containing multiple API requests. Each request is processed independently and results are aggregated into a results file.
Example batch job:
{"custom_id": "1", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
{"custom_id": "2", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
...
Submitting a batch of 1,000 requests at half the token cost is usually worthwhile: the operational overhead (formatting JSONL, polling for results) is justified for cost-sensitive bulk processing.
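As a sketch, a JSONL file in the format shown above could be generated like this; the field names (`custom_id`, `params`) follow this section's example and should be verified against Google's current batch documentation:

```python
import json

def build_batch_file(prompts: list[str], path: str,
                     model: str = "gemini-2.5-pro") -> None:
    """Write one JSONL line per request, matching the format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts, start=1):
            request = {
                "custom_id": str(i),
                "params": {
                    "model": model,
                    "contents": [{"role": "user",
                                  "parts": [{"text": prompt}]}],
                },
            }
            f.write(json.dumps(request) + "\n")

build_batch_file(["Summarize document 1", "Summarize document 2"], "batch.jsonl")
```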
Context Caching Pricing
Gemini 2.5 Pro supports context caching, where cached context tokens are billed at a 90% discount.
How Context Caching Works
If developers repeatedly query the same document (or set of documents), Google can cache the processed context:
First request with 500K cached tokens:
- Input tokens (new): 5K
- Cached tokens: 500K (billed at 10% of input price)
- Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.00625 + $0.0625 = $0.06875
- Output cost: (2K / 1M) × $10 = $0.02
- Total: $0.08875
Subsequent requests (cache hit, 500K cached tokens + 5K new tokens):
- Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.06875
- Output cost: (2K / 1M) × $10 = $0.02
- Total: $0.08875
The cached portion costs $0.0625 on every request, first and subsequent alike, versus $0.625 if the full 500K tokens were billed at the standard rate. That is a 90% reduction on the cached context for every request.
Cache Economics
A system repeatedly analyzing the same 500K-token document (e.g., "answer questions about our company handbook"):
100 queries without caching:
- Input per query: 505K tokens (500K doc + 5K query)
- Total input: 50.5M tokens
- Cost: (50.5M / 1M) × $1.25 = $63.13
100 queries with caching:
- First query: input cost $0.06875
- 99 subsequent queries: input cost $0.06875 each
- Total input cost: $6.88
- Savings: $56.25 (89% reduction)
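The caching break-even above, generalized to any query count (assuming, as this section does, that cached tokens are billed at 10% of the input rate on every request):

```python
PRO_INPUT = 1.25       # USD per 1M input tokens
CACHE_DISCOUNT = 0.10  # cached tokens billed at 10% of the input rate (per this guide)

def input_cost(queries: int, doc_tokens: int, query_tokens: int,
               cached: bool) -> float:
    """Total input cost in USD for repeated queries over the same document."""
    rate = PRO_INPUT / 1e6
    if cached:
        per_query = query_tokens * rate + doc_tokens * rate * CACHE_DISCOUNT
    else:
        per_query = (doc_tokens + query_tokens) * rate
    return queries * per_query

uncached = input_cost(100, 500_000, 5_000, cached=False)  # about $63.13
cached = input_cost(100, 500_000, 5_000, cached=True)     # about $6.88
print(f"Savings: ${uncached - cached:.2f}")               # $56.25
```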
Context caching is transformative for retrieval systems where the same documents are queried repeatedly.
Cache Invalidation and Limits
Caches persist for 1 hour after last use. If developers don't query within 1 hour, the cache is dropped and rebuilt on the next request.
Cache size limits: Google doesn't publicize exact limits, but testing suggests 2M tokens can be cached per session.
For small document bases (under 1M tokens), caching provides massive savings. For continuously updated documents, caching provides less value (cache invalidates frequently).
Multi-Modal Token Accounting
Gemini 2.5 Pro processes images, videos, and text. Understanding token costs for multi-modal inputs is essential.
Image Token Consumption
Image tokens depend on image size and quality:
Thumbnail image (100×100 pixels):
- Token consumption: 258 tokens
Small image (480×480 pixels):
- Token consumption: 258 tokens + extra detail tokens
Standard image (1024×1024 pixels):
- Token consumption: 258 + ~100-200 additional tokens = 358-458 tokens
High-resolution image (2048×2048 pixels):
- Token consumption: 258 + ~300-400 additional tokens = 558-658 tokens
Baseline: every image costs at least 258 tokens. Additional detail tokens depend on resolution and complexity.
Practical cost: 10 standard images plus 5K text tokens comes to roughly 9K tokens total (about 4K for the images, 5K for the text).
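A rough estimator under this section's assumptions: a 258-token baseline plus detail tokens taken as midpoints of the approximate ranges above. This is an illustration, not an official formula:

```python
def estimate_image_tokens(width: int, height: int) -> int:
    """Rough per-image token estimate: 258-token baseline plus detail tokens,
    using midpoints of the approximate ranges quoted in this section."""
    BASE = 258
    pixels = width * height
    if pixels <= 480 * 480:
        return BASE          # small images: baseline only
    if pixels <= 1024 * 1024:
        return BASE + 150    # midpoint of the ~100-200 range above
    return BASE + 350        # midpoint of the ~300-400 range above

total = 10 * estimate_image_tokens(1024, 1024) + 5_000  # 10 images + 5K text
print(total)  # 9080 tokens
```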
Video Token Consumption
Video is processed by extracting key frames:
Short video (10 seconds at 24fps = 240 frames, ~10 key frames extracted):
- Token consumption: 10 frames × 400 tokens/frame = 4,000 tokens
Medium video (60 seconds, 6 key frames):
- Token consumption: 2,400 tokens
Long video (10 minutes, 10 key frames):
- Token consumption: 4,000 tokens
Video token costs are dominated by the number of extracted frames, not duration.
Audio Token Consumption
Gemini 2.5 Pro does not directly process audio. Audio must be transcribed first (using a separate speech-to-text API), then passed as text.
Cost Projections by Workload
Scenario A: Customer Support Chatbot
Setup:
- 1,000 chats per day
- Average 2K input tokens (customer messages + context)
- Average 300 output tokens (bot responses)
- Using Gemini 2.5 Flash (cost-optimized)
Monthly cost (30 days):
- Input: (1,000 × 30 × 2,000 / 1,000,000) × $0.30 = $18
- Output: (1,000 × 30 × 300 / 1,000,000) × $2.50 = $22.50
- Total: $40.50/month
Annual cost: $486
This is very cost-effective. A single developer salary ($60K+) dwarfs API costs.
Scenario B: Document Analysis Platform
Setup:
- 100 documents per month
- Average 50K tokens per document
- Average 2K output tokens per analysis
- Using Gemini 2.5 Pro (reasoning-heavy)
- Batch API for cost optimization
Monthly cost:
- Input: (100 × 50,000 / 1,000,000) × $0.625 = $3.13 (batch discount)
- Output: (100 × 2,000 / 1,000,000) × $5 = $1.00 (batch discount)
- Total: $4.13/month
Annual cost: ~$49.50
Batch processing reduces costs dramatically. The trade-off: 1-24 hour latency.
Scenario C: Large Codebase Analysis
Setup:
- 10 analyses per month
- Average 300K tokens per analysis (large repositories)
- Average 5K output tokens per analysis
- Using Gemini 2.5 Pro (large context handling)
Monthly cost:
- Input: (10 × 300,000 / 1,000,000) × $1.25 = $3.75
- Output: (10 × 5,000 / 1,000,000) × $10 = $0.50
- Total: $4.25/month
Annual cost: $51
Even large context windows are inexpensive.
Scenario D: Video Analysis Service
Setup:
- 100 videos per month
- Average 5 key frames per video = 5 × 400 tokens = 2,000 image tokens
- Average 5K text tokens per analysis
- Using Gemini 2.5 Pro
Cost per video:
- Image tokens: (2,000 / 1,000,000) × $1.25 = $0.0025
- Text input tokens: (5,000 / 1,000,000) × $1.25 = $0.00625
- Output tokens: (2,000 / 1,000,000) × $10 = $0.02
- Total per video: $0.02875
100 videos monthly:
- Total: $2.88/month
Annual cost: $34.50
Video analysis is cheap because extracted frames consume relatively few tokens.
Comparison to Competitors
How does Gemini 2.5 pricing compare to other LLM providers?
Gemini 2.5 Pro vs. OpenAI GPT-5
Gemini 2.5 Pro:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
OpenAI GPT-5:
- Input: $1.25 per 1M tokens
- Output: $10 per 1M tokens
Identical pricing. Choice depends on capability (reasoning vs. multimodal + context).
Gemini 2.5 Flash vs. Anthropic Claude Sonnet 4.6
Gemini 2.5 Flash:
- Input: $0.30 per 1M tokens
- Output: $2.50 per 1M tokens
Anthropic Claude Sonnet 4.6:
- Input: $3 per 1M tokens
- Output: $15 per 1M tokens
Flash is 10x cheaper on input, 6x cheaper on output. For commodity tasks, Gemini Flash dominates on cost.
Gemini 2.5 Flash vs. Cohere Command R
Gemini 2.5 Flash:
- Input: $0.30 per 1M tokens
- Output: $2.50 per 1M tokens
Cohere Command R:
- Input: $0.15 per 1M tokens
- Output: $0.60 per 1M tokens
Gemini 2.5 Flash is more capable on complex tasks. For pure cost optimization, Cohere wins. For balanced capability + cost, Flash is superior.
Summary Pricing Table
| Model | Input | Output | Best For |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10 | Reasoning, multimodal |
| Gemini 2.5 Flash | $0.30 | $2.50 | Cost-optimized, commodity |
| GPT-5 | $1.25 | $10 | Reasoning, code |
| Claude Sonnet 4.6 | $3 | $15 | General capability |
| Cohere Command R | $0.15 | $0.60 | Commodity tasks |
| Cohere Command R+ | $2.50 | $10.00 | Complex reasoning |
Hidden Fees and Gotchas
Rate Limit Penalties
Exceeding the rate limit doesn't incur extra charges; requests simply fail with HTTP 429. However, retry logic may cause duplicate charges if not implemented carefully.
Implement exponential backoff with random jitter to avoid thundering herd when limits are reached.
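A minimal retry sketch; `call_api` is a placeholder for your actual client call, and `RateLimitError` stands in for however your SDK surfaces HTTP 429:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for an HTTP 429 response from the API client."""

def call_with_backoff(call_api, max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0):
    """Retry call_api() on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids herds
```

Because each request is retried (not re-submitted as new), this pattern also avoids the duplicate-charge risk mentioned above: a failed 429 call was never billed.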
Cache Eviction
If a cached context is evicted (1-hour timeout or cache full), the next request rebuilds the cache. Developers are charged the full per-token rate for re-processing, even if the same tokens were cached before.
For frequently re-used documents, cache invalidation may be expensive. Budget for rebuild costs.
Image Processing Overhead
All images incur a minimum 258-token cost. Sending 1,000 tiny images (each 258 tokens) costs 258K tokens, even if the images are 10×10 pixels.
For high-volume image processing, pre-filter low-value images to avoid unnecessary token consumption.
Rate Limit Escalation Delays
The default rate limit is 2 requests/minute (free tier) or 60 requests/minute (paid tier). Requesting escalation to 10,000+ requests/minute may take 24-48 hours. During peak growth, this can delay scaling.
Plan rate limit requests 1-2 weeks in advance of anticipated growth.
Context Window Truncation
If the input exceeds 1M tokens, it's truncated. Developers are billed for the entire input, including the truncated portion, even though only part of it was processed. Unlike some providers, Google doesn't fail or warn; the truncation is silent.
Implement token counting on the client side to avoid accidental truncation.
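A pre-flight guard along these lines avoids silent truncation; the 4-characters-per-token heuristic is a rough assumption, so prefer your SDK's token-counting endpoint in production:

```python
CONTEXT_LIMIT = 1_000_000  # Gemini 2.5 context window, per this guide

def check_fits(text: str, reserved_output: int = 8_000,
               chars_per_token: float = 4.0) -> bool:
    """Heuristic pre-flight check: estimate tokens from character count and
    refuse inputs that would exceed the context window."""
    estimated = len(text) / chars_per_token
    return estimated + reserved_output <= CONTEXT_LIMIT

print(check_fits("hello " * 1_000))      # True: well under the limit
print(check_fits("x" * 10_000_000))      # False: ~2.5M tokens would be truncated
```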
Output Token Billing for Errors
If a request fails mid-generation (e.g., due to timeout or provider error), developers may still be billed for partial output tokens. Error handling should account for this.
Treat API errors as potential billing events; log all interactions for reconciliation.
FAQ
Is the free tier suitable for production?
No. The 2M monthly token limit and 2 requests/minute rate limit are only suitable for prototyping. Production applications require paid tier.
What's the difference between Gemini 2.5 Pro and Flash?
Pro is more capable on reasoning and complex tasks. Flash is roughly 4x cheaper and suitable for commodity tasks. Pro has better multimodal performance. Choose based on task requirements.
Does context caching reduce output token costs?
No, only input tokens are cached. Output tokens are always billed at full rate.
Can I use batch API and context caching together?
No, they're mutually exclusive. Batch API is for async processing with 50% discount. Context caching is for sync requests with 90% discount on cached input.
What happens if I exceed my rate limit?
Requests fail with HTTP 429. There's no automatic queue or billing overage. You must reduce request rate or request limit escalation from Google.
Is there a monthly minimum charge?
No, pure pay-as-you-go. No commitments or minimums.
Can I pre-purchase credits for discount?
Google Cloud offers committed use discounts on some services but not on Gemini API usage (as of March 2026). Pricing is per-token at published rates.
How do I estimate my monthly bill?
Multiply your monthly input tokens by $1.25/1M (or $0.30/1M for Flash) and output tokens by $10/1M (or $2.50/1M for Flash). Use context caching if applicable (90% discount on cached input). Use batch API if applicable (50% discount overall).
Is there a production tier with volume discounts?
Contact Google Cloud sales for potential volume discounts. No public volume tiers exist (as of March 2026).
Which Gemini 2.5 model should I start with?
If unsure, start with Flash. It's roughly 75% cheaper and handles most tasks. Upgrade to Pro only if accuracy on reasoning or complex tasks is insufficient.
Related Resources
- Gemini 2.5 Pro vs ChatGPT 5 Comparison
- OpenAI Pricing Guide
- Anthropic Pricing Guide
- Google Gemini API Documentation
- Gemini API Reference
Sources
- Google. "Gemini Pricing." Accessed March 2026. Retrieved from AI.google.dev/pricing.
- Google. "Gemini 2.5 Model Announcement." March 2026. Retrieved from google.ai/gemini.
- Google. "Batch Processing Guide." Retrieved from AI.google.dev/docs/batch.
- Google. "Context Caching Guide." Retrieved from AI.google.dev/docs/caching.
- DeployBase. "LLM Pricing Database." March 2026. Internal research dataset.