Google Gemini 2.5 Pricing: API Costs & Free Tier Guide

Deploybase · January 8, 2026 · LLM Pricing

Overview

Gemini 2.5 Pro: $1.25 input, $10 output per 1M tokens.

Gemini 2.5 Flash: $0.30 input, $2.50 output.

Free tier: 2M input tokens/month via AI Studio. Good for prototyping.

This guide breaks down Gemini pricing, free tier quotas, batch processing, and context caching discounts.

Gemini 2.5 Model Lineup and Pricing

Google's Gemini 2.5 family includes three models optimized for different performance-cost trade-offs.

Gemini 2.5 Pro

Gemini 2.5 Pro is the flagship model, optimized for maximum capability and reasoning. It features a 1M token context window and excels at complex reasoning, code generation, and multimodal analysis.

Pricing (March 2026):

  • Input tokens: $1.25 per 1 million tokens
  • Output tokens: $10 per 1 million tokens
  • Input/output ratio: 8:1 (output is 8x more expensive)

Example cost: processing a 50K-token document and generating a 2K-token summary:

  • Input cost: (50,000 / 1,000,000) × $1.25 = $0.0625
  • Output cost: (2,000 / 1,000,000) × $10 = $0.02
  • Total: $0.0825

Multiply by 1,000 daily requests: $82.50/day or $30,112.50/year for this workload.
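The arithmetic above generalizes to any workload. A small helper (a sketch, using the rates quoted in this guide) makes the per-request and annualized math reusable:

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost of one request in USD; rates are USD per 1M tokens."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# The 50K-in / 2K-out document summary on Gemini 2.5 Pro:
per_request = request_cost(50_000, 2_000, in_rate=1.25, out_rate=10.0)
print(f"per request: ${per_request:.4f}")                  # $0.0825
print(f"per day (1,000 requests): ${per_request * 1_000:.2f}")   # $82.50
print(f"per year: ${per_request * 1_000 * 365:,.2f}")            # $30,112.50
```

Swapping in Flash rates ($0.30 / $2.50) reproduces the Flash figures later in this guide.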

Gemini 2.5 Flash

Gemini 2.5 Flash is the efficient model, optimized for speed and cost. It has a 1M token context window (same as Pro) but with lower latency and significantly lower pricing. It's suitable for classification, extraction, and routine processing tasks.

Pricing (March 2026):

  • Input tokens: $0.30 per 1 million tokens
  • Output tokens: $2.50 per 1 million tokens
  • Input/output ratio: 8:1 (output is ~8x input)

Flash costs about a quarter of Pro: 24% of Pro's rate on input ($0.30 vs. $1.25) and 25% on output ($2.50 vs. $10).

Same document analysis on Flash:

  • Input cost: (50,000 / 1,000,000) × $0.30 = $0.015
  • Output cost: (2,000 / 1,000,000) × $2.50 = $0.005
  • Total: $0.020

Multiply by 1,000 daily requests: $20/day or $7,300/year.

Flash is 76% cheaper than Pro for this workload. The trade-off: Flash is less capable on complex reasoning tasks. For commodity tasks where reasoning quality plateaus early, Flash is superior.

Gemini 1.5 Pro (Legacy)

Google maintains backward compatibility with Gemini 1.5 Pro (released mid-2024):

Pricing (March 2026):

  • Input tokens: $0.075 per 1 million tokens
  • Output tokens: $0.30 per 1 million tokens

Gemini 1.5 Pro is priced below even Gemini 2.5 Flash, but it is older and less capable than both 2.5 Pro and 2.5 Flash. There's no reason to use 1.5 Pro in new projects; it exists only for backward compatibility with existing deployments. Avoid it for new work.

Free Tier Limits and Quotas

Google's free tier is generous but requires understanding the specific limits.

AI Studio Free Tier

Access via google.ai/studio:

  • Monthly limit: 2 million input tokens
  • No output token limit
  • Model access: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 1.5 Pro
  • Cost: $0
  • Rate limit: 2 requests per minute (very restrictive)
  • No SLA or uptime guarantee

The "no output token limit" is misleading; the 2M input-token cap is the binding constraint. Once 2M input tokens have been processed, the free quota is exhausted regardless of how many output tokens were generated along the way.

Effective Free Tier Capacity

With a 2-request-per-minute rate limit:

  • Per hour: 120 requests
  • Per day: 2,880 requests
  • Per month: 86,400 requests

A typical request uses 100-1,000 input tokens. Assuming 500 tokens per request:

  • Monthly token consumption: 86,400 × 500 = 43.2M tokens
  • Free tier allocation: 2M tokens
  • Utilization: 4.6% of potential requests

The 2M token limit is the bottleneck. At 500 tokens per request, developers can make only 4,000 requests monthly before exhausting the quota.

Practical Use Cases for Free Tier

The free tier accommodates:

  • Prototyping (limited scope, small datasets)
  • Proof-of-concepts (100-200 requests)
  • Learning and experimentation (low-volume)
  • Development environment (testing, not production)

Not suitable for:

  • Production deployments
  • Any application handling real user traffic
  • Scaled experimentation (>1,000 requests/month)

Graduating from Free Tier

Transitioning to paid requires adding a billing method (credit card or Google Cloud account). Billing starts immediately upon upgrade: there is no trial period, and charges begin the moment usage exceeds the free quota.

Gemini 2.5 Pro Costs

Per-Request Cost Variance

Costs vary based on input and output token counts. Modeling several common tasks:

Task A: Single-turn chat (1K input, 300 output)

  • Input: $0.00125
  • Output: $0.003
  • Total: $0.00425 per request

Task B: Code review (10K input, 1K output)

  • Input: $0.0125
  • Output: $0.01
  • Total: $0.0225 per request

Task C: Document analysis (100K input, 2K output)

  • Input: $0.125
  • Output: $0.02
  • Total: $0.145 per request

Task D: Long context (500K input, 5K output)

  • Input: $0.625
  • Output: $0.05
  • Total: $0.675 per request

Input token cost dominates for large documents. Context size is the primary cost driver.

Monthly Cost Projections

A customer service chat application:

  • 10,000 conversations per month
  • Average 2K input tokens per conversation
  • Average 400 output tokens per conversation

Monthly cost:

  • Input: (10,000 × 2,000 / 1,000,000) × $1.25 = $25
  • Output: (10,000 × 400 / 1,000,000) × $10 = $40
  • Total: $65/month

Scale to 100K conversations:

  • Input: $250
  • Output: $400
  • Total: $650/month

A large-scale deployment with 1M conversations monthly:

  • Input: $2,500
  • Output: $4,000
  • Total: $6,500/month

Pro pricing scales linearly with token volume.

When Pro Pricing Makes Sense

Use Gemini 2.5 Pro when:

  • Reasoning quality matters (mathematical proofs, complex logic)
  • Code generation accuracy is critical
  • Context size exceeds 100K tokens regularly
  • Applications where accuracy justifies Pro's higher latency and cost

For commodity tasks (classification, basic extraction), Flash is more economical.

Gemini 2.5 Flash Costs

Per-Request Economics

Same tasks on Flash:

Task A: Single-turn chat (1K input, 300 output)

  • Input: $0.0003
  • Output: $0.00075
  • Total: $0.00105 per request

Task B: Code review (10K input, 1K output)

  • Input: $0.003
  • Output: $0.0025
  • Total: $0.0055 per request

Task C: Document analysis (100K input, 2K output)

  • Input: $0.030
  • Output: $0.005
  • Total: $0.035 per request

Task D: Long context (500K input, 5K output)

  • Input: $0.150
  • Output: $0.0125
  • Total: $0.1625 per request

Flash is substantially cheaper. The same Task C costs $0.035 on Flash vs. $0.145 on Pro. That's 76% savings.

High-Volume Deployment

Same customer service example scaled to 1M conversations:

  • Input: (1,000,000 × 2,000 / 1,000,000) × $0.30 = $600
  • Output: (1,000,000 × 400 / 1,000,000) × $2.50 = $1,000
  • Total: $1,600/month

Compared to Pro ($6,500/month), Flash is 75% cheaper. This is the primary advantage for cost-sensitive deployments.

Flash Capability Trade-offs

Flash is optimized for speed and cost, not maximum capability. Testing on reasoning-heavy tasks:

  • Arithmetic with multi-step reasoning: Flash 78%, Pro 91%
  • Complex logic puzzles: Flash 72%, Pro 82%
  • Code generation (simple): Flash 88%, Pro 92%
  • Code generation (complex): Flash 76%, Pro 89%

For commodity tasks (classification, extraction, moderation), the accuracy difference is minimal. For reasoning-heavy tasks, Pro's advantage is significant.

Choosing Between Flash and Pro

Use Flash for:

  • Text classification (sentiment, intent, category assignment)
  • Information extraction (structured data from text)
  • Content moderation (toxic content detection)
  • Routine Q&A (FAQ-style responses)
  • High-volume, time-sensitive processing
  • Cost-optimized applications

Use Pro for:

  • Complex reasoning (proofs, troubleshooting, planning)
  • Code analysis and generation
  • Creative writing (where quality matters)
  • Multimodal analysis (better visual reasoning)
  • Large context handling (better performance at 500K+ tokens)

Batch API and Discounts

Google's batch API allows asynchronous processing with 50% cost reduction.

Batch Pricing

Standard Gemini 2.5 Pro pricing:

  • Input: $1.25 per 1M tokens
  • Output: $10 per 1M tokens

Batch Gemini 2.5 Pro pricing:

  • Input: $0.625 per 1M tokens (50% discount)
  • Output: $5 per 1M tokens (50% discount)

The trade-off: batch processing is asynchronous. Typical latency is 1-24 hours (depends on queue depth).

Batch API Economics

A summarization job processing 10M tokens overnight (1,000 documents, 10K tokens each):

  • Output: 500 tokens per summary = 500K total output

Standard API cost:

  • Input: (10M / 1M) × $1.25 = $12.50
  • Output: (500K / 1M) × $10 = $5
  • Total: $17.50

Batch API cost:

  • Input: (10M / 1M) × $0.625 = $6.25
  • Output: (500K / 1M) × $5 = $2.50
  • Total: $8.75
  • Savings: $8.75 (50%)

For one-time batch jobs, the 50% discount justifies the latency trade-off. For interactive applications, batch is not viable.

Batch Request Format

Batch requests are submitted via a JSONL file containing multiple API requests. Each request is processed independently and results are aggregated into a results file.

Example batch job:

{"custom_id": "1", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
{"custom_id": "2", "params": {"model": "gemini-2.5-pro", "contents": [{"role": "user", "parts": [{"text": "..."}]}]}}
...

Submitting a batch of 1,000 requests halves token costs; the operational overhead (formatting the JSONL, polling for results) is justified for cost-sensitive bulk processing.
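A minimal sketch of generating such a file, assuming the request shape shown above. The exact schema, field names, and submission endpoint should be checked against Google's batch documentation before use:

```python
import json

def build_batch_file(prompts, path, model="gemini-2.5-pro"):
    """Write one JSON request per line; custom_id lets results be matched back."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts, start=1):
            request = {
                "custom_id": str(i),
                "params": {
                    "model": model,
                    "contents": [{"role": "user", "parts": [{"text": prompt}]}],
                },
            }
            f.write(json.dumps(request) + "\n")

build_batch_file(["Summarize document 1.", "Summarize document 2."], "batch.jsonl")
```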

Context Caching Pricing

Gemini 2.5 Pro supports context caching, where cached input tokens are billed at a 90% discount.

How Context Caching Works

If developers repeatedly query the same document (or set of documents), Google can cache the processed context:

First request with 500K cached tokens:

  • Input tokens (new): 5K
  • Cached tokens: 500K (billed at 10% of input price)
  • Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.00625 + $0.0625 = $0.06875
  • Output cost: (2K / 1M) × $10 = $0.02
  • Total: $0.08875

Subsequent requests (cache hit, 500K cached tokens + 5K new tokens):

  • Input cost: (5K / 1M) × $1.25 + (500K / 1M) × $1.25 × 0.1 = $0.06875
  • Output cost: (2K / 1M) × $10 = $0.02
  • Total: $0.08875

The cached portion costs $0.0625 per request instead of the $0.625 it would cost uncached, a 90% reduction on the cached tokens. Over 100 requests, that's $6.25 instead of $62.50.

Cache Economics

A system repeatedly analyzing the same 500K-token document (e.g., "answer questions about our company handbook"):

100 queries without caching:

  • Input per query: 505K tokens (500K doc + 5K query)
  • Total input: 50.5M tokens
  • Cost: (50.5M / 1M) × $1.25 = $63.13

100 queries with caching:

  • First query: input cost $0.06875
  • 99 subsequent queries: input cost $0.06875 each
  • Total input cost: $6.88
  • Savings: $56.25 (89% reduction)

Context caching is transformative for retrieval systems where the same documents are queried repeatedly.
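The cached-vs-uncached arithmetic above, expressed as a function. This follows the billing model described in this guide: cached input tokens at 10% of the input rate on every request, new tokens at the full rate:

```python
IN_RATE = 1.25  # USD per 1M input tokens (Gemini 2.5 Pro)

def input_cost(new_tokens, cached_tokens=0, cache_multiplier=0.1):
    """Input cost per request; cached tokens billed at 10% of the input rate."""
    return (new_tokens / 1e6) * IN_RATE + (cached_tokens / 1e6) * IN_RATE * cache_multiplier

queries = 100
uncached = queries * input_cost(505_000)                     # 100 x $0.63125
cached = queries * input_cost(5_000, cached_tokens=500_000)  # 100 x $0.06875
print(f"savings: {1 - cached / uncached:.0%}")               # ~89%
```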

Cache Invalidation and Limits

Caches persist for 1 hour after last use. If developers don't query within 1 hour, the cache is dropped and rebuilt on the next request.

Cache size limits: Google doesn't publicize exact limits, but testing suggests 2M tokens can be cached per session.

For small document bases (under 1M tokens), caching provides massive savings. For continuously updated documents, caching provides less value (cache invalidates frequently).

Multi-Modal Token Accounting

Gemini 2.5 Pro processes images, videos, and text. Understanding token costs for multi-modal inputs is essential.

Image Token Consumption

Image tokens depend on image size and quality:

Thumbnail image (100×100 pixels):

  • Token consumption: 258 tokens

Small image (480×480 pixels):

  • Token consumption: 258 tokens + extra detail tokens

Standard image (1024×1024 pixels):

  • Token consumption: 258 + ~100-200 additional tokens = 358-458 tokens

High-resolution image (2048×2048 pixels):

  • Token consumption: 258 + ~300-400 additional tokens = 558-658 tokens

Baseline: every image costs at least 258 tokens. Additional detail tokens depend on resolution and complexity.

Practical cost: 10 standard images (~400 tokens each) + 5K text ≈ 9K tokens total.
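A rough estimator using the figures above: a 258-token baseline plus a resolution-dependent surcharge. The surcharge values are this article's approximations (midpoints of the ranges quoted), not an official formula:

```python
BASE_TOKENS = 258  # minimum cost per image, per the figures above

def estimate_image_tokens(width, height):
    """Rough token estimate for one image; thresholds and surcharges are assumptions."""
    pixels = width * height
    if pixels <= 480 * 480:
        extra = 0
    elif pixels <= 1024 * 1024:
        extra = 150   # midpoint of the ~100-200 range above
    else:
        extra = 350   # midpoint of the ~300-400 range above
    return BASE_TOKENS + extra

# 10 standard 1024x1024 images alongside a 5K-token text prompt:
image_tokens = 10 * estimate_image_tokens(1024, 1024)  # 10 x 408 = 4,080
print(image_tokens + 5_000)  # ~9K tokens total
```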

Video Token Consumption

Video is processed by extracting key frames:

Short video (10 seconds at 24fps = 240 frames; 10 key frames extracted):

  • Token consumption: 10 frames × 400 tokens/frame = 4,000 tokens

Medium video (60 seconds, 6 key frames):

  • Token consumption: 2,400 tokens

Long video (10 minutes, 10 key frames):

  • Token consumption: 4,000 tokens

Video token costs are dominated by the number of extracted frames, not duration.

Audio Token Consumption

Gemini 2.5 Pro does not directly process audio. Audio must be transcribed first (using a separate speech-to-text API), then passed as text.

Cost Projections by Workload

Scenario A: Customer Support Chatbot

Setup:

  • 1,000 chats per day
  • Average 2K input tokens (customer messages + context)
  • Average 300 output tokens (bot responses)
  • Using Gemini 2.5 Flash (cost-optimized)

Monthly cost (30 days):

  • Input: (1,000 × 30 × 2,000 / 1,000,000) × $0.30 = $18
  • Output: (1,000 × 30 × 300 / 1,000,000) × $2.50 = $22.50
  • Total: $40.50/month

Annual cost: $486

This is very cost-effective. A single developer salary ($60K+) dwarfs API costs.

Scenario B: Document Analysis Platform

Setup:

  • 100 documents per month
  • Average 50K tokens per document
  • Average 2K output tokens per analysis
  • Using Gemini 2.5 Pro (reasoning-heavy)
  • Batch API for cost optimization

Monthly cost:

  • Input: (100 × 50,000 / 1,000,000) × $0.625 = $3.13 (batch discount)
  • Output: (100 × 2,000 / 1,000,000) × $5 = $1.00 (batch discount)
  • Total: $4.13/month

Annual cost: $49.50

Batch processing reduces costs dramatically. The trade-off: 1-24 hour latency.

Scenario C: Large Codebase Analysis

Setup:

  • 10 analyses per month
  • Average 300K tokens per analysis (large repositories)
  • Average 5K output tokens per analysis
  • Using Gemini 2.5 Pro (large context handling)

Monthly cost:

  • Input: (10 × 300,000 / 1,000,000) × $1.25 = $3.75
  • Output: (10 × 5,000 / 1,000,000) × $10 = $0.50
  • Total: $4.25/month

Annual cost: $51

Even large context windows are inexpensive.

Scenario D: Video Analysis Service

Setup:

  • 100 videos per month
  • Average 5 key frames per video = 5 × 400 tokens = 2,000 image tokens
  • Average 5K text tokens per analysis
  • Using Gemini 2.5 Pro

Cost per video:

  • Image tokens: (2,000 / 1,000,000) × $1.25 = $0.0025
  • Text input tokens: (5,000 / 1,000,000) × $1.25 = $0.00625
  • Output tokens: (2,000 / 1,000,000) × $10 = $0.02
  • Total per video: $0.02875

100 videos monthly:

  • Total: $2.88/month

Annual cost: $34.50

Video analysis is cheap because extracted frames consume relatively few tokens.

Comparison to Competitors

How does Gemini 2.5 pricing compare to other LLM providers?

Gemini 2.5 Pro vs. OpenAI GPT-5

Gemini 2.5 Pro:

  • Input: $1.25 per 1M tokens
  • Output: $10 per 1M tokens

OpenAI GPT-5:

  • Input: $1.25 per 1M tokens
  • Output: $10 per 1M tokens

Identical pricing. Choice depends on capability (reasoning vs. multimodal + context).

Gemini 2.5 Flash vs. Anthropic Claude Sonnet 4.6

Gemini 2.5 Flash:

  • Input: $0.30 per 1M tokens
  • Output: $2.50 per 1M tokens

Anthropic Claude Sonnet 4.6:

  • Input: $3 per 1M tokens
  • Output: $15 per 1M tokens

Flash is 10x cheaper on input, 6x cheaper on output. For commodity tasks, Gemini Flash dominates on cost.

Gemini 2.5 Flash vs. Cohere Command R

Gemini 2.5 Flash:

  • Input: $0.30 per 1M tokens
  • Output: $2.50 per 1M tokens

Cohere Command R:

  • Input: $0.15 per 1M tokens
  • Output: $0.60 per 1M tokens

Gemini 2.5 Flash is more capable on complex tasks. For pure cost optimization, Cohere wins. For balanced capability + cost, Flash is superior.

Summary Pricing Table

Model               Input   Output   Best For
Gemini 2.5 Pro      $1.25   $10      Reasoning, multimodal
Gemini 2.5 Flash    $0.30   $2.50    Cost-optimized, commodity
GPT-5               $1.25   $10      Reasoning, code
Claude Sonnet 4.6   $3      $15      General capability
Cohere Command R    $0.15   $0.60    Commodity tasks
Cohere Command R+   $2.50   $10.00   Complex reasoning

Hidden Fees and Gotchas

Rate Limit Penalties

Exceeding the rate limit doesn't incur extra charges; requests simply fail with HTTP 429. However, retry logic may cause duplicate charges if not implemented carefully.

Implement exponential backoff with random jitter to avoid thundering herd when limits are reached.
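A minimal sketch of that retry pattern. `call_api` and `RateLimitError` are placeholders for your actual client call and its HTTP 429 exception:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's HTTP 429 exception."""

def with_backoff(call_api, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry call_api on rate-limit errors with exponential backoff and full jitter."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter spreads retries out
```

Retrying only on rate-limit errors (not on all exceptions) avoids duplicate charges from blindly re-sending requests that may already have been billed.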

Cache Eviction

If a cached context is evicted (1-hour timeout or cache full), the next request rebuilds the cache at the full per-token rate, even if the same tokens were cached before.

For frequently re-used documents, cache invalidation may be expensive. Budget for rebuild costs.

Image Processing Overhead

All images incur a minimum 258-token cost. Sending 1,000 tiny images (each 258 tokens) costs 258K tokens, even if the images are 10×10 pixels.

For high-volume image processing, pre-filter low-value images to avoid unnecessary token consumption.

Rate Limit Escalation Delays

The default rate limit is 2 requests/minute (free tier) or 60 requests/minute (paid tier). Requesting escalation to 10,000+ requests/minute may take 24-48 hours. During peak growth, this can delay scaling.

Plan rate limit requests 1-2 weeks in advance of anticipated growth.

Context Window Truncation

If the input exceeds 1M tokens, it's silently truncated; the full submitted input is billed even though only part of it was processed. Unlike some providers, Google doesn't fail the request or warn; truncation happens silently.

Implement token counting on the client side to avoid accidental truncation.
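A simple guard along those lines. The ~4 characters-per-token figure is a rough heuristic for English text, not an exact count; if your client library exposes a token counter, prefer that:

```python
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenization varies by content

def check_fits(prompt: str, max_tokens: int = CONTEXT_WINDOW) -> int:
    """Return an estimated token count, raising if the prompt likely exceeds the window."""
    estimate = len(prompt) // CHARS_PER_TOKEN + 1
    if estimate > max_tokens:
        raise ValueError(
            f"prompt is ~{estimate:,} tokens, over the {max_tokens:,}-token window; "
            "it would be silently truncated (and billed)"
        )
    return estimate
```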

Output Token Billing for Errors

If a request fails mid-generation (e.g., due to timeout or provider error), developers may still be billed for partial output tokens. Error handling should account for this.

Treat API errors as potential billing events; log all interactions for reconciliation.

FAQ

Is the free tier suitable for production?

No. The 2M monthly token limit and 2 requests/minute rate limit are only suitable for prototyping. Production applications require paid tier.

What's the difference between Gemini 2.5 Pro and Flash?

Pro is more capable on reasoning and complex tasks. Flash is roughly 4x cheaper on both input and output and suitable for commodity tasks. Pro has better multimodal performance. Choose based on task requirements.

Does context caching reduce output token costs?

No, only input tokens are cached. Output tokens are always billed at full rate.

Can I use batch API and context caching together?

No, they're mutually exclusive. Batch API is for async processing with 50% discount. Context caching is for sync requests with 90% discount on cached input.

What happens if I exceed my rate limit?

Requests fail with HTTP 429. There's no automatic queue or billing overage. You must reduce request rate or request limit escalation from Google.

Is there a monthly minimum charge?

No, pure pay-as-you-go. No commitments or minimums.

Can I pre-purchase credits for discount?

Google Cloud offers committed use discounts on some services but not on Gemini API usage (as of March 2026). Pricing is per-token at published rates.

How do I estimate my monthly bill?

Multiply your monthly input tokens by $1.25/1M (or $0.30/1M for Flash) and output tokens by $10/1M (or $2.50/1M for Flash). Use context caching if applicable (90% discount on cached input). Use batch API if applicable (50% discount overall).
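That recipe as a small estimator, using this guide's rates and its discount model (50% off everything for batch, 90% off the cached share of input for caching):

```python
RATES = {"pro": (1.25, 10.0), "flash": (0.30, 2.50)}  # USD per 1M tokens

def monthly_bill(model, input_m, output_m, cached_m=0.0, batch=False):
    """Token counts are in millions; cached_m must not exceed input_m."""
    in_rate, out_rate = RATES[model]
    cost = (input_m - cached_m) * in_rate + cached_m * in_rate * 0.1 + output_m * out_rate
    return cost * 0.5 if batch else cost

print(monthly_bill("flash", input_m=60, output_m=9))             # chatbot scenario: $40.50
print(monthly_bill("pro", input_m=5, output_m=0.2, batch=True))  # batch doc analysis: ~$4.13
```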

Is there a production tier with volume discounts?

Contact Google Cloud sales for potential volume discounts. No public volume tiers exist (as of March 2026).

Which Gemini 2.5 model should I start with?

If unsure, start with Flash. It's roughly 75% cheaper on typical workloads and handles most tasks. Upgrade to Pro only if accuracy on reasoning or complex tasks is insufficient.

Sources

  • Google. "Gemini Pricing." Accessed March 2026. Retrieved from AI.google.dev/pricing.
  • Google. "Gemini 2.5 Model Announcement." March 2026. Retrieved from google.ai/gemini.
  • Google. "Batch Processing Guide." Retrieved from AI.google.dev/docs/batch.
  • Google. "Context Caching Guide." Retrieved from AI.google.dev/docs/caching.
  • DeployBase. "LLM Pricing Database." March 2026. Internal research dataset.