Gemini API Pricing 2026: All Tiers & Free Limits

Deploybase · January 8, 2026 · LLM Pricing

Gemini API Pricing 2026: Overview

This guide covers Gemini API pricing across every tier. Gemini 2.5 Pro: $1.25 per 1M input tokens for prompts up to 200K tokens, $2.50 per 1M above 200K. Output: $10 per 1M tokens at ≤200K context, $15 above.

Gemini 2.5 Flash: $0.30 input, $2.50 output.

Free tier: 1M tokens/day via AI Studio.

Context caching cuts the price of reused context tokens by 90%. Pick Flash for speed and cost, Pro for reasoning.


Gemini API Tiers

Google offers Gemini through multiple access paths:

Free Tier (Google AI Studio)

Access: google.ai/studio

Models:

  • Gemini 2.5 Flash
  • Gemini 1.5 Flash
  • Gemini 1.5 Pro (limited)

Rate limits:

  • 60 requests/minute (RPM)
  • 1M tokens/day (TTD)
  • Sliding-window enforcement; no separate burst allowance

Use cases: Prototyping, small-scale experimentation, education.

Cost: Free (no credit card required).

Paid Tier (Vertex AI)

Access: Google Cloud Console, Vertex AI API

Models:

  • Gemini 2.5 Pro (latest)
  • Gemini 2.5 Flash (latest)
  • Gemini 1.5 Pro, Flash (legacy)

Rate limits: Per-project quotas (customizable)

Pricing: Per 1M input tokens and per 1M output tokens

Cost: Pay-as-you-go, no minimum; new accounts can apply free trial credits ($300).


Gemini 2.5 Pro Pricing

Gemini 2.5 Pro is Google's flagship reasoning-focused model, competing directly with GPT-4.1 and Claude Opus.

Token Pricing (as of March 2026)

Metric | Cost (≤200K context) | Cost (>200K context)
Input tokens | $1.25 per 1M | $2.50 per 1M
Output tokens | $10.00 per 1M | $15.00 per 1M
Images (small) | $2.50 per 100 images | $2.50 per 100 images
Images (large) | $7.50 per 100 images | $7.50 per 100 images
Audio input | $0.0006 per minute | $0.0006 per minute

Example: 1,000 input tokens + 500 output tokens (under 200K context):

  • Input cost: (1,000 / 1,000,000) × $1.25 = $0.00125
  • Output cost: (500 / 1,000,000) × $10.00 = $0.005
  • Total: $0.00625

Example: 100k input + 10k output (medium-sized request, under 200K context):

  • Input: (100,000 / 1,000,000) × $1.25 = $0.125
  • Output: (10,000 / 1,000,000) × $10.00 = $0.10
  • Total: $0.225
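The arithmetic above can be wrapped in a small helper. This is an illustrative sketch, not an official calculator: the rates are hard-coded from the table above, and the prompt's token count is used to pick the pricing tier.

```python
def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Gemini 2.5 Pro request's cost in USD.

    Rates are USD per 1M tokens, taken from the pricing table above;
    the higher tier applies when the prompt exceeds 200K tokens.
    """
    if input_tokens <= 200_000:
        input_rate, output_rate = 1.25, 10.00
    else:
        input_rate, output_rate = 2.50, 15.00
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

small = gemini_pro_cost(1_000, 500)        # ≈ $0.00625 (first example)
medium = gemini_pro_cost(100_000, 10_000)  # ≈ $0.225 (second example)
```

Passing a prompt over 200K tokens switches both rates to the higher tier, which is why long-context requests cost more per token, not just in total.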

Context Window

Gemini 2.5 Pro supports 1,000,000 tokens (1M context window).

Implication for pricing: Large document processing is feasible. A 1M-token policy document costs $2.50 in input tokens to process, since the >200K-context rate of $2.50 per 1M applies.

Batch Processing (Coming 2026)

Google announced batch processing for Gemini Pro (lower cost for non-urgent queries). Pricing not yet finalized, but expected:

  • 50% discount for batch jobs submitted with <24-hour SLA
  • No guaranteed latency

Gemini 2.5 Flash Pricing

Gemini 2.5 Flash is Google's efficiency-focused model, optimized for speed and cost. Similar capability to Claude Sonnet 4.6, significantly faster than Pro.

Token Pricing (as of March 2026)

Metric | Cost
Input tokens | $0.30 per 1M
Output tokens | $2.50 per 1M
Images (small) | $1.00 per 100 images
Images (large) | $2.50 per 100 images
Audio input | $0.00006 per minute (10x cheaper than Pro)

Cost comparison for 100k input + 10k output:

  • Input: (100,000 / 1,000,000) × $0.30 = $0.03
  • Output: (10,000 / 1,000,000) × $2.50 = $0.025
  • Total: $0.055

vs. Gemini Pro for same tokens:

  • Pro total: $0.225
  • Flash total: $0.055
  • Savings: 76%
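The Flash-vs-Pro comparison can be checked the same way. A sketch only, with the per-1M rates hard-coded from the tables above (the >200K tier is ignored here since the example stays under 200K context):

```python
# USD per 1M tokens (input, output), from the pricing tables above.
RATES = {
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

pro = request_cost("gemini-2.5-pro", 100_000, 10_000)      # ≈ $0.225
flash = request_cost("gemini-2.5-flash", 100_000, 10_000)  # ≈ $0.055
savings = 1 - flash / pro                                  # ≈ 0.76
```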

Speed & Performance

Gemini 2.5 Flash trades some reasoning capability for speed:

  • First token latency: 100-200ms (vs. Pro's 500ms)
  • Throughput: 10k tokens/sec (vs. Pro's 2k)
  • Suitable for: Chatbots, real-time inference, summarization
  • Not suitable for: Complex multi-step reasoning, competitive exams, novel problem-solving

Free Tier & Rate Limits

Google AI Studio (Gemini 2.5 Flash)

Daily limits:

  • 1M tokens per day
  • 60 requests per minute (RPM)
  • 10 requests per second (RPS)
  • Sliding window enforcement (tokens reset every 24 hours UTC)

Practical impact:

  • A single 1M-token request exhausts the daily limit
  • Moderate use cases (10-100 requests/day) stay within limits
  • Scale-out to production requires paid tier

No billing required: Google AI Studio is free, no credit card needed.

Google Cloud Free Trial Credits

New Google Cloud accounts receive:

  • $300 free credits (expires after 90 days)
  • Full API access to all Gemini models
  • Same rate limits as paid tier (customizable)

Cost calculation:

  • $300 ÷ $0.30 per 1M input (Flash) = 1B input tokens free
  • Typical use: 2-3 months for small-to-medium applications

Context-Dependent Pricing

Gemini pricing has nuances related to context window and cache behavior.

Context Cache (Upcoming)

Google announced "Prompt Cache" for Gemini API (March 2026 beta). Mechanism:

  • Store frequently-accessed long contexts (docs, codebase) in cache
  • Reuse cached context across multiple API calls
  • Cache hit costs: 10% of input token price
  • Cache miss costs: 100% of input token price
  • 5M token cache per project

Example:

  • Upload 100k-token codebase (100k tokens @ $0.30 per 1M = $0.03)
  • First call: Pay 100% for context = $0.03
  • Next 1,000 calls reuse cache: Each pays 10% = $0.003 per call
  • Savings over 1,000 calls: $3 vs. $30 = 90% reduction
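The cache arithmetic above, as a sketch. The 10% hit rate and the Flash input price are taken from the announcement described above and may change before general availability:

```python
CONTEXT_TOKENS = 100_000  # cached codebase size, tokens
INPUT_RATE = 0.30         # Flash input price, USD per 1M tokens
HIT_FRACTION = 0.10       # a cache hit costs 10% of the normal input price

full_pass = CONTEXT_TOKENS / 1_000_000 * INPUT_RATE  # ≈ $0.03 per uncached pass
cached_pass = HIT_FRACTION * full_pass               # ≈ $0.003 per cache hit

calls = 1_000
with_cache = calls * cached_pass            # ≈ $3 for 1,000 cached calls
without_cache = calls * full_pass           # ≈ $30 without caching
savings = 1 - with_cache / without_cache    # 0.90
```

The savings fraction is just `1 - HIT_FRACTION`, so the 90% figure holds for any context size as long as every call after the first is a cache hit.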

Context cache pricing is still experimental; rates and limits may change during 2026.

Dynamic Token Pricing

Gemini pricing is fixed per token (unlike some competitors with variable rates). No surge pricing, no demand-based adjustments.

This simplifies budgeting but removes upside if token compression improves.


Image & Audio Handling Costs

Gemini 2.5 Flash and Pro support multimodal inputs. Pricing varies by content type.

Image Pricing

Gemini 2.5 Flash:

  • Small images (<= 256×256px): $1.00 per 100 images
  • Large images (> 256×256px): $2.50 per 100 images

Gemini 2.5 Pro:

  • Small images: $2.50 per 100 images
  • Large images: $7.50 per 100 images

Example: 100 large images through Gemini 2.5 Flash = $2.50 total. Plus any text tokens in the request.

Comparison to OpenAI:

  • GPT-4V: $0.0025 per image (low-res) or $0.0075 (high-res)
  • Gemini Flash: $0.025 per image (large), roughly 3x GPT-4V's high-res rate

Audio Input Pricing

Gemini 2.5 Flash:

  • $0.00006 per minute of audio

Gemini 2.5 Pro:

  • $0.0006 per minute of audio

Example: 60-minute audio = 60 × $0.00006 = $0.0036 (Flash) or $0.036 (Pro)

Audio pricing is extremely cheap, making Gemini suitable for transcription + summarization workflows.
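The image and audio rates above can be combined into one estimator. A sketch with this section's rates hard-coded (per-image prices converted from the per-100 figures); text tokens in the same request are billed separately:

```python
# USD rates from the tables above: per-image and per-minute of audio.
FLASH = {"img_small": 1.00 / 100, "img_large": 2.50 / 100, "audio_min": 0.00006}
PRO   = {"img_small": 2.50 / 100, "img_large": 7.50 / 100, "audio_min": 0.0006}

def media_cost(rates: dict, small_imgs: int = 0, large_imgs: int = 0,
               audio_minutes: float = 0.0) -> float:
    """Multimodal input cost in USD, excluding any text tokens in the request."""
    return (small_imgs * rates["img_small"]
            + large_imgs * rates["img_large"]
            + audio_minutes * rates["audio_min"])

flash_images = media_cost(FLASH, large_imgs=100)    # ≈ $2.50
flash_audio = media_cost(FLASH, audio_minutes=60)   # ≈ $0.0036
pro_audio = media_cost(PRO, audio_minutes=60)       # ≈ $0.036
```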

Video Input

Gemini APIs do not charge separately for video; videos are processed as sequences of images. Frame-by-frame costs apply (image pricing).

Workaround: Extract key frames client-side and send only those, so image pricing applies per key frame rather than per frame.


Comparison to OpenAI & Anthropic

Input Token Pricing

Model | Input Cost
Gemini 2.5 Flash | $0.30 per 1M
OpenAI GPT-5 Mini | $0.25 per 1M
Claude Sonnet 4.6 | $3.00 per 1M
OpenAI GPT-4.1 | $2.00 per 1M
Gemini 2.5 Pro | $1.25 per 1M
Claude Opus 4.6 | $5.00 per 1M
OpenAI GPT-5.4 | $2.50 per 1M

GPT-5 Mini is cheapest on input tokens at $0.25/1M; Gemini 2.5 Flash is $0.30/1M. OpenAI's newer models are cheaper than older Claude versions.

Output Token Pricing

Model | Output Cost
Gemini 2.5 Flash | $2.50 per 1M
OpenAI GPT-5 Mini | $2.00 per 1M
Claude Sonnet 4.6 | $15.00 per 1M
OpenAI GPT-4.1 | $8.00 per 1M
Gemini 2.5 Pro | $10.00 per 1M
Claude Opus 4.6 | $25.00 per 1M
OpenAI GPT-5.4 | $15.00 per 1M

Claude Opus is most expensive; GPT-5 Mini is cheapest on output.

Cost-Effectiveness for Common Tasks

Chatbot (short responses, high volume):

  • Gemini 2.5 Flash: $0.30 + $2.50 = $2.80 per 1M tokens (blended)
  • Winner: OpenAI GPT-5 Mini ($0.25 + $2.00 = $2.25 blended) for raw cost; Gemini Flash competitive

Document Summarization (long inputs, medium outputs):

  • Gemini 2.5 Flash: Input-heavy, very cheap
  • Winner: Gemini 2.5 Flash (context cache provides additional 90% savings on reused docs)

Code Generation (long outputs):

  • OpenAI GPT-5: $1.25 + $10 = $11.25 per 1M tokens (blended)
  • Gemini 2.5 Pro: $1.25 + $10 = $11.25 per 1M tokens (blended, under 200K context)
  • Winner: Comparable; Gemini 2.5 Pro has 1M context advantage

Complex Reasoning (competitive exams, novel problems):

  • Claude Opus 4.6: $5 + $25 = $30 per 1M tokens (blended)
  • OpenAI o3: $2 + $8 = $10 per 1M tokens (blended)
  • Winner: OpenAI o3 (specialized reasoning)

Cost Optimization Tips

1. Choose the Right Model Tier

Use Gemini 2.5 Flash if:

  • High volume of requests (chatbots, support automation)
  • Output length is moderate
  • Reasoning complexity is low-to-moderate
  • Budget is primary constraint

Use Gemini 2.5 Pro if:

  • Output quality is non-negotiable
  • Longer reasoning required
  • Complex multi-step problems
  • The higher cost fits your budget

2. Use Context Cache

Store frequently-accessed documents (policies, code, docs) in Prompt Cache. Reuse across 1,000+ API calls to achieve 90% savings on context tokens.

Implementation:

  • Identify stable, reused contexts (company handbook, codebase)
  • Load once into cache
  • Append query tokens for each request

3. Batch Processing (Coming 2026)

Submit non-urgent requests (analysis, reports) to Gemini batch API for 50% discount. Trade latency (up to 24 hours) for cost savings.

4. Compress Input Tokens

Use prompt compression techniques:

  • Remove redundant instructions
  • Use examples instead of lengthy explanations
  • Summarize long documents before sending to API

Example: 100k-token document summary = 10k tokens to API, preserving 90% of information. Cost reduction: 90%.

5. Filter Outputs

Request only necessary data:

  • JSON-structured responses (remove verbose explanations)
  • Bullet points instead of paragraphs
  • Summaries instead of full text

Example: "Return 3 bullet points" vs. "Write an essay" can reduce output tokens by 50-80%.

6. Use Streaming APIs

Gemini API supports streaming responses. Track cost as tokens stream in, and stop early once you have received enough data.

Benefit: Stop after receiving 1,000 tokens instead of waiting for full 5,000-token response.
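The early-stop pattern might look like the sketch below. `fake_stream` is a stand-in for a real streaming client (the actual Gemini SDK call is not shown), since the point here is only the control flow: break out of the loop and the remaining output tokens are never generated, so never billed.

```python
from typing import Iterator

def fake_stream(total_chunks: int = 5_000) -> Iterator[str]:
    """Stand-in for a streaming model response, one text chunk per yield.
    A real client would yield chunks from the Gemini streaming endpoint."""
    for i in range(total_chunks):
        yield f"chunk{i} "

def collect_until_enough(stream: Iterator[str], max_chunks: int) -> str:
    """Consume a stream but stop early once we have enough output."""
    parts = []
    for chunk in stream:
        parts.append(chunk)
        if len(parts) >= max_chunks:
            break  # stop here instead of waiting for the full 5,000-chunk response
    return "".join(parts)

text = collect_until_enough(fake_stream(), max_chunks=1_000)
```

With a real client, make sure the connection is actually closed on early exit (e.g. via a context manager), or the server may keep generating.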


Real-World Cost Examples

Example 1: Customer Support Chatbot

Workload: 10,000 customer conversations per month. Average 200 input tokens, 150 output tokens per conversation.

Using Gemini 2.5 Flash:

  • Input: (10,000 × 200) / 1,000,000 × $0.30 = $0.60
  • Output: (10,000 × 150) / 1,000,000 × $2.50 = $3.75
  • Monthly cost: $4.35

Comparison:

  • OpenAI GPT-5 Mini: (10,000 × 200 × $0.25 + 10,000 × 150 × $2.00) / 1M = $3.50/month
  • Claude Sonnet 4.6: (10,000 × 200 × $3 + 10,000 × 150 × $15) / 1M = $28.50/month

Winner: GPT-5 Mini at $3.50/month is marginally cheaper; Gemini 2.5 Flash at $4.35/month is also very cost-effective. Both are far cheaper than Claude Sonnet.
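The monthly figures above reduce to one formula. A quick illustrative check, with the per-1M rates hard-coded from the comparison tables earlier in this article:

```python
def monthly_cost(conversations: int, in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Monthly USD cost; token counts are per conversation, rates per 1M tokens."""
    return conversations * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

flash  = monthly_cost(10_000, 200, 150, 0.30, 2.50)   # ≈ $4.35
mini   = monthly_cost(10_000, 200, 150, 0.25, 2.00)   # ≈ $3.50
sonnet = monthly_cost(10_000, 200, 150, 3.00, 15.00)  # ≈ $28.50
```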

Example 2: Contract Summarization

Workload: Summarize 100 contracts per month. Average 200k tokens per contract (input), 5k tokens per summary (output).

Using Gemini 2.5 Flash with Context Cache:

  • First contract (no cache hit): 200k/1M × $0.30 + 5k/1M × $2.50 = $0.06 + $0.0125 = $0.0725
  • Subsequent contracts (shared template/context cached at the 10% hit rate): 200k/1M × $0.03 + 5k/1M × $2.50 = $0.006 + $0.0125 = $0.0185 each
  • Monthly cost: $0.0725 + 99 × $0.0185 ≈ $1.90

Without cache:

  • Monthly cost: 100 × $0.0725 = $7.25

Savings from context cache: 74%

Example 3: Real-Time Coding Assistance

Workload: 1,000 developer sessions per month. Average 50k token codebase (context), 500 input tokens (question), 1,000 output tokens (code suggestion).

Using Gemini 2.5 Flash with Context Cache:

  • Codebase loaded into cache once at full price: 50k/1M × $0.30 = $0.015
  • Per request: (500/1M × $0.30) + (50k/1M × $0.03 cache rate) + (1,000/1M × $2.50) = $0.00015 + $0.0015 + $0.0025 = $0.00415
  • Monthly: $0.015 + 1,000 × $0.00415 ≈ $4.17

Using Gemini 2.5 Pro (without cache optimization):

  • Per request: (50,500/1M × $1.25) + (1,000/1M × $10.00) = $0.0631 + $0.01 = $0.0731
  • Monthly: 1,000 × $0.0731 ≈ $73.13

Cost difference: Flash with cache (~$4) vs. Pro without cache (~$73) = roughly 18x cheaper

Example 4: Batch Data Analysis

Workload: Analyze 10,000 CSV rows per month. Average 2k input tokens per row, 500 output tokens per row. Non-urgent (24-hour SLA acceptable).

Using Gemini batch processing (estimated 50% discount):

  • Input: (10,000 × 2,000) / 1M × $0.30 × 0.5 = $3.00
  • Output: (10,000 × 500) / 1M × $2.50 × 0.5 = $6.25
  • Monthly: $9.25

Using on-demand pricing:

  • Monthly: $18.50

Savings from batch processing: 50%
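Since the batch discount is a straight multiplier on the on-demand total, the whole example fits in one function. A sketch with Flash rates as defaults; the 50% discount is still an estimate, per the batch-processing announcement described earlier:

```python
def analysis_cost(rows: int, in_tokens: int, out_tokens: int,
                  in_rate: float = 0.30, out_rate: float = 2.50,
                  batch_discount: float = 0.50) -> tuple[float, float]:
    """Return (batched, on_demand) monthly USD cost.

    Token counts are per row; rates are USD per 1M tokens (Flash defaults);
    batch_discount is the estimated 50% batch multiplier.
    """
    on_demand = rows * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return on_demand * batch_discount, on_demand

batched, on_demand = analysis_cost(10_000, 2_000, 500)  # ≈ ($9.25, $18.50)
```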


FAQ

Q: Does Gemini API have a free tier? Yes. Google AI Studio (AI.google.com) offers free access to Gemini 2.5 Flash with limits: 1M tokens/day, 60 RPM. Suitable for exploration, not production.

Q: What's the difference between Gemini and Gemini Pro? Gemini 2.5 is the latest version. Gemini 1.5 is older. Pro and Flash are size variants; Pro is larger and more capable, Flash is faster and cheaper.

Q: Can I use cached contexts indefinitely? No. Prompt Cache expires after 5 minutes of inactivity per cache. Reusing within 5 minutes costs 10% of normal price. After 5 minutes, reload context (costs 100%).

Q: Does output token count include the user's prompt? No. Output tokens are only the model's response. Input tokens include your prompt and all context.

Q: Is there a monthly bill minimum? No. Google Cloud billing has no minimum. If you use $0.50 in a month, you're charged $0.50 (after free credits).

Q: How does Gemini compare to Claude for coding tasks? Gemini 2.5 Flash is competitive for simple coding (bug fixes, boilerplate). Claude Opus is better for complex refactoring and architecture design. For cost, Gemini 2.5 Flash wins (100x cheaper). For quality, Claude Opus wins.

Q: Does Gemini API work in my region? Gemini API is available in 150+ countries. Check Google Cloud regional availability. No geo-restrictions on API access itself.

Q: Can I use Gemini API for fine-tuning? Not yet. Google does not offer fine-tuning on Gemini API as of March 2026. Use base models only.

Q: How do I estimate my monthly bill? Estimate input tokens × input cost per 1M + output tokens × output cost per 1M. Multiply by expected volume per month. Use Google Cloud cost calculator for accuracy.



Sources

  • Google AI Studio. google.ai/studio/ (March 2026)
  • Google Cloud Vertex AI Pricing. cloud.google.com/vertex-ai/pricing (March 2026)
  • Gemini API Documentation. AI.google.dev/docs (March 2026)
  • Gemini 2.5 Announcement. blog.google/technology/ai/google-gemini-2-5/ (December 2024)
  • Google Cloud Prompt Caching. cloud.google.com/docs/generative-ai/caching (Beta, March 2026)
  • OpenAI Pricing. openai.com/api/pricing (March 2026)
  • Anthropic Pricing. anthropic.com/pricing (March 2026)