DeepSeek API Pricing 2026: Model Costs, Discounts, and Cost Scenarios

Deploybase · February 11, 2026 · LLM Pricing


DeepSeek Pricing Overview

DeepSeek API pricing covers two primary models: V3 (general-purpose) and R1 (reasoning). V3 is the fastest and cheapest. R1 is slower but handles complex reasoning, code generation, and multi-step problem solving.

Off-peak discounts (75% for R1, 50% for V3) apply from 16:30 to 00:30 GMT. Context caching (a 90% discount on cached inputs) enables bulk processing at minimal cost. Rate limits allow 30 requests/minute and 100K tokens/minute on the Pro tier.

All prices are in USD, as of March 2026.

Model | Prompt $/MTok | Completion $/MTok | Use Case | Off-Peak Price
DeepSeek V3 | $0.14 | $0.28 | General chat, fast inference | $0.07 / $0.14
DeepSeek V3.1 | $0.27 | $1.10 | General chat, improved quality | $0.135 / $0.55
DeepSeek R1 | $0.55 | $2.19 | Reasoning, code, math | $0.14 / $0.55
Context Caching (90% off) | $0.014 | $0.28 | Bulk document analysis | $0.0014 / $0.028
DeepSeek V3.2 (unified) | $0.28 | $0.42 | Chat + reasoning combined | $0.14 / $0.21

MTok = million tokens. Prices pulled from api-docs.deepseek.com, March 2026.
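For programmatic estimates, the table above can be encoded as a small lookup (a sketch; the keys are shorthand for this article's model names, not official API identifiers):

```python
# Per-million-token prices (USD) from the table above (standard, peak rates).
PRICES = {
    "v3":   {"prompt": 0.14, "completion": 0.28},
    "v3.1": {"prompt": 0.27, "completion": 1.10},
    "r1":   {"prompt": 0.55, "completion": 2.19},
    "v3.2": {"prompt": 0.28, "completion": 0.42},
}

def request_cost(model, prompt_tokens, completion_tokens):
    """USD cost of one request at standard rates."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000

# A 200-prompt / 100-completion chat request on V3:
print(f"${request_cost('v3', 200, 100):.6f}")  # $0.000056
```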


Model Pricing Breakdown

DeepSeek offers three primary model tiers: V3 (standard), R1 (reasoning), and the newer V3.2 (unified model combining both).

V3 (Standard Model)

General-purpose conversation, coding, summarization, and instruction-following. Optimized for speed and cost. No explicit chain-of-thought mode; any reasoning ability is implicit in the model weights. 128K context window, 8K max output.

R1 (Reasoning Model)

Designed for complex reasoning, math, coding, and logic. Generates intermediate reasoning tokens before the final answer (billed as completion tokens). Slower than V3 but more accurate on challenging tasks. 128K context, 8K max output. Costs 4x more than V3 on input, 7.8x on output.

V3.2 (Unified Model)

Released in early 2026, V3.2 unifies V3 and R1 into a single model at $0.28/$0.42 per million tokens. This replaces separate V3 and R1 pricing. V3.2 handles both chat and reasoning tasks in a single request, reducing complexity for developers who previously had to route requests to different models.

All models support:

  • Context window: 128K tokens
  • Max output: 8K tokens (standard limit)
  • Streaming: yes
  • Vision: (image input support status unclear)

V3 Standard Pricing

DeepSeek V3 costs $0.14 per million prompt tokens and $0.28 per million completion tokens.

Example Costs:

Chat request (200 prompt tokens, 100 completion tokens):

  • Prompt cost: 200 / 1,000,000 × $0.14 = $0.000028
  • Completion cost: 100 / 1,000,000 × $0.28 = $0.000028
  • Total: $0.000056 (~0.006 cents)

Summarize a 10K-token document (10,000 prompt, 500 completion):

  • Prompt cost: 10,000 / 1,000,000 × $0.14 = $0.0014
  • Completion cost: 500 / 1,000,000 × $0.28 = $0.00014
  • Total: $0.00154 (~0.15 cents)

Batch analysis (1M documents × 500 prompt tokens + 50 completion tokens each = 500M prompt, 50M completion):

  • Prompt cost: 500M / 1,000,000 × $0.14 = $70
  • Completion cost: 50M / 1,000,000 × $0.28 = $14
  • Total: $84
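The batch numbers above are linear in token counts; a quick sketch to reproduce them (V3 rates from this section):

```python
V3_PROMPT, V3_COMPLETION = 0.14, 0.28  # $/MTok, V3 standard rates

def v3_cost(prompt_tokens, completion_tokens):
    """USD cost at V3 standard pricing."""
    return (prompt_tokens * V3_PROMPT + completion_tokens * V3_COMPLETION) / 1e6

# 1M documents, 500 prompt + 50 completion tokens each
docs = 1_000_000
total = v3_cost(docs * 500, docs * 50)
print(f"${total:.2f}")  # $84.00
```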

V3 is the cheapest general-purpose model available (March 2026). OpenAI GPT-4o Mini costs $0.15/$0.60 (slightly higher completion cost). Claude Haiku 4.5 costs $1.00/$5.00 (7-18x more expensive). For pure cost minimization, V3 is unmatched among major models.


R1 Reasoning Model

DeepSeek R1 costs $0.55 per million prompt tokens and $2.19 per million completion tokens. 4x the prompt cost of V3, 7.8x the completion cost.

When the price premium is worth it:

R1 excels on benchmarks where explicit reasoning is critical:

  • Math (AIME, competition-level): R1 scores higher than GPT-4o
  • Code (LeetCode-hard problems): R1 solves harder problems than V3
  • Logic puzzles and multi-step reasoning: R1 > V3 by measurable margin
  • Writing and analysis: R1's reasoning translates to higher-quality output

Example Costs:

One reasoning request (500 prompt tokens, 2,000 completion tokens with reasoning):

  • Prompt cost: 500 / 1,000,000 × $0.55 = $0.000275
  • Completion cost: 2,000 / 1,000,000 × $2.19 = $0.00438
  • Total: $0.004655 (~0.5 cents)

Versus V3 for same tokens:

  • Prompt cost: 500 / 1,000,000 × $0.14 = $0.00007
  • Completion cost: 2,000 / 1,000,000 × $0.28 = $0.00056
  • Total: $0.00063

R1 costs 7.3x more per request for complex reasoning. Cost is justified if R1's output eliminates manual review or fixes code that V3 gets wrong.

Batch reasoning jobs (100 complex coding problems, 1,000 prompt and 2,000 completion tokens each = 100K prompt, 200K completion):

  • R1 cost: (100K / 1M) × $0.55 + (200K / 1M) × $2.19 = $0.493
  • V3 cost: (100K / 1M) × $0.14 + (200K / 1M) × $0.28 = $0.07

R1 is about 7x more expensive on this workload. However, R1's stronger reasoning often means fewer total output tokens (more direct solutions), so the gap in practice can be smaller than the headline rates suggest.
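Whether the premium pays off can be framed as an effective-cost comparison: if cheaper-but-weaker output forces retries or manual fixes, the rate gap narrows. A sketch, where the retry rates are illustrative assumptions, not measured benchmarks:

```python
def per_problem_cost(prompt_toks, completion_toks, prompt_rate, completion_rate):
    """USD cost of one problem at the given $/MTok rates."""
    return (prompt_toks * prompt_rate + completion_toks * completion_rate) / 1e6

# Token counts per problem from the batch example above
v3 = per_problem_cost(1_000, 2_000, 0.14, 0.28)  # $0.00070
r1 = per_problem_cost(1_000, 2_000, 0.55, 2.19)  # $0.00493

# Illustrative assumption: V3 needs a retry on 40% of hard problems, R1 on 5%
print(f"V3 effective: ${v3 * 1.40:.5f}, R1 effective: ${r1 * 1.05:.5f}")
```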


Off-Peak Discounts

DeepSeek offers significant discounts during off-peak hours: 16:30 to 00:30 GMT (approximately 12:30 PM - 8:30 PM US Eastern, 9:30 AM - 5:30 PM US Pacific).

Off-Peak Pricing:

  • V3: $0.07 prompt / $0.14 completion (50% discount)
  • R1: $0.14 prompt / $0.55 completion (75% discount)
  • V3.2: $0.14 prompt / $0.21 completion (50% discount)

This creates a new category: scheduled batch processing.

Example: Batch Processing at Off-Peak

Process 500M tokens of customer documents during the off-peak window (a scheduled nightly batch job):

  • V3 standard: $84
  • V3 off-peak: $42 (50% savings)

For a team running 500M-token batch jobs nightly, the math: $84/day × 30 days × 50% savings = $1,260/month saved. Savings scale linearly with volume: at 10x the nightly volume, they exceed $12,000/month.

This incentivizes teams to separate realtime (peak hours) from batch (off-peak) workloads.
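Because the window is defined in GMT, a batch scheduler can gate submissions with a simple check (a sketch using the 16:30-00:30 GMT window quoted above):

```python
from datetime import datetime, timezone

def is_off_peak(now=None):
    """True inside the off-peak window, 16:30-00:30 GMT (wraps past midnight)."""
    now = now or datetime.now(timezone.utc)
    minutes = now.hour * 60 + now.minute  # minutes since 00:00 GMT
    return minutes >= 16 * 60 + 30 or minutes < 30

# Gate a batch job: submit only inside the discount window
print(is_off_peak(datetime(2026, 3, 1, 20, 0, tzinfo=timezone.utc)))  # True
print(is_off_peak(datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)))  # False
```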


Context Caching

DeepSeek charges $0.014 per million cached input tokens (90% discount vs $0.14 standard for V3). The cache works across multiple requests: the first request with cached tokens pays full price, subsequent requests using the same cached tokens pay 10% of input cost.

Minimum cache block: 1,024 tokens.

Use Case: Multi-Turn Conversation

Conversation about a 100K-token document:

Turn 1 (user uploads doc, asks question):

  • Prompt: 100K document + 100 question = 100.1K tokens
  • Cache write: 100K document tokens (future turns reuse them)
  • Cost: (100.1K / 1M × $0.14) + (500 completion / 1M × $0.28) = $0.014014 + $0.00014 ≈ $0.0142

Turns 2-10 (user asks follow-ups):

  • Prompt: 100K cached + 50 new tokens
  • Cached tokens: (100K / 1M) × $0.014 = $0.0014
  • New tokens: (50 / 1M) × $0.14 ≈ $0.000007
  • Completion: (200 / 1M) × $0.28 = $0.000056
  • Cost per turn: ≈ $0.00146

10-turn conversation: $0.0142 (turn 1) + $0.00146 × 9 ≈ $0.0273

Without caching, every turn resends the full 100K document: ~1M prompt tokens over 10 turns × $0.14/MTok ≈ $0.14, plus completions, for ~$0.141 total.

Caching saves ~80% ($0.114 of $0.141).
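The per-turn arithmetic generalizes to any document size and turn count. A sketch, assuming the first turn pays the full input rate (cache write) and later turns are billed at the $0.014/MTok cached rate:

```python
V3_IN, V3_OUT, CACHED_IN = 0.14, 0.28, 0.014  # $/MTok

def conversation_cost(doc_toks, turns, question_toks=100, followup_toks=50,
                      first_out=500, out_toks=200):
    """USD cost of a cached multi-turn conversation over one large document."""
    first = ((doc_toks + question_toks) * V3_IN + first_out * V3_OUT) / 1e6
    per_turn = (doc_toks * CACHED_IN + followup_toks * V3_IN + out_toks * V3_OUT) / 1e6
    return first + (turns - 1) * per_turn

print(f"${conversation_cost(100_000, 10):.4f}")  # $0.0273 for a 10-turn conversation
```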

Example: Bulk Document Analysis

Analyze 1,000 documents using a fixed system prompt (3K tokens reused).

Option 1 (no caching):

  • Each document: 3K prompt (system) + 5K document tokens + 500 completion
  • Cost per doc: (8K / 1M × $0.14) + (500 / 1M × $0.28) = $0.00112 + $0.00014 = $0.00126
  • 1,000 docs: $1.26

Option 2 (caching system prompt):

  • Initial: cache the 3K system prompt at full price ($0.00042)
  • Each document: 3K cached + 5K doc + 500 completion
  • Cost per doc: (3K / 1M × $0.014) + (5K / 1M × $0.14) + (500 / 1M × $0.28) = $0.000042 + $0.0007 + $0.00014 ≈ $0.00088
  • 1,000 docs: $0.00042 + ($0.00088 × 1,000) ≈ $0.88

Savings: ~$0.38 per 1,000 documents (a 30% reduction), growing with the share of cached tokens per request.

For large-scale batch jobs, caching is the default choice: set it up once and the savings scale linearly, to roughly $380 per million documents in this scenario.
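The same comparison as a function of document count (a sketch with this section's V3 and cached rates):

```python
V3_IN, V3_OUT, CACHED_IN = 0.14, 0.28, 0.014  # $/MTok

def batch_cost(n_docs, sys_toks, doc_toks, out_toks, cached):
    """USD cost of analyzing n_docs documents with a shared system prompt."""
    sys_rate = CACHED_IN if cached else V3_IN
    per_doc = (sys_toks * sys_rate + doc_toks * V3_IN + out_toks * V3_OUT) / 1e6
    cache_write = sys_toks * V3_IN / 1e6 if cached else 0.0  # first request, full price
    return cache_write + n_docs * per_doc

no_cache = batch_cost(1000, 3000, 5000, 500, cached=False)
with_cache = batch_cost(1000, 3000, 5000, 500, cached=True)
print(f"${no_cache:.2f} vs ${with_cache:.2f}")  # $1.26 vs $0.88
```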


Batching and Queue Processing

DeepSeek supports asynchronous batching for non-realtime work. Batch processing allows queuing up requests and processing them during off-peak hours, applying both the off-peak discount and batch processing efficiency.

Batch Processing with Off-Peak Timing:

Queue 100M tokens for processing:

  • Standard on-demand: $14
  • Off-peak standard: $7 (50% off)
  • Batch processing during off-peak: $3.50 (additional 50% discount applied)

This dual-discount strategy (off-peak + batch) gives teams up to 75% savings on routine processing.


Cost Comparisons

V3 vs Competitors (General Purpose)

Model | Prompt $/MTok | Completion $/MTok | Notes
DeepSeek V3 | $0.14 | $0.28 | Cheapest general-purpose
OpenAI GPT-4o Mini | $0.15 | $0.60 | Similar prompt cost, higher output
Claude Haiku 4.5 | $1.00 | $5.00 | Higher cost, better reasoning
GPT-4.1 Nano | $0.10 | $0.40 | Cheaper prompt, higher output
Mistral Small | $0.14 | $0.42 | Same prompt cost, higher completion

V3 is competitive on cost. GPT-4o Mini ($0.15) is nearly identical on prompt, but its completion rate is 2.1x higher. Claude Haiku 4.5 is 7-18x more expensive. For pure cost minimization on high-volume tasks, choose V3 or GPT-4.1 Nano.

R1 vs Reasoning Competitors

Model | Prompt $/MTok | Completion $/MTok | Notes
DeepSeek R1 | $0.55 | $2.19 | Cheapest reasoning model
Claude Opus 4.6 | $5.00 | $25.00 | 9-11x more expensive
o3 Mini | $1.10 | $4.40 | 2x more expensive
o3 | $2.00 | $8.00 | 3.6x more expensive
o1-Mini | $3.00 | $12.00 | 5.5x more expensive

R1 is the cheapest reasoning model available (March 2026). o3 Mini is 2x more expensive. Claude Opus is roughly 10x more expensive. For reasoning-heavy workloads, R1 dominates on cost.


Rate Limits and Quotas

DeepSeek enforces rate limits on API access. As of March 2026, limits are:

Tier | Requests/Minute | Tokens/Minute | Notes
Free | 1 | 1,000 | No-cost tier; limited trial
Pro | 30 | 100,000 | Pay-as-you-go tier (standard)
Business | 100 | 500,000 | Production tier

Most teams use the Pro tier (pay-as-you-go). Rate limits are rarely hit for standard use cases.

For batch processing:

  • 1M tokens takes ~10 minutes at 100K tokens/min
  • Cost: (1M / 1,000,000) × $0.14 = $0.14
  • Time: ~10 minutes (serial processing)

Parallel requests reduce wall-clock time, but total throughput is still capped by the tier's token limit; on the Business tier (500K tokens/minute), the same 1M tokens process in ~2 minutes.

No monthly quota (unlike OpenAI's usage-based limits). Only rate limits. Bill increases with usage, but no account suspension at a specific dollar threshold.
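For a client-side guard against those limits, a simple sliding-window throttle works (a sketch; not part of any official DeepSeek SDK):

```python
import time

class RateLimiter:
    """Client-side sliding-window throttle for requests/min and tokens/min budgets."""

    def __init__(self, max_requests=30, max_tokens=100_000, window=60.0):
        self.max_requests = max_requests  # Pro tier: 30 requests/minute
        self.max_tokens = max_tokens      # Pro tier: 100K tokens/minute
        self.window = window
        self.events = []                  # list of (timestamp, tokens)

    def acquire(self, tokens):
        """Block until sending `tokens` more stays within both budgets."""
        while True:
            now = time.monotonic()
            self.events = [(t, n) for t, n in self.events if now - t < self.window]
            used = sum(n for _, n in self.events)
            if len(self.events) < self.max_requests and used + tokens <= self.max_tokens:
                self.events.append((now, tokens))
                return
            time.sleep(0.1)

limiter = RateLimiter()
limiter.acquire(5_000)  # call before each API request
```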


Monthly Cost Projections

Low-Volume User (Hobby):

  • Usage: 10M tokens prompt, 5M tokens completion per month
  • Cost: (10M / 1M) × $0.14 + (5M / 1M) × $0.28 = $1.40 + $1.40 = $2.80/month

Moderate User (Small Team):

  • Usage: 100M tokens prompt (customer service, chat), 50M completion per month
  • Cost: (100M / 1M) × $0.14 + (50M / 1M) × $0.28 = $14 + $14 = $28/month
  • Plus off-peak batch jobs (50% discount): +$7/month
  • Total: ~$35/month

Heavy User (Production Inference):

  • Usage: 1B tokens prompt (1,000 concurrent users, 8 hours/day), 500M completion
  • Cost: (1B / 1M) × $0.14 + (500M / 1M) × $0.28 = $140 + $140 = $280/month
  • Off-peak batch: 500M tokens × 50% = +$35/month
  • Caching (20% of prompt volume, 200M tokens): 200M / 1M × ($0.14 - $0.014) = -$25.20/month
  • Total: ~$290/month (vs GPU inference: the same scale might cost $500-1,000/month on an H100)

Large Scale:

  • Usage: 10B tokens/month (1M+ monthly requests)
  • On-demand cost: $1,400/month
  • With off-peak batching (50% of volume at 50% off): $1,400 × 0.5 × 0.5 = $350 savings
  • With caching (40% of volume at 90% off): $1,400 × 0.4 × 0.9 = $504 savings
  • Total: ~$546/month after optimizations
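These projections collapse into one formula: on-demand cost minus the discount on whatever fraction of volume each optimization covers (a sketch; the fractions are the illustrative ones used above, and volume is assumed prompt-dominated at the V3 rate):

```python
V3_IN = 0.14  # $/MTok, prompt-dominated volume

def monthly_cost(mtok_per_month, offpeak_frac=0.5, cached_frac=0.4):
    """USD/month after off-peak (50% off) and caching (90% off) on their slices."""
    base = mtok_per_month * V3_IN
    offpeak_savings = base * offpeak_frac * 0.5
    cache_savings = base * cached_frac * 0.9
    return base - offpeak_savings - cache_savings

print(f"${monthly_cost(10_000):.0f}/month")  # 10B tokens/month → $546/month
```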

Use Case Cost Scenarios

Customer Support Chatbots

V3 for straightforward queries (billing, hours, FAQ). Cost: roughly $0.42 per 1M token pairs (prompt at $0.14 plus completion at $0.28).

Handles 1M support tickets/month (~1,000 tokens each) at roughly:

  • DeepSeek: ~$420/month
  • Anthropic Claude Haiku: ~$1,500/month
  • OpenAI GPT-4o Mini: ~$900/month

DeepSeek saves $1,080/month vs Claude, $480/month vs OpenAI.

Batch Document Analysis

R1 with caching. Process 1M documents:

  • Cached system prompt (5K tokens): $0.014 × 5K / 1M = $0.00007
  • Per-document: 10K tokens prompt + 500 completion = (10K × $0.55) + (500 × $2.19) / 1M = $0.0055 + $0.0011 = $0.0066
  • Total: $0.00007 + ($0.0066 × 1M) = $6,600

OpenAI equivalent (o3 Mini): (10K × $1.10 + 500 × $4.40) / 1M = $0.0132 per document, or ~$13,200 at the same scale. DeepSeek saves ~$6,600.

Code Generation (LeetCode-Hard)

R1 is cheaper than o3 ($0.55 vs $2.00 prompt) while achieving similar accuracy. Choose R1 for cost-constrained teams building coding assistants.

100 complex coding problems:

  • R1: (100K prompt × $0.55 + 200K output × $2.19) / 1M = $0.493
  • o3 Mini: (100K × $1.10 + 200K × $4.40) / 1M = $0.99
  • Savings: ~$0.50 per 100 problems, or ~$50 per 10K problems

Off-Peak Batch Processing

Schedule bulk inference during 16:30-00:30 GMT, when R1 is 75% off. Process 100M R1 prompt tokens for:

  • Standard: (100M / 1M) × $0.55 = $55
  • Off-peak: $13.75
  • Savings: $41.25 per 100M tokens

At 1B tokens/month: ~$412/month saved. At 10B tokens/month: ~$4,125/month saved.


FAQ

Is DeepSeek cheaper than OpenAI?

For general-purpose (V3 vs GPT-4o Mini): V3 is $0.14/$0.28, GPT-4o Mini is $0.15/$0.60. V3's prompt rate is slightly cheaper, and its completion rate is less than half the price. DeepSeek wins at scale.

For reasoning (R1 vs o3 Mini): DeepSeek R1 is $0.55/$2.19, o3 Mini is $1.10/$4.40. DeepSeek is 50% cheaper while achieving comparable accuracy on benchmarks.

Can I use DeepSeek offline?

The hosted API requires an internet connection. However, DeepSeek publishes open model weights (V3 and R1 are downloadable, e.g. from Hugging Face), so you can self-host on your own GPUs for offline use. The pricing in this article applies only to the hosted API.

What's the caching hit rate?

Depends on use case. Multi-turn conversations: 80%+ of tokens cached (after first turn). Batch jobs with fixed prompts: 50-70% cache hit rate. Check dashboard for your actual rate.

Is off-peak timing worth it?

Yes if processing bulk data. Batch jobs scheduled during off-peak save 50-75%. For interactive apps, off-peak timing doesn't apply (users query any time). Set up background workers to process overnight.

Will DeepSeek pricing stay this low?

Unknown. Prices are currently promotional (2026 launch pricing). Monitor for updates. If budget-dependent, lock in annual commitments if available.

Does DeepSeek offer enterprise contracts?

Check api-docs.deepseek.com for enterprise sales contact.

Can I mix V3 and R1 in the same application?

Yes. Route easy queries to V3 (cheap), hard queries to R1 (accurate). This hybrid approach is cost-optimal. Set up cost tracking to find the right threshold.
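A minimal router for that hybrid setup (a sketch: `deepseek-chat` and `deepseek-reasoner` are the API's model identifiers for V3 and R1; the difficulty heuristic is a placeholder assumption, where a real system would use a classifier or user signal):

```python
def pick_model(query):
    """Route reasoning-heavy queries to R1, everything else to cheaper V3."""
    hard_markers = ("prove", "debug", "optimize", "step by step", "algorithm")
    if len(query) > 500 or any(m in query.lower() for m in hard_markers):
        return "deepseek-reasoner"  # R1: $0.55 / $2.19 per MTok
    return "deepseek-chat"          # V3: $0.14 / $0.28 per MTok

print(pick_model("What are your store hours?"))          # deepseek-chat
print(pick_model("Debug this recursive algorithm..."))   # deepseek-reasoner
```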

How does V3.2 pricing compare to separate V3/R1?

V3.2 at $0.28/$0.42 splits the difference. For pure chat, V3 is cheaper ($0.14/$0.28). For pure reasoning, R1 is cheaper ($0.55/$2.19). V3.2 is best if you need both in one request.


