DeepSeek API Pricing 2026: Model Costs, Discounts, and Cost Scenarios

Deploybase · February 11, 2026 · LLM Pricing


DeepSeek Pricing Overview

DeepSeek API pricing covers two primary models: V3 (general-purpose) and R1 (reasoning). V3 is the fastest and cheapest. R1 is slower but handles complex reasoning, code generation, and multi-step problem solving.

Off-peak discounts (75% for R1, 50% for V3) apply from 16:30 to 00:30 GMT. Context caching (a 90% discount on cached inputs) enables bulk processing at minimal cost. Rate limits allow 30 requests/minute and 100K tokens/minute on the Pro tier.

All prices are in USD, as of March 2026.

Model | Prompt $/MTok | Completion $/MTok | Use Case | Off-Peak Price
DeepSeek V3 | $0.14 | $0.28 | General chat, fast inference | $0.07 / $0.14
DeepSeek V3.1 | $0.27 | $1.10 | General chat, improved quality | $0.135 / $0.55
DeepSeek R1 | $0.55 | $2.19 | Reasoning, code, math | $0.14 / $0.55
Context Caching (90% off) | $0.014 | $0.28 | Bulk document analysis | $0.0014 / $0.028
DeepSeek V3.2 (unified) | $0.28 | $0.42 | Chat + reasoning combined | $0.14 / $0.21

MTok = million tokens. Prices pulled from api-docs.deepseek.com, March 2026.
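For programmatic estimates, the table above can be encoded as a small lookup (a sketch; the keys are shorthand for this article's model names, not official API identifiers):

```python
# Per-million-token prices (USD) from the table above (standard, peak rates).
PRICES = {
    "v3":   {"prompt": 0.14, "completion": 0.28},
    "v3.1": {"prompt": 0.27, "completion": 1.10},
    "r1":   {"prompt": 0.55, "completion": 2.19},
    "v3.2": {"prompt": 0.28, "completion": 0.42},
}

def request_cost(model, prompt_tokens, completion_tokens):
    """USD cost of one request at standard rates."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000

# A 200-prompt / 100-completion chat request on V3:
print(f"${request_cost('v3', 200, 100):.6f}")  # $0.000056
```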


Model Pricing Breakdown

DeepSeek offers three primary model tiers: V3 (standard), R1 (reasoning), and the newer V3.2 (unified model combining both).

V3 (Standard Model)

General-purpose conversation, coding, summarization, and instruction-following. Optimized for speed and cost. No explicit chain-of-thought mode; any reasoning ability is implicit in the model weights. 128K context window, 8K max output.

R1 (Reasoning Model)

Designed for complex reasoning, math, coding, and logic. Generates intermediate reasoning tokens before the final answer (billed as completion tokens). Slower than V3 but more accurate on challenging tasks. 128K context, 8K max output. Costs 4x more than V3 on input, 7.8x on output.

V3.2 (Unified Model)

Released in early 2026, V3.2 unifies V3 and R1 into a single model at $0.28/$0.42 per million tokens. This replaces separate V3 and R1 pricing. V3.2 handles both chat and reasoning tasks in a single request, reducing complexity for developers who previously had to route requests to different models.

All models support:

  • Context window: 128K tokens
  • Max output: 8K tokens (standard limit)
  • Streaming: yes
  • Vision: (image input support status unclear)

V3 Standard Pricing

DeepSeek V3 costs $0.14 per million prompt tokens and $0.28 per million completion tokens.

Example Costs:

Chat request (200 prompt tokens, 100 completion tokens):

  • Prompt cost: 200 / 1,000,000 × $0.14 = $0.000028
  • Completion cost: 100 / 1,000,000 × $0.28 = $0.000028
  • Total: $0.000056 (~0.006 cents)

Summarize a 10K-token document (10,000 prompt, 500 completion):

  • Prompt cost: 10,000 / 1,000,000 × $0.14 = $0.0014
  • Completion cost: 500 / 1,000,000 × $0.28 = $0.00014
  • Total: $0.00154 (~0.15 cents)

Batch analysis (1M documents × 500 prompt tokens + 50 completion tokens each = 500M prompt, 50M completion):

  • Prompt cost: 500M / 1,000,000 × $0.14 = $70
  • Completion cost: 50M / 1,000,000 × $0.28 = $14
  • Total: $84
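The batch numbers above are linear in token counts; a quick sketch to reproduce them (V3 rates from this section):

```python
V3_PROMPT, V3_COMPLETION = 0.14, 0.28  # $/MTok, V3 standard rates

def v3_cost(prompt_tokens, completion_tokens):
    """USD cost at V3 standard pricing."""
    return (prompt_tokens * V3_PROMPT + completion_tokens * V3_COMPLETION) / 1e6

# 1M documents, 500 prompt + 50 completion tokens each
docs = 1_000_000
total = v3_cost(docs * 500, docs * 50)
print(f"${total:.2f}")  # $84.00
```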

V3 is the cheapest general-purpose model available (March 2026). OpenAI GPT-4o Mini costs $0.15/$0.60 (slightly higher completion cost). Claude Haiku 4.5 costs $1.00/$5.00 (7-18x more expensive). For pure cost minimization, V3 is unmatched among major models.


R1 Reasoning Model

DeepSeek R1 costs $0.55 per million prompt tokens and $2.19 per million completion tokens. 4x the prompt cost of V3, 7.8x the completion cost.

When the price premium is worth it:

R1 excels on benchmarks where explicit reasoning is critical:

  • Math (AIME, competition-level): R1 scores higher than GPT-4o
  • Code (LeetCode-hard problems): R1 solves harder problems than V3
  • Logic puzzles and multi-step reasoning: R1 > V3 by measurable margin
  • Writing and analysis: R1's reasoning translates to higher-quality output

Example Costs:

One reasoning request (500 prompt tokens, 2,000 completion tokens with reasoning):

  • Prompt cost: 500 / 1,000,000 × $0.55 = $0.000275
  • Completion cost: 2,000 / 1,000,000 × $2.19 = $0.00438
  • Total: $0.004655 (~0.5 cents)

Versus V3 for same tokens:

  • Prompt cost: 500 / 1,000,000 × $0.14 = $0.00007
  • Completion cost: 2,000 / 1,000,000 × $0.28 = $0.00056
  • Total: $0.00063

R1 costs 7.3x more per request for complex reasoning. Cost is justified if R1's output eliminates manual review or fixes code that V3 gets wrong.

Batch reasoning jobs (100 complex coding problems, 1,000 prompt and 2,000 completion tokens each = 100K prompt, 200K completion):

  • R1 cost: (100K / 1M) × $0.55 + (200K / 1M) × $2.19 = $0.493
  • V3 cost: (100K / 1M) × $0.14 + (200K / 1M) × $0.28 = $0.07

R1 is about 7x more expensive on this workload. However, R1's stronger reasoning often means fewer total output tokens (more direct solutions), so the gap in practice can be smaller than the headline rates suggest.
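Whether the premium pays off can be framed as an effective-cost comparison: if cheaper-but-weaker output forces retries or manual fixes, the rate gap narrows. A sketch, where the retry rates are illustrative assumptions, not measured benchmarks:

```python
def per_problem_cost(prompt_toks, completion_toks, prompt_rate, completion_rate):
    """USD cost of one problem at the given $/MTok rates."""
    return (prompt_toks * prompt_rate + completion_toks * completion_rate) / 1e6

# Token counts per problem from the batch example above
v3 = per_problem_cost(1_000, 2_000, 0.14, 0.28)  # $0.00070
r1 = per_problem_cost(1_000, 2_000, 0.55, 2.19)  # $0.00493

# Illustrative assumption: V3 needs a retry on 40% of hard problems, R1 on 5%
print(f"V3 effective: ${v3 * 1.40:.5f}, R1 effective: ${r1 * 1.05:.5f}")
```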


Off-Peak Discounts

DeepSeek offers significant discounts during off-peak hours: 16:30 to 00:30 GMT (approximately 12:30 PM - 8:30 PM US Eastern, 9:30 AM - 5:30 PM US Pacific).

Off-Peak Pricing:

  • V3: $0.07 prompt / $0.14 completion (50% discount)
  • R1: $0.14 prompt / $0.55 completion (75% discount)
  • V3.2: $0.14 prompt / $0.21 completion (50% discount)

This creates a new category: scheduled batch processing.

Example: Batch Processing at Off-Peak

Process 500M tokens of customer documents during the off-peak window (a scheduled nightly batch job):

  • V3 standard: $84
  • V3 off-peak: $42 (50% savings)

For a team running 500M-token batch jobs nightly, the math: $84/day × 30 days × 50% savings = $1,260/month saved. Savings scale linearly with volume: at 10x the nightly volume, they exceed $12,000/month.

This incentivizes teams to separate realtime (peak hours) from batch (off-peak) workloads.
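Because the window is defined in GMT, a batch scheduler can gate submissions with a simple check (a sketch using the 16:30-00:30 GMT window quoted above):

```python
from datetime import datetime, timezone

def is_off_peak(now=None):
    """True inside the off-peak window, 16:30-00:30 GMT (wraps past midnight)."""
    now = now or datetime.now(timezone.utc)
    minutes = now.hour * 60 + now.minute  # minutes since 00:00 GMT
    return minutes >= 16 * 60 + 30 or minutes < 30

# Gate a batch job: submit only inside the discount window
print(is_off_peak(datetime(2026, 3, 1, 20, 0, tzinfo=timezone.utc)))  # True
print(is_off_peak(datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)))  # False
```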


Context Caching

DeepSeek charges $0.014 per million cached input tokens (90% discount vs $0.14 standard for V3). The cache works across multiple requests: the first request with cached tokens pays full price, subsequent requests using the same cached tokens pay 10% of input cost.

Minimum cache block: 1,024 tokens.

Use Case: Multi-Turn Conversation

Conversation about a 100K-token document:

Turn 1 (user uploads doc, asks question):

  • Prompt: 100K document + 100 question = 100.1K tokens
  • Cache write: 100K document tokens (future turns reuse them)
  • Cost: (100.1K / 1M × $0.14) + (500 completion / 1M × $0.28) = $0.014014 + $0.00014 ≈ $0.0142

Turns 2-10 (user asks follow-ups):

  • Prompt: 100K cached + 50 new tokens
  • Cached tokens: (100K / 1M) × $0.014 = $0.0014
  • New tokens: (50 / 1M) × $0.14 ≈ $0.000007
  • Completion: (200 / 1M) × $0.28 = $0.000056
  • Cost per turn: ≈ $0.00146

10-turn conversation: $0.0142 (turn 1) + $0.00146 × 9 ≈ $0.0273

Without caching, every turn resends the full 100K document: ~1M prompt tokens over 10 turns × $0.14/MTok ≈ $0.14, plus completions, for ~$0.141 total.

Caching saves ~80% ($0.114 of $0.141).
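The per-turn arithmetic generalizes to any document size and turn count. A sketch, assuming the first turn pays the full input rate (cache write) and later turns are billed at the $0.014/MTok cached rate:

```python
V3_IN, V3_OUT, CACHED_IN = 0.14, 0.28, 0.014  # $/MTok

def conversation_cost(doc_toks, turns, question_toks=100, followup_toks=50,
                      first_out=500, out_toks=200):
    """USD cost of a cached multi-turn conversation over one large document."""
    first = ((doc_toks + question_toks) * V3_IN + first_out * V3_OUT) / 1e6
    per_turn = (doc_toks * CACHED_IN + followup_toks * V3_IN + out_toks * V3_OUT) / 1e6
    return first + (turns - 1) * per_turn

print(f"${conversation_cost(100_000, 10):.4f}")  # $0.0273 for a 10-turn conversation
```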

Example: Bulk Document Analysis

Analyze 1,000 documents using a fixed system prompt (3K tokens reused).

Option 1 (no caching):

  • Each document: 3K prompt (system) + 5K document tokens + 500 completion
  • Cost per doc: (8K / 1M × $0.14) + (500 / 1M × $0.28) = $0.00112 + $0.00014 = $0.00126
  • 1,000 docs: $1.26

Option 2 (caching system prompt):

  • Initial: cache the 3K system prompt at full price ($0.00042)
  • Each document: 3K cached + 5K doc + 500 completion
  • Cost per doc: (3K / 1M × $0.014) + (5K / 1M × $0.14) + (500 / 1M × $0.28) = $0.000042 + $0.0007 + $0.00014 ≈ $0.00088
  • 1,000 docs: $0.00042 + ($0.00088 × 1,000) ≈ $0.88

Savings: ~$0.38 per 1,000 documents (a 30% reduction), growing with the share of cached tokens per request.

For large-scale batch jobs, caching is the default choice: set it up once and the savings scale linearly, to roughly $380 per million documents in this scenario.
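The same comparison as a function of document count (a sketch with this section's V3 and cached rates):

```python
V3_IN, V3_OUT, CACHED_IN = 0.14, 0.28, 0.014  # $/MTok

def batch_cost(n_docs, sys_toks, doc_toks, out_toks, cached):
    """USD cost of analyzing n_docs documents with a shared system prompt."""
    sys_rate = CACHED_IN if cached else V3_IN
    per_doc = (sys_toks * sys_rate + doc_toks * V3_IN + out_toks * V3_OUT) / 1e6
    cache_write = sys_toks * V3_IN / 1e6 if cached else 0.0  # first request, full price
    return cache_write + n_docs * per_doc

no_cache = batch_cost(1000, 3000, 5000, 500, cached=False)
with_cache = batch_cost(1000, 3000, 5000, 500, cached=True)
print(f"${no_cache:.2f} vs ${with_cache:.2f}")  # $1.26 vs $0.88
```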


Batching and Queue Processing

DeepSeek supports asynchronous batching for non-realtime work. Batch processing allows queuing up requests and processing them during off-peak hours, applying both the off-peak discount and batch processing efficiency.

Batch Processing with Off-Peak Timing:

Queue 100M tokens for processing:

  • Standard on-demand: $14
  • Off-peak standard: $7 (50% off)
  • Batch processing during off-peak: $3.50 (additional 50% discount applied)

This dual-discount strategy (off-peak + batch) gives teams up to 75% savings on routine processing.


Cost Comparisons

V3 vs Competitors (General Purpose)

Model | Prompt $/MTok | Completion $/MTok | Notes
DeepSeek V3 | $0.14 | $0.28 | Cheapest general-purpose
OpenAI GPT-4o Mini | $0.15 | $0.60 | Similar prompt cost, higher output
Claude Haiku 4.5 | $1.00 | $5.00 | Higher cost, better reasoning
GPT-4.1 Nano | $0.10 | $0.40 | Cheaper prompt, higher output
Mistral Small | $0.14 | $0.42 | Same prompt cost, higher completion

V3 is competitive on cost. GPT-4o Mini ($0.15) is nearly identical on prompt, but its completion rate is 2.1x higher. Claude Haiku 4.5 is 7-18x more expensive. For pure cost minimization on high-volume tasks, choose V3 or GPT-4.1 Nano.

R1 vs Reasoning Competitors

Model | Prompt $/MTok | Completion $/MTok | Notes
DeepSeek R1 | $0.55 | $2.19 | Cheapest reasoning model
Claude Opus 4.6 | $5.00 | $25.00 | 9-11x more expensive
o3 Mini | $1.10 | $4.40 | 2x more expensive
o3 | $2.00 | $8.00 | 3.6x more expensive
o1-Mini | $3.00 | $12.00 | 5.5x more expensive

R1 is the cheapest reasoning model available (March 2026). o3 Mini is 2x more expensive. Claude Opus is roughly 10x more expensive. For reasoning-heavy workloads, R1 dominates on cost.


Rate Limits and Quotas

DeepSeek enforces rate limits on API access. As of March 2026, limits are:

Tier | Requests/Minute | Tokens/Minute | Notes
Free | 1 | 1,000 | No-cost tier; limited trial
Pro | 30 | 100,000 | Pay-as-you-go tier (standard)
Business | 100 | 500,000 | Production tier

Most teams use the Pro tier (pay-as-you-go). Rate limits are rarely hit for standard use cases.

For batch processing:

  • 1M tokens takes ~10 minutes at 100K tokens/min
  • Cost: (1M / 1,000,000) × $0.14 = $0.14
  • Time: ~10 minutes (serial processing)

Parallel requests reduce wall-clock time, but total throughput is still capped by the tier's token limit; on the Business tier (500K tokens/minute), the same 1M tokens process in ~2 minutes.

No monthly quota (unlike OpenAI's usage-based limits). Only rate limits. Bill increases with usage, but no account suspension at a specific dollar threshold.
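For a client-side guard against those limits, a simple sliding-window throttle works (a sketch; not part of any official DeepSeek SDK):

```python
import time

class RateLimiter:
    """Client-side sliding-window throttle for requests/min and tokens/min budgets."""

    def __init__(self, max_requests=30, max_tokens=100_000, window=60.0):
        self.max_requests = max_requests  # Pro tier: 30 requests/minute
        self.max_tokens = max_tokens      # Pro tier: 100K tokens/minute
        self.window = window
        self.events = []                  # list of (timestamp, tokens)

    def acquire(self, tokens):
        """Block until sending `tokens` more stays within both budgets."""
        while True:
            now = time.monotonic()
            self.events = [(t, n) for t, n in self.events if now - t < self.window]
            used = sum(n for _, n in self.events)
            if len(self.events) < self.max_requests and used + tokens <= self.max_tokens:
                self.events.append((now, tokens))
                return
            time.sleep(0.1)

limiter = RateLimiter()
limiter.acquire(5_000)  # call before each API request
```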


Monthly Cost Projections

Low-Volume User (Hobby):

  • Usage: 10M tokens prompt, 5M tokens completion per month
  • Cost: (10M / 1M) × $0.14 + (5M / 1M) × $0.28 = $1.40 + $1.40 = $2.80/month

Moderate User (Small Team):

  • Usage: 100M tokens prompt (customer service, chat), 50M completion per month
  • Cost: (100M / 1M) × $0.14 + (50M / 1M) × $0.28 = $14 + $14 = $28/month
  • Plus off-peak batch jobs (50% discount): +$7/month
  • Total: ~$35/month

Heavy User (Production Inference):

  • Usage: 1B tokens prompt (1,000 concurrent users, 8 hours/day), 500M completion
  • Cost: (1B / 1M) × $0.14 + (500M / 1M) × $0.28 = $140 + $140 = $280/month
  • Off-peak batch: 500M tokens × 50% = +$35/month
  • Caching (20% of prompt volume, 200M tokens): 200M / 1M × ($0.14 - $0.014) = -$25.20/month
  • Total: ~$290/month (vs GPU inference: the same scale might cost $500-1,000/month on an H100)

Large Scale:

  • Usage: 10B tokens/month (1M+ monthly requests)
  • On-demand cost: $1,400/month
  • With off-peak batching (50% of volume at 50% off): $1,400 × 0.5 × 0.5 = $350 savings
  • With caching (40% of volume at 90% off): $1,400 × 0.4 × 0.9 = $504 savings
  • Total: ~$546/month after optimizations
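These projections collapse into one formula: on-demand cost minus the discount on whatever fraction of volume each optimization covers (a sketch; the fractions are the illustrative ones used above, and volume is assumed prompt-dominated at the V3 rate):

```python
V3_IN = 0.14  # $/MTok, prompt-dominated volume

def monthly_cost(mtok_per_month, offpeak_frac=0.5, cached_frac=0.4):
    """USD/month after off-peak (50% off) and caching (90% off) on their slices."""
    base = mtok_per_month * V3_IN
    offpeak_savings = base * offpeak_frac * 0.5
    cache_savings = base * cached_frac * 0.9
    return base - offpeak_savings - cache_savings

print(f"${monthly_cost(10_000):.0f}/month")  # 10B tokens/month → $546/month
```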

Use Case Cost Scenarios

Customer Support Chatbots

V3 for straightforward queries (billing, hours, FAQ). Cost: roughly $0.42 per 1M token pairs (prompt at $0.14 plus completion at $0.28).

Handles 1M support tickets/month (~1,000 tokens each) at roughly:

  • DeepSeek: ~$420/month
  • Anthropic Claude Haiku: ~$1,500/month
  • OpenAI GPT-4o Mini: ~$900/month

DeepSeek saves $1,080/month vs Claude, $480/month vs OpenAI.

Batch Document Analysis

R1 with caching. Process 1M documents:

  • Cached system prompt (5K tokens): $0.014 × 5K / 1M = $0.00007
  • Per-document: 10K tokens prompt + 500 completion = (10K × $0.55) + (500 × $2.19) / 1M = $0.0055 + $0.0011 = $0.0066
  • Total: $0.00007 + ($0.0066 × 1M) = $6,600

OpenAI equivalent (o3 Mini): (10K × $1.10 + 500 × $4.40) / 1M = $0.0132 per document, or ~$13,200 at the same scale. DeepSeek saves ~$6,600.

Code Generation (LeetCode-Hard)

R1 is cheaper than o3 ($0.55 vs $2.00 prompt) while achieving similar accuracy. Choose R1 for cost-constrained teams building coding assistants.

100 complex coding problems:

  • R1: (100K prompt × $0.55 + 200K output × $2.19) / 1M = $0.493
  • o3 Mini: (100K × $1.10 + 200K × $4.40) / 1M = $0.99
  • Savings: ~$0.50 per 100 problems, or ~$50 per 10K problems

Off-Peak Batch Processing

Schedule bulk inference during 16:30-00:30 GMT, when R1 is 75% off. Process 100M R1 prompt tokens for:

  • Standard: (100M / 1M) × $0.55 = $55
  • Off-peak: $13.75
  • Savings: $41.25 per 100M tokens

At 1B tokens/month: ~$412/month saved. At 10B tokens/month: ~$4,125/month saved.


FAQ

Is DeepSeek cheaper than OpenAI?

For general-purpose (V3 vs GPT-4o Mini): V3 is $0.14/$0.28, GPT-4o Mini is $0.15/$0.60. V3's prompt rate is slightly cheaper, and its completion rate is less than half the price. DeepSeek wins at scale.

For reasoning (R1 vs o3 Mini): DeepSeek R1 is $0.55/$2.19, o3 Mini is $1.10/$4.40. DeepSeek is 50% cheaper while achieving comparable accuracy on benchmarks.

Can I use DeepSeek offline?

The hosted API requires an internet connection. However, DeepSeek publishes open model weights (V3 and R1 are downloadable, e.g. from Hugging Face), so you can self-host on your own GPUs for offline use. The pricing in this article applies only to the hosted API.

What's the caching hit rate?

Depends on use case. Multi-turn conversations: 80%+ of tokens cached (after first turn). Batch jobs with fixed prompts: 50-70% cache hit rate. Check dashboard for your actual rate.

Is off-peak timing worth it?

Yes if processing bulk data. Batch jobs scheduled during off-peak save 50-75%. For interactive apps, off-peak timing doesn't apply (users query any time). Set up background workers to process overnight.

Will DeepSeek pricing stay this low?

Unknown. Prices are currently promotional (2026 launch pricing). Monitor for updates. If budget-dependent, lock in annual commitments if available.

Does DeepSeek offer enterprise contracts?

Check api-docs.deepseek.com for enterprise sales contact.

Can I mix V3 and R1 in the same application?

Yes. Route easy queries to V3 (cheap), hard queries to R1 (accurate). This hybrid approach is cost-optimal. Set up cost tracking to find the right threshold.
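A minimal router for that hybrid setup (a sketch: `deepseek-chat` and `deepseek-reasoner` are the API's model identifiers for V3 and R1; the difficulty heuristic is a placeholder assumption, where a real system would use a classifier or user signal):

```python
def pick_model(query):
    """Route reasoning-heavy queries to R1, everything else to cheaper V3."""
    hard_markers = ("prove", "debug", "optimize", "step by step", "algorithm")
    if len(query) > 500 or any(m in query.lower() for m in hard_markers):
        return "deepseek-reasoner"  # R1: $0.55 / $2.19 per MTok
    return "deepseek-chat"          # V3: $0.14 / $0.28 per MTok

print(pick_model("What are your store hours?"))          # deepseek-chat
print(pick_model("Debug this recursive algorithm..."))   # deepseek-reasoner
```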

How does V3.2 pricing compare to separate V3/R1?

V3.2 at $0.28/$0.42 splits the difference. For pure chat, V3 is cheaper ($0.14/$0.28). For pure reasoning, R1 is cheaper ($0.55/$2.19). V3.2 is best if you need both in one request.


