Contents
- DeepSeek R1 Pricing Overview
- API Pricing Breakdown
- Pricing Tiers and Variants
- Hosting on Cloud Providers
- Cost Comparison with Alternatives
- Real-World Cost Scenarios
- Off-Peak Pricing and Discounts
- R1 vs V3 Pricing
- Fine-Tuning and Customization
- Hidden Costs and Infrastructure
- FAQ
- Related Resources
- Sources
DeepSeek R1 Pricing Overview
DeepSeek R1 API pricing starts at $0.55 per million input tokens and $2.19 per million output tokens, with reasoning variants generating internal chain-of-thought tokens that inflate actual consumption. The pricing significantly undercuts most Western LLM providers, but R1 is optimized for reasoning workloads: its explicit chain-of-thought generation means the headline rates understate the true token count per request.
The headline price looks cheap. The actual cost per reasoning task depends on the ratio of reasoning tokens to output tokens. A task that generates 5K tokens of explicit reasoning before a 500-token answer will consume far more API quota than raw input/output rates suggest.
Full DeepSeek model pricing and specs tracked on DeployBase LLM comparison.
API Pricing Breakdown
Via Official DeepSeek API (api.deepseek.com)
| Model | Context | Input $/M | Output $/M | Notes |
|---|---|---|---|---|
| DeepSeek V3.1 | 128K | $0.27 | $1.10 | Standard inference, no reasoning |
| DeepSeek R1 | 128K | $0.55 | $2.19 | Chain-of-thought reasoning |
Exact rates confirmed via the official DeepSeek API docs as of March 2026. Pricing has remained stable since the early March launch, but teams should verify with the API dashboard before committing to large batches.
Token Consumption Reality
DeepSeek R1 generates internal reasoning tokens that count toward output billing. A single request works like this:
Request:
- User input: 1,000 tokens
- Internal reasoning generated by model: 5,000 tokens
- Final answer: 500 tokens
- Total billable output: 5,500 tokens (reasoning + answer)
Billing:
- Input: 1,000 × ($0.55/M) = $0.00055
- Output: 5,500 × ($2.19/M) = $0.012045
- Request cost: ≈$0.013
This is why reasoning models only appear cheap: the reasoning tokens are hidden in the "output" category. A straightforward inference task on V3 might cost $0.002. The same task on R1 with reasoning might cost $0.010+, depending on reasoning depth.
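The billing arithmetic above can be wrapped in a small helper for estimating per-request cost. A minimal sketch using the rates from the table; token counts are whatever your workload produces:

```python
# Per-request billing for DeepSeek R1: reasoning tokens bill at the
# output rate alongside the final answer. Rates from the pricing table.
R1_INPUT_PER_TOKEN = 0.55 / 1_000_000
R1_OUTPUT_PER_TOKEN = 2.19 / 1_000_000

def r1_request_cost(input_tokens: int, reasoning_tokens: int, answer_tokens: int) -> float:
    """Return the dollar cost of one R1 request."""
    billable_output = reasoning_tokens + answer_tokens
    return (input_tokens * R1_INPUT_PER_TOKEN
            + billable_output * R1_OUTPUT_PER_TOKEN)

# The worked example: 1K input, 5K reasoning, 500-token answer
print(f"${r1_request_cost(1_000, 5_000, 500):.4f}")  # ≈ $0.0126
```

Scaling this over your request volume gives the monthly estimates that follow.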
Monthly Cost Estimates
Processing 1 million input tokens + 100K output tokens (assumed non-reasoning):
DeepSeek V3: ($0.27 × 1M) + ($1.10 × 0.1M) = $0.27 + $0.11 = $0.38/month
DeepSeek R1: ($0.55 × 1M) + ($2.19 × 0.1M) = $0.55 + $0.22 = $0.77/month
But R1 with reasoning overhead (5K reasoning tokens per request, 1,000 requests):
- Input: 1M tokens
- Reasoning tokens: 5,000 × 1,000 = 5M tokens
- Final answers: 100K tokens
- Total billable output: 5.1M tokens
Cost: ($0.55 × 1M) + ($2.19 × 5.1M) = $0.55 + $11.169 = $11.719/month
The reasoning overhead is substantial. Teams must understand their specific task's reasoning depth to estimate accurate costs.
Pricing Tiers and Variants
Standard R1 ($0.55/$2.19)
Full reasoning capability. Generates extensive chain-of-thought tokens. Best for problems where thorough reasoning improves accuracy: multi-step math, logic puzzles, debugging, complex analysis. Lower throughput (~10-20 tokens/sec per GPU) because of reasoning overhead.
DeepSeek V3 ($0.27/$1.10)
Standard inference without reasoning. No internal chain-of-thought generation. Higher throughput (~50-100 tokens/sec). Best for classification, extraction, summarization, paraphrasing.
Hosting on Cloud Providers
Via Together.AI (Official Partner)
Together.AI offers DeepSeek R1 inference without requiring teams to host their own infrastructure.
| Model | Input $/M | Output $/M |
|---|---|---|
| DeepSeek R1 | $0.50 | $1.50 |
| DeepSeek V3 | $0.25 | $0.90 |
Together.AI charges slightly less than the official API and handles scaling and availability. Good for teams that don't want to manage infrastructure but need DeepSeek inference. Rates current as of late March 2026; verify against Together.AI's pricing page before committing.
Via Replicate
Replicate offers DeepSeek models, but pricing is structured per second of GPU time rather than per token: roughly $0.001-0.005 per second, or about $3.60-$18 per hour of GPU time. For large batches, this becomes uneconomical compared to direct API pricing.
Self-Hosted on Cloud GPUs
Download DeepSeek R1 weights (open-source, MIT licensed) and run on cloud GPU rentals.
Model sizes and requirements:
- DeepSeek R1 distilled (e.g., R1-Distill-Qwen-32B): ~32B parameters (fits on single H100, ~70GB VRAM quantized)
- DeepSeek R1 (full): ~671B total parameters (MoE with 37B activated per token; requires a multi-GPU cluster such as 8x H100 even with 4-bit quantization, since all expert weights must be resident in memory)
Single GPU Cost Example (H100):
RunPod H100 PCIe at $1.99/hr. Inference throughput: ~80 tokens/sec per GPU. Processing 1 million input tokens + 5 million reasoning tokens:
6M total tokens / 80 tokens/sec = 75,000 seconds = ~20.8 hours = $41.50 GPU cost
Add inference framework overhead (vLLM, Ollama), caching server, monitoring: estimate $50-80/month for reasonable throughput.
Multi-GPU Cost Example (8x H100):
RunPod 8x H100 SXM at $21.52/hr. Throughput with distributed inference: ~400-500 tokens/sec aggregate.
Same 6M tokens / 450 tokens/sec ≈ 13,300 seconds ≈ 3.7 hours ≈ $80 GPU cost per batch
Self-hosted wins on pure compute cost for high-volume reasoning workloads. Infrastructure complexity is the trade-off.
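Under the same assumptions, rental rate and sustained throughput are the only two inputs that matter. A sketch of the GPU-hour math (throughput figures are the rough estimates quoted above, not benchmarks):

```python
# Self-hosted batch cost: hours = tokens / throughput, cost = hours * rate.
def gpu_batch_cost(total_tokens: int, tokens_per_sec: float, hourly_rate: float):
    """Return (hours, dollars) to process a token batch at a given throughput."""
    hours = total_tokens / tokens_per_sec / 3600
    return hours, hours * hourly_rate

single = gpu_batch_cost(6_000_000, 80, 1.99)     # single H100 PCIe
cluster = gpu_batch_cost(6_000_000, 450, 21.52)  # 8x H100 SXM
print(f"single:  {single[0]:.1f} h, ${single[1]:.2f}")   # ~20.8 h, ~$41.46
print(f"cluster: {cluster[0]:.1f} h, ${cluster[1]:.2f}") # ~3.7 h, ~$79.70
```

The crossover point with API pricing depends entirely on sustained utilization: idle rented GPUs still bill by the hour.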
Cost Comparison with Alternatives
Per-Token Pricing (Input Only)
| Model | Input $/M | Reasoning? | Context | Output $/M |
|---|---|---|---|---|
| DeepSeek R1 | $0.55 | Yes | 128K | $2.19 |
| OpenAI o3-mini | $1.10 | Yes | 200K | $4.40 |
| OpenAI o3 | $10.00 | Yes | 200K | $40.00 |
| ChatGPT 5 | $1.25 | No | 272K | $10.00 |
| Claude Opus 4.6 | $5.00 | No | 1M | $25.00 |
| Grok 4 | $3.00 | No | 128K | $15.00 |
DeepSeek R1 is the cheapest reasoning model. But reasoning models consume more tokens. Cost-per-task depends on task complexity, not headline rate.
Real-World Cost Per Task
Task: Math Problem Solving (AIME-style)
Problem statement: 200 input tokens. Required reasoning: 2,000-5,000 tokens (model-generated). Final answer: 100 tokens.

DeepSeek R1:
- Input: $0.55 × 0.0002M = $0.00011
- Output (reasoning + answer): $2.19 × 0.005M = $0.01095
- Cost per task: $0.011

OpenAI o3-mini:
- Input: $1.10 × 0.0002M = $0.00022
- Output: $4.40 × 0.005M = $0.022
- Cost per task: $0.022
DeepSeek R1 is roughly 2x cheaper than o3-mini on reasoning tasks. But accuracy matters. If o3-mini solves 96% of AIME problems and R1 solves 85%, cost-per-correct-answer narrows the gap: roughly $0.013 per solved problem for R1 versus $0.023 for o3-mini. R1 stays cheaper at these figures; the flip would only occur if R1's accuracy fell below about 48%, though failed answers can carry downstream costs that per-token rates don't capture.
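Cost-per-correct-answer is worth computing explicitly. A quick sketch with the per-task costs above (the accuracy figures are illustrative, not published benchmark results):

```python
def cost_per_correct(cost_per_task: float, accuracy: float) -> float:
    """Expected spend per correct answer, assuming failed attempts
    are discarded at full cost."""
    return cost_per_task / accuracy

r1 = cost_per_correct(0.011, 0.85)       # ≈ $0.013 per solved problem
o3_mini = cost_per_correct(0.022, 0.96)  # ≈ $0.023 per solved problem

# Breakeven: the R1 accuracy below which o3-mini wins on cost-per-correct
breakeven = 0.011 * 0.96 / 0.022
print(round(breakeven, 2))  # 0.48
```

The takeaway: compare models on cost-per-correct-answer, not headline per-token rates.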
Task: Text Summarization (Non-Reasoning)
Input: 10K tokens Output: 1K tokens
DeepSeek V3: ($0.27 × 0.01M) + ($1.10 × 0.001M) = $0.0027 + $0.0011 = $0.0038 per task
ChatGPT 5: ($1.25 × 0.01M) + ($10.00 × 0.001M) = $0.0125 + $0.01 = $0.0225 per task
For non-reasoning tasks, DeepSeek V3 is 5.9x cheaper than ChatGPT 5 at these rates. Latency and throughput are competitive on both, so on simple workloads the per-token price gap is the deciding factor.
Real-World Cost Scenarios
Scenario 1: Large-Batch Math Tutoring (5,000 problems/month)
Average problem: 200 input + 3K reasoning + 100 output = 3.3K tokens
Monthly token volume:
- Input: 200 × 5,000 = 1M tokens
- Reasoning + output: 3,100 × 5,000 = 15.5M tokens
Cost on DeepSeek R1: ($0.55 × 1M) + ($2.19 × 15.5M) = $0.55 + $33.95 = $34.50/month
Cost on ChatGPT 5 at the same output volume: ($1.25 × 1M) + ($10.00 × 15.5M) = $1.25 + $155.00 = $156.25/month
DeepSeek R1 is 4.5x cheaper for reasoning-heavy workloads.
Scenario 2: Code Review and Debugging (1M input tokens/month)
- Input: 1M tokens (code snippets + context)
- Reasoning: 500K tokens (model-generated analysis)
- Output: 100K tokens (review feedback)
DeepSeek R1: ($0.55 × 1M) + ($2.19 × 0.6M) = $0.55 + $1.31 = $1.86/month
ChatGPT 5: ($1.25 × 1M) + ($10.00 × 0.6M) = $1.25 + $6.00 = $7.25/month
DeepSeek wins by 3.9x. Reasoning models are optimal for structured analysis tasks.
Scenario 3: Customer Support Chatbot (100K daily queries)
Most support queries: 200 input tokens, 100 output tokens. 100K queries/day × 30 days = 3M total queries/month.
- Input: 200 tokens × 3M queries = 600M tokens
- Output: 100 tokens × 3M queries = 300M tokens
DeepSeek V3: ($0.27/M × 600M) + ($1.10/M × 300M) = $162 + $330 = $492/month
ChatGPT 5: ($1.25/M × 600M) + ($10.00/M × 300M) = $750 + $3,000 = $3,750/month
DeepSeek V3 is 7.6x cheaper for high-volume non-reasoning work. At this scale, the savings justify infrastructure management if needed.
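Because rates are quoted per million tokens, 600M input tokens bill as 600 times the per-million rate. A quick check of the scenario:

```python
def monthly_cost(input_millions: float, output_millions: float,
                 input_rate: float, output_rate: float) -> float:
    """Rates are dollars per million tokens; volumes are in millions."""
    return input_millions * input_rate + output_millions * output_rate

v3 = monthly_cost(600, 300, 0.27, 1.10)     # $492.00
gpt5 = monthly_cost(600, 300, 1.25, 10.00)  # $3,750.00
print(round(gpt5 / v3, 1))  # 7.6
```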
Scenario 4: Research Synthesis (50 papers/month)
- Average paper: 10K tokens
- Reasoning needed per paper: 2K tokens
- Summary: 500 tokens
Total: 50 papers × 12.5K tokens = 625K tokens/month
DeepSeek R1: ($0.55 × 0.5M input) + ($2.19 × 0.125M output) = $0.275 + $0.274 = $0.55/month
Claude Opus 4.6: ($5.00 × 0.5M) + ($25.00 × 0.125M) = $2.50 + $3.13 = $5.63/month
DeepSeek is roughly 10x cheaper on research synthesis tasks.
Off-Peak Pricing and Discounts
Time-Based Discounts
DeepSeek announced off-peak pricing discounts (16:30-00:30 GMT) with rates up to 75% lower. During peak hours, standard rates apply. During off-peak, the 75% discount brings R1 to approximately $0.14 input / $0.55 output per million tokens.
For teams with flexible batch processing windows (overnight jobs, scheduled analysis), off-peak pricing is significant. A batch job running at 8 PM UTC instead of 2 PM UTC saves up to 75% on token rates.
Monthly cost difference for 1M tokens with reasoning overhead:
- Peak: $11.719/month
- Off-peak (100% utilization): $11.719 × 0.25 = $2.93/month
Off-peak pricing only works for non-urgent workloads. Real-time applications pay standard rates.
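A batch scheduler can pick the rate by wall clock. A minimal sketch assuming the published 16:30-00:30 UTC window and a flat 75% discount (actual discount tiers should be confirmed against the API dashboard):

```python
from datetime import datetime, time, timezone

PEAK_RATES = (0.55, 2.19)                    # (input, output) $ per million
OFF_PEAK_RATES = (0.55 * 0.25, 2.19 * 0.25)  # assumed flat 75% discount

def r1_rates(now: datetime) -> tuple:
    """Return the applicable (input, output) rates.
    The off-peak window 16:30-00:30 UTC wraps past midnight."""
    t = now.astimezone(timezone.utc).time()
    in_window = t >= time(16, 30) or t < time(0, 30)
    return OFF_PEAK_RATES if in_window else PEAK_RATES

# 20:00 UTC falls inside the window; 14:00 UTC does not
assert r1_rates(datetime(2026, 3, 2, 20, 0, tzinfo=timezone.utc)) == OFF_PEAK_RATES
assert r1_rates(datetime(2026, 3, 2, 14, 0, tzinfo=timezone.utc)) == PEAK_RATES
```

Note the window check must handle the midnight wrap-around, which is why the condition is an `or` rather than a simple range test.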
Batch API and Volume Tiers
DeepSeek had not publicly documented batch API discounts as of March 2026. OpenAI and other providers offer 20-50% discounts for batch processing; DeepSeek may offer similar terms, but documentation is less public.
R1 vs V3 Pricing
When to Use R1 (Reasoning Model) - $0.55/$2.19
- Math and logic problems
- Multi-step reasoning chains
- Technical debugging and code review
- Research synthesis
- Complex decision analysis
- Patent claim comparison
R1 costs 2x more on input ($0.55 vs $0.27) but excels at tasks where explicit reasoning improves accuracy. If a task would require multiple API calls or prompt engineering on V3 to achieve the same answer quality, R1 may be cost-effective despite higher per-token rate.
When to Use V3 (Standard Model) - $0.27/$1.10
- Classification and extraction
- Text summarization
- Paraphrasing
- Translation
- Customer support responses
- Code generation (straightforward, not debugging)
- Content tagging and labeling
V3 is faster (higher throughput) and cheaper. No reasoning overhead. Good for production inference where latency and cost matter more than deep reasoning.
Cost Breakdown Comparison
Extracting 100 entities from 50K text documents:
- Input per doc: 1K tokens
- Output per doc: 100 tokens
- Total: 50M input + 5M output
V3: ($0.27 × 50M) + ($1.10 × 5M) = $13.50 + $5.50 = $19/month
If using R1 for the same task (with reasoning overhead):
- Reasoning tokens per doc: 500 tokens
- Output total: 5M + 25M reasoning = 30M tokens
R1: ($0.55 × 50M) + ($2.19 × 30M) = $27.50 + $65.70 = $93.20/month
V3 is 4.9x cheaper for extraction tasks because reasoning doesn't improve extraction accuracy. Use V3.
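The same comparison as a function of reasoning overhead makes the decision mechanical. A sketch with token volumes in millions and rates from the tables above:

```python
def job_cost(input_m: float, answer_m: float, reasoning_m: float,
             in_rate: float, out_rate: float) -> float:
    """Reasoning tokens (in millions) bill at the output rate."""
    return input_m * in_rate + (answer_m + reasoning_m) * out_rate

v3 = job_cost(50, 5, 0, 0.27, 1.10)    # $19.00, no reasoning overhead
r1 = job_cost(50, 5, 25, 0.55, 2.19)   # $93.20, 500 reasoning tokens/doc
print(round(r1 / v3, 1))  # 4.9
```

Rerunning with your own reasoning-token estimate per document shows quickly whether R1's accuracy gain could justify the premium.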
Fine-Tuning and Customization
Via DeepSeek API
DeepSeek had not publicly documented fine-tuning API pricing or availability as of March 2026. For comparison, OpenAI charges ~$0.03/1K tokens for training data, and Anthropic requires a direct sales agreement.
Teams interested in DeepSeek fine-tuning should contact DeepSeek sales directly for current terms.
Via Open-Weights Self-Hosting
DeepSeek R1 weights are open-source (MIT). Download, fine-tune on custom data using standard training recipes (DeepSpeed, Hugging Face transformers).
Estimated cost:
- Single GPU (H100 at $1.99/hr) for 20-50 hours = $40-100 total
- Plus infrastructure, optimization, validation: $50-200 total per fine-tune
Self-hosted fine-tuning is cheaper than API-based fine-tuning for teams with engineering capacity.
Fine-Tuning Use Cases
Custom R1 on proprietary data improves reasoning on domain-specific problems. A law firm fine-tuning R1 on internal case law and contract clauses would get better compliance checking. A healthcare provider fine-tuning R1 on medical literature would improve diagnostic reasoning.
Typical workflow:
- Download R1 weights (e.g., R1-Distill-Qwen-32B is ~64GB; full R1 671B requires multiple nodes)
- Prepare 10K-100K training examples (domain data, reasoning explanations)
- Fine-tune on 8x H100 for 2-7 days ($300-1,500 total)
- Deploy the custom R1 on your own infrastructure
- Cost per inference: only GPU rental, no API fees
Hidden Costs and Infrastructure
Inference Framework Overhead
Running DeepSeek locally requires an inference engine (vLLM, Ollama, or Hugging Face Text Generation Inference). These frameworks add ~10-15% computational overhead compared to bare-metal inference.
A single H100 inference at $1.99/hr with vLLM overhead might effectively cost $2.30/hr when accounting for framework inefficiency.
Quantization and Memory Trade-offs
DeepSeek R1 (671B total parameters, MoE with 37B active per token) requires ~1.3TB of VRAM in full BF16 precision. Quantizing to 4-bit reduces this to roughly 340GB, which still requires a multi-GPU node; only the smaller distilled variants fit on a single H100 (80GB). Quantization also reduces inference speed by 20-40%, cutting throughput and increasing effective per-request cost.
Full precision (multi-GPU): faster, higher hardware cost. Quantized (fewer GPUs): slower, lower hardware cost; net cost roughly similar.
Caching and Batching
vLLM's automatic prefix caching reuses KV-cache computation for repeated prompt prefixes. A customer support bot with a 50K-token system prompt repeated across 1,000 requests computes that prefix once, saving ~99% of the prefill compute for the shared portion. Savings of that order can cut a ~$41.50 batch to a few dollars for highly repetitive queries.
Batching multiple requests together improves GPU utilization. Batch size 32 is roughly 10-15% more efficient than batch size 1.
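The prefix-caching savings are easy to sanity-check: prefill work is roughly proportional to prompt tokens processed, so computing a shared prompt once instead of per-request avoids almost all of it. A back-of-envelope model, not a vLLM benchmark:

```python
def prefill_tokens(shared_prefix: int, unique_suffix: int,
                   n_requests: int, prefix_cached: bool) -> int:
    """Total prompt tokens whose prefill must actually be computed."""
    if prefix_cached:
        # The shared prefix is computed once, then reused for every request
        return shared_prefix + n_requests * unique_suffix
    return n_requests * (shared_prefix + unique_suffix)

cached = prefill_tokens(50_000, 200, 1_000, True)     # 250,000 tokens
uncached = prefill_tokens(50_000, 200, 1_000, False)  # 50,200,000 tokens
print(f"{1 - cached / uncached:.1%}")  # ≈ 99.5% of prefill compute avoided
```

The ratio is dominated by how large the shared prefix is relative to the per-request suffix; short system prompts see much smaller gains.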
Network and Storage
Downloading model weights (~64GB for R1-Distill-32B, or over a terabyte for the full 671B model in BF16) takes significant time even on gigabit internet and may count against bandwidth limits on some cloud providers. CoreWeave charges egress at standard rates; RunPod includes egress in its hourly rates. Factor in $30-100 for initial setup; ongoing transfer costs are negligible.
FAQ
How much does DeepSeek R1 cost per 1 million tokens? $0.55 input + $2.19 output is the standard answer. But R1 generates internal reasoning tokens (typically 2,000-5,000 per request) that count as output. Actual per-request cost varies by task complexity: a reasoning task costs 3-10x more per request than a non-reasoning task on the same input size.
Is DeepSeek R1 cheaper than ChatGPT? For reasoning tasks: yes, 2-4x cheaper on per-token basis. For non-reasoning tasks: DeepSeek V3 is comparable to ChatGPT 5. For high-accuracy specialized tasks where o3 is needed: pricing is similar or DeepSeek may be slightly cheaper.
Where can teams use DeepSeek R1? Official API at api.deepseek.com. Via Together.AI for hosted inference. Self-hosted on GPU rentals from RunPod, Lambda, CoreWeave, Vast.AI. Open weights allow local deployment on any hardware.
Can DeepSeek R1 be deployed locally? Yes. Weights are open-source (MIT). Requires GPU hardware: a multi-GPU node for full R1, or a single GPU for quantized distilled variants. Complexity is moderate. Infrastructure cost $10-100/month for reasonable throughput depending on usage volume.
Is reasoning worth the cost? Depends on the task. For math, debugging, and multi-step analysis: yes, reasoning models save tokens by working correctly in one pass. For simple classification or generation: no, standard models are faster and cheaper. Calculate the cost-per-correct-answer, not cost-per-token.
How does R1 accuracy compare to ChatGPT 5 on benchmarks? Comparative benchmarks not clearly published. Both are strong on reasoning. ChatGPT 5 likely leads on breadth. R1 likely competitive or better on specialized reasoning domains. Check official leaderboards (MMLU, GPQA, AIME).
What is the best use case for off-peak R1 pricing? Batch processing overnight. Code review, legal analysis, research synthesis, mathematical verification. Anything that can tolerate 6-8 hour latency saves 75%. Large-scale batch can reduce monthly costs from $100 to $25.
Should we self-host or use the API?
- API for: < 5M tokens/month, zero appetite for infrastructure overhead
- Self-host for: > 10M tokens/month, maximum cost optimization, custom fine-tuning, data privacy or regulatory compliance requirements
Related Resources
- LLM Pricing Comparison
- DeepSeek Models and Pricing
- DeepSeek R1 vs GPT-4
- DeepSeek V3.1 vs R1
- DeepSeek R1 vs V3