Contents
- DeepSeek V3 Pricing: Overview
- Official DeepSeek API Pricing
- Third-Party Provider Pricing
- Cost Per Task Analysis (10+ Scenarios)
- Monthly Volume Breakdown
- Comparison to GPT-4.1 and Claude
- Self-Hosting on Cloud GPU
- Context Window and Hidden Costs
- Hosting Decision Framework
- FAQ
- Related Resources
- Sources
DeepSeek V3 Pricing: Overview
DeepSeek V3 pricing starts at $0.27 per million input tokens and $1.10 per million output tokens through the official API as of March 2026. That's significantly cheaper than GPT-4 Turbo and Claude Opus while matching or beating both on reasoning benchmarks. The pricing spread across providers is wide: the official DeepSeek API sits at the low end, third-party hosts (Together, Fireworks) add markups ranging from roughly 20% to 3.5x, and self-hosting on cloud infrastructure costs more per token than the official API but offers unlimited throughput without rate limits. For most teams, the official API wins on cost at every volume; self-hosting is justified by fine-tuning, compliance, or data-residency needs rather than price.
Official DeepSeek API Pricing
DeepSeek's official API pricing is simple: pay per million tokens, no subscription.
| Model | Input ($/1M) | Output ($/1M) | Context | Max Output | Use Case |
|---|---|---|---|---|---|
| DeepSeek V3.1 | $0.27 | $1.10 | 256K | 8K | General tasks, standard reasoning |
| DeepSeek R1 | $0.55 | $2.19 | 256K | 8K | Complex reasoning, R1 thinking |
| DeepSeek V2.5 | $0.10 | $0.20 | 64K | 4K | Legacy tier |
Data from official API docs, March 21, 2026.
No rate limits. Calls scale to handle massive throughput: send 1B input tokens in an hour and pay $270 for input plus whatever the output tokens total. No throttling, no quota surprises, no pricing tiers. Teams pay for consumption, nothing else.
No minimum commitment. Call the API 10 times a month or 10 million times. Pricing doesn't change. Startups and large-scale teams pay the same rate.
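A minimal sketch of the pay-as-you-go math, with the rates hard-coded from the table above:

```python
# Official DeepSeek V3.1 rates, USD per 1M tokens (from the table above).
INPUT_RATE = 0.27
OUTPUT_RATE = 1.10

def api_cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Return the USD cost of one request at official V3.1 rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# 1B input tokens in an hour costs $270, not thousands of dollars.
print(round(api_cost(1_000_000_000), 2))  # 270.0
```

There is no fixed-cost term in the formula: with no subscription or minimum, the bill is purely linear in tokens consumed.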
Third-Party Provider Pricing
Hosting DeepSeek through intermediaries buys convenience and offloads operational overhead, at a markup. Markups vary widely.
Fireworks.AI
Fireworks hosts DeepSeek V3 but doesn't publish explicit per-token rates. Its standard model is to charge for deployment capacity (vCPU-hours, GPU-hours): usage-based, but opaque without a calculator. Estimated markup: 30-50% over the official API, based on typical provider margins.
Together.AI
Together.AI charges $1.25 per million tokens for DeepSeek V3.1 input/output combined pricing. That's roughly 1.8x the official API average rate ($0.69/M combined on official). Together's premium covers:
- Managed inference infrastructure
- Uptime SLA (99.5%)
- Rate limiting and throttling
- Fallback routing (multiple providers for redundancy)
Worth it if teams value operational simplicity over raw per-token cost.
Azure AI Foundry
Microsoft integrates DeepSeek models. Pricing anchored to official DeepSeek rates but with markup for Azure's integration layer. Details at Azure AI Foundry pricing page. Typical markup: 20-35%. Useful if the infrastructure is already Azure-locked.
OpenRouter.AI
OpenRouter aggregates multiple providers including DeepSeek. Charges per token, typically $0.30-0.50 per million input tokens, roughly 1.1-1.9x the official input rate. Useful for prototyping or avoiding direct API key management, but the markup adds up in production.
Anthropic (Claude) SDK integration
Anthropic has released routing capabilities to DeepSeek through their SDK. No separate pricing; it uses official DeepSeek API rates. Useful if the stack is Anthropic-first but teams want a cost-efficient alternative.
Cost Per Task Analysis (10+ Scenarios)
Using official DeepSeek V3.1 API pricing ($0.27 input / $1.10 output per 1M tokens):
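All of the scenario figures that follow apply the same formula; a small helper (a sketch, with the official rates hard-coded) reproduces them:

```python
V31_INPUT = 0.27   # USD per 1M input tokens
V31_OUTPUT = 1.10  # USD per 1M output tokens

def scenario_cost(input_tokens: float, output_tokens: float,
                  requests: int = 1) -> float:
    """USD cost for `requests` calls of the given token shape."""
    per_request = (input_tokens * V31_INPUT + output_tokens * V31_OUTPUT) / 1e6
    return per_request * requests

# Scenario 1: 500-token prompt, 200-token response.
print(round(scenario_cost(500, 200), 6))          # 0.000355
# 1,000 such requests daily for 30 days:
print(round(scenario_cost(500, 200, 30_000), 2))  # 10.65
```

Swapping in a different provider is a matter of changing the two rate constants.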
Scenario 1: Single Text Completion
Prompt: 500 tokens. Response: 200 tokens.
Cost: (500 × $0.27 + 200 × $1.10) / 1M = $0.000355
Cost per request: $0.000355
Running 1,000 of these daily: 1,000 × 30 × $0.000355 = $10.65/month
Scenario 2: Long Document Analysis
Prompt: 5K tokens (full research paper). Response: 500 tokens.
Cost: (5K × $0.27 + 500 × $1.10) / 1M = $0.00190
Cost per request: $0.00190
Processing 100 research papers monthly: 100 × $0.00190 = $0.19/month
Scenario 3: Code Review and Refactoring Suggestions
Prompt: 8K tokens (code file + context). Response: 3K tokens (suggestions).
Cost: (8K × $0.27 + 3K × $1.10) / 1M = $0.00546
Cost per request: $0.00546
Running 50 code reviews daily (2,500 monthly): 2,500 × $0.00546 = $13.65/month
Scenario 4: Batch Inference (10K requests)
Tokens per request: 1K prompt + 300 output = 1.3K total.
Total tokens: 10K requests × 1.3K = 13M tokens.
Cost: (10M × $0.27 + 3M × $1.10) / 1M = $6.00
Cost per request: $0.00060
Processing 10K classification tasks: $6.00 total
Infrastructure costs (orchestration, storage) vastly exceed API costs at this scale.
Scenario 5: Real-Time Customer Support Bot
Configuration: 100 QPS (queries per second), 8 hours/day operation.
Prompts: Average 300 tokens (customer message + context). Responses: Average 200 tokens (support reply).
Daily calculation:
- Requests per day: 100 QPS × 3,600 sec × 8 hours = 2.88M requests/day
- Input tokens: 2.88M × 300 = 864M tokens/day
- Output tokens: 2.88M × 200 = 576M tokens/day
- Daily cost: (864M × $0.27 + 576M × $1.10) / 1M = $866.88/day
- Monthly (22 business days): $19,071/month
Equivalent on Claude Opus ($5 input / $25 output):
- Daily cost: (864M × $5 + 576M × $25) / 1M = $18,720/day
- Monthly (22 business days): $411,840/month
DeepSeek is roughly 22x cheaper than Claude for this workload. Practical impact: a small SaaS can afford 100 QPS on DeepSeek. Claude pricing forces either downsizing or passing costs to users.
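A sketch of the support-bot comparison, with the token shapes and both rate cards taken from this scenario:

```python
def daily_bot_cost(in_rate: float, out_rate: float, qps: int = 100,
                   hours: int = 8, in_tok: int = 300, out_tok: int = 200) -> float:
    """Daily USD cost for a bot at `qps` over `hours` of operation."""
    requests = qps * 3600 * hours  # 2.88M requests/day at defaults
    return (requests * in_tok * in_rate + requests * out_tok * out_rate) / 1e6

deepseek = daily_bot_cost(0.27, 1.10)  # DeepSeek V3.1
claude = daily_bot_cost(5, 25)         # Claude Opus rates as quoted above
print(round(deepseek, 2), round(claude, 2), round(claude / deepseek, 1))
# 866.88 18720.0 21.6
```

The ratio is fixed by the rate cards, so it holds at any QPS with this prompt/response shape.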
Scenario 6: Long-Context Document Summarization
Prompt: 50K tokens (full technical specification). Response: 2K tokens (summary).
Cost: (50K × $0.27 + 2K × $1.10) / 1M = $0.01570
Cost per request: $0.01570
Summarizing 1,000 documents monthly: 1,000 × $0.01570 = $15.70/month
Scenario 7: Multi-Turn Conversation (Chat Session)
A user interacts with the bot over 10 turns (5 back-and-forth exchanges).
- Turn 1: 200 tokens input, 100 output
- Turn 2: 400 tokens input (history + new message), 150 output
- ...
- Turn 10: 1,200 tokens input (full history), 200 output
Averages work out to roughly 700 input and 150 output tokens per turn.
Total per session: 7K input tokens, 1.5K output tokens.
Cost: (7K × $0.27 + 1.5K × $1.10) / 1M = $0.00354
Cost per 10-turn session: $0.00354
Running 100 concurrent users with average 5 sessions/day: 500 sessions/day × 30 days × $0.00354 = $53.10/month
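The growing-history effect can be sketched by accumulating context turn by turn (assuming linear growth from 200 to 1,200 input tokens, as in the example above):

```python
TURNS = 10
# Input grows linearly from 200 tokens (turn 1) to 1,200 (turn 10)
# as conversation history accumulates; output averages 150 per turn.
inputs = [200 + i * (1000 / (TURNS - 1)) for i in range(TURNS)]
outputs = [150] * TURNS

total_in = sum(inputs)    # ~7,000 tokens
total_out = sum(outputs)  # 1,500 tokens
cost = (total_in * 0.27 + total_out * 1.10) / 1e6
print(round(cost, 5))  # 0.00354
```

Because input is re-billed every turn, history dominates session cost; truncating or summarizing old turns is the main lever for long chats.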
Scenario 8: Training Data Tagging at Scale
Prompt: 500 tokens (unlabeled text). Response: 50 tokens (tags).
Tagging 100,000 documents:
- Total input: 100K × 500 = 50M tokens
- Total output: 100K × 50 = 5M tokens
- Cost: (50M × $0.27 + 5M × $1.10) / 1M = $19.00
Cost per document: $0.00019
A 10M-document tagging project: $1,900 in API costs
Compare to hiring annotators: 10M documents × $0.10/document (annotator cost) = $1M. DeepSeek saves 99.8% vs human labor. The economics are staggering for data labeling.
Scenario 9: Recommendation Engine Query
Prompt: 2K tokens (user profile, history, product catalog excerpt). Response: 500 tokens (ranked recommendations).
Running 10M queries/month:
- Cost: (10M × 2K × $0.27 + 10M × 500 × $1.10) / 1M = $10,900/month
Cost per query: $0.00109
Handling 10M queries monthly on Claude Opus: (10M × 2K × $5 + 10M × 500 × $25) / 1M = $225,000/month
DeepSeek is roughly 21x cheaper. A startup can build a personalization engine on DeepSeek that would be unaffordable on Claude.
Scenario 10: Realtime Code Completion in IDE
Prompt: 1K tokens (surrounding code context). Response: 100 tokens (suggestion).
Running 1,000 completions/day per developer across a 100-developer team:
- Daily requests: 100 developers × 1,000 completions = 100K
- Input: 100M tokens
- Output: 10M tokens
- Daily cost: (100M × $0.27 + 10M × $1.10) / 1M = $38.00/day
- Annual: $13,870/year for the team, roughly $139 per developer
Cheap enough for large-scale adoption. A VS Code extension with a DeepSeek backend becomes economically feasible.
Scenario 11: Fact-Checking and Source Retrieval
Prompt: 3K tokens (claim + retrieved context). Response: 500 tokens (fact-check result).
Processing 100,000 claims:
- Cost: (100K × 3K × $0.27 + 100K × 500 × $1.10) / 1M = $136.00
Cost per claim: $0.00136
Building a fact-checking service for news sites: $136 per 100K articles
Scenario 12: Translation Pipeline
Prompt: 4K tokens (foreign language text). Response: 4.5K tokens (English translation).
Translating 500K documents:
- Cost: (500K × 4K × $0.27 + 500K × 4.5K × $1.10) / 1M = $3,015.00
Cost per document: $0.00603
Translating a 50M-word corpus (roughly 67M input tokens at ~1.33 tokens per word) costs on the order of $100 in API fees vs hiring translators at $0.05/word = $2.5M. DeepSeek cuts the cost by more than 99.9%.
Monthly Volume Breakdown
At what volumes does each hosting option win?
| Monthly Tokens | Official API | Together | Azure (est.) | Cheapest |
|---|---|---|---|---|
| 100M | $69 | $125 | $95 | Official |
| 500M | $345 | $625 | $475 | Official |
| 1B | $690 | $1,250 | $950 | Official |
| 10B | $6,900 | $12,500 | $9,500 | Official |
| 50B | $34,500 | $62,500 | $47,500 | Official |
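The table rows follow mechanically from each provider's blended per-million rate; a minimal sketch (the Azure figure is this article's estimate):

```python
# Blended USD per 1M tokens: official V3.1 average, Together, Azure (est.).
RATES = {"Official": 0.69, "Together": 1.25, "Azure (est.)": 0.95}

def monthly_bill(tokens_millions: float) -> dict:
    """Monthly USD cost at each provider for a given token volume."""
    return {name: round(rate * tokens_millions, 2) for name, rate in RATES.items()}

print(monthly_bill(100))
# {'Official': 69.0, 'Together': 125.0, 'Azure (est.)': 95.0}
```

Since every provider here bills linearly per token, the ranking never flips as volume grows.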
The math is clear: official DeepSeek API is cheapest at every scale. Third-party providers only make sense if teams need:
- Fallback redundancy across multiple providers
- Unified billing across multiple LLM APIs
- Managed SLA guarantees (Together: 99.5% uptime)
- Simplified operational complexity
For cost-conscious teams, official API is always the answer.
Comparison to GPT-4.1 and Claude
| Model | Input ($/1M) | Output ($/1M) | Combined (avg) | Cost Ratio vs DeepSeek |
|---|---|---|---|---|
| DeepSeek V3.1 | $0.27 | $1.10 | $0.69 | 1.0x |
| GPT-5 Mini | $0.25 | $2.00 | $1.13 | 1.6x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 | 13.0x |
| GPT-4.1 | $2.00 | $8.00 | $5.00 | 7.2x |
| Claude Opus 4.6 | $5.00 | $25.00 | $15.00 | 21.7x |
| GPT-4o | $2.50 | $10.00 | $6.25 | 9.1x |
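The "Combined (avg)" and ratio columns are simple averages of the two rates; a sketch recomputing them (using the table's rounded $0.69/M DeepSeek blend as the base):

```python
# (input, output) USD per 1M tokens, from the table above.
MODELS = {
    "DeepSeek V3.1": (0.27, 1.10),
    "GPT-5 Mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-4o": (2.50, 10.00),
}

BASE = 0.69  # rounded DeepSeek V3.1 blend used in the table

for name, (inp, outp) in MODELS.items():
    blended = (inp + outp) / 2
    print(f"{name}: ${blended:.3f}/M blended, {blended / BASE:.1f}x DeepSeek")
```

Note the blend assumes a 50/50 input/output mix; output-heavy workloads widen the gap further, since output rates differ more than input rates.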
DeepSeek V3.1 undercuts every major closed-source model significantly. Only open-source models hosted cheaply approach this pricing.
Performance comparison: DeepSeek V3.1 scores within 5-10% of GPT-4 Turbo on most benchmarks at a small fraction of the cost. For reasoning tasks, DeepSeek R1 ($0.55/$2.19) beats GPT-4 Turbo on complex reasoning while still costing far less.
The cost-per-capability ratio is now unambiguously in DeepSeek's favor. Claude and GPT-4 are premium products for edge cases (maximum reasoning, specific domains). For most applications, DeepSeek is the better choice.
Self-Hosting on Cloud GPU
Teams can host DeepSeek V3 on their own infrastructure using vLLM, SGLang, or NVIDIA Triton.
Single H100 (80GB)
- Cloud rental: $2.69/hr on RunPod (H100 SXM)
- Monthly cost (730 hrs): $1,964/month
- Throughput: 75-100 tokens/second sustained
- Monthly tokens: 75 tokens/sec × 86,400 sec × 30 days = 194.4M tokens/month
- Cost per token: $1,964 / 194.4M = $0.0000101/token ($10.10/M)
Comparison to official API:
Official API: $0.69/M tokens (average of $0.27 input / $1.10 output). Self-hosted H100: $0.0000101/token, i.e. $10.10/M tokens.
Even at 100% utilization, the self-hosted H100 costs about 15x the official API per token, and 100% utilization is unrealistic. In practice:
- H100 sits idle 40-50% of the time (bursty demand)
- Effective throughput: 40 tokens/sec
- Effective tokens/month: 103.7M
- Effective $/token: $1,964 / 103.7M = $0.0000189/token or $18.9/M
Self-hosted H100 at 50% utilization: $18.9/M. Official API: $0.69/M (avg). Official wins by 27x.
At these throughput figures, a single rented H100 never reaches per-token parity with the official API; the case for it rests on fine-tuning, latency, or isolation, not cost.
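The utilization math above can be sketched as a small function (figures from the RunPod H100 example; a sketch, not a benchmark):

```python
def effective_cost_per_million(gpu_hourly: float, tokens_per_sec: float,
                               utilization: float) -> float:
    """USD per 1M tokens for a rented GPU at a given sustained utilization."""
    monthly_usd = gpu_hourly * 730  # 730 rental hours per month
    monthly_tokens = tokens_per_sec * utilization * 86_400 * 30
    return monthly_usd / (monthly_tokens / 1e6)

# Single H100 at $2.69/hr, 75 tok/s sustained at full load:
print(round(effective_cost_per_million(2.69, 75, 1.0), 2))  # 10.1
# Bursty demand, effective 40 tok/s:
print(round(effective_cost_per_million(2.69, 40, 1.0), 2))  # 18.94
```

The rental cost is fixed while output scales with utilization, so effective $/M is inversely proportional to how busy the GPU stays.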
A100 (80GB) - Cheaper Alternative
- Cloud rental: $1.19/hr on RunPod
- Monthly cost (730 hrs): $869/month
- Throughput: 50 tokens/second
- Effective (50% util): 25 tokens/sec × 86,400 × 30 = 64.8M tokens/month
- Cost per token (50% util): $869 / 64.8M = $0.0000134/token or $13.4/M
Still roughly 19x more expensive than the official API at 50% utilization — and even at full utilization ($6.70/M), it never reaches per-token parity.
8x H100 Cluster - High Volume
- CoreWeave cost: $49.24/hr = $35,945/month
- Throughput: 600+ tokens/second sustained
- Monthly tokens (100% util): 600 × 86,400 × 30 = 1.555B tokens/month
- Cost per token: $35,945 / 1.555B = $0.0000231/token or $23.1/M
Still more expensive than official API ($0.69/M avg). But at this scale, teams are supporting massive internal demand. The infrastructure enables custom fine-tuning, local inference (no latency), and compliance-grade data isolation. Trade-offs beyond pure per-token cost.
Break-even analysis:
At the throughput figures above, cloud-GPU self-hosting never undercuts the official API on per-token cost, even at full utilization. Self-hosting makes sense when:
- Teams need custom fine-tuning (the official API doesn't support it)
- Teams require on-premises deployment (compliance, data residency)
- Teams are building a commercial LLM product (pass-through hosting)
- Latency or data isolation matters more than per-token cost
For pure inference, the official API is the cheaper option at any volume.
Context Window and Hidden Costs
DeepSeek V3.1 ships with a 256K token context window.
Context costs at scale
Small prompt (5K tokens): (5K × $0.27) / 1M = $0.00135
Full context (256K tokens): (256K × $0.27) / 1M = $0.069
The difference: using the full context window costs 51x more per query than a small prompt.
For a document analysis application processing 50K-token documents repeatedly:
- Single document: ~$0.018 per query (50K input plus a few thousand output tokens)
- 1,000 queries: $18.00
- 10,000 queries: $180.00
Hidden cost: longer prompts directly inflate bills. Teams often don't realize they're including context they don't need. Optimization strategy: use vector databases (Pinecone, Weaviate) to retrieve only relevant snippets before passing to DeepSeek. Reduces input tokens 50-70%, cuts costs proportionally.
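The retrieval-trimming saving is straightforward to estimate (a sketch; the 30% retained-context figure corresponds to the upper end of the 50-70% reduction range above):

```python
INPUT_RATE = 0.27  # USD per 1M input tokens

def input_cost(tokens: float) -> float:
    """Input-side USD cost for a single prompt of `tokens` length."""
    return tokens * INPUT_RATE / 1e6

full_doc = input_cost(50_000)        # pass the whole 50K-token document
trimmed = input_cost(50_000 * 0.30)  # retrieve only ~30% via a vector DB
print(round(full_doc, 5), round(trimmed, 5))  # 0.0135 0.00405
```

Because input billing is linear, a 70% token reduction is exactly a 70% input-cost reduction; the only question is whether retrieval preserves answer quality.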
Hosting Decision Framework
Official DeepSeek API if:
- Standard inference workloads at any volume
- No custom fine-tuning required
- No on-premises requirement
- Cost is primary concern
Third-party provider (Together, Fireworks) if:
- Need redundancy across multiple providers
- Prefer unified billing across model APIs
- Want managed SLA guarantees
- Can tolerate a 1.2-3.5x markup for operational simplicity
Self-hosted cloud GPU if:
- Sustained, high-utilization workloads where the per-token premium is acceptable
- Custom fine-tuning on proprietary data
- On-premises compliance requirement
- Building commercial LLM product
Self-hosted on-premises if:
- 1T+ tokens/month
- Strict data residency
- Compliance requirements (HIPAA, SOC 2)
- Long-term deployment (>2 years amortizes hardware)
FAQ
How does DeepSeek V3 pricing compare to using a local model? Running Llama 3 70B on a rented H100 costs roughly $1,000/month in infrastructure ($2.69/hr × 12 hours/day × 30 days = $969/month). DeepSeek V3.1 at $0.27/M input costs about $0.000378 per typical query (1.4K input tokens). You would need on the order of 2.5M queries monthly before local inference becomes cost-equivalent. Below that, the DeepSeek API is cheaper and eliminates ops overhead.
Does DeepSeek offer volume discounts? Not officially as of March 2026. Pricing is uniform for all customers. Teams with 10B+ tokens/month are better served by self-hosting on cloud infrastructure or negotiating with Together.AI for custom rates.
How much does it cost to fine-tune DeepSeek V3? Fine-tuning services are not published by DeepSeek as of March 2026; they may offer this in the future. For now, parameter-efficient tuning (LoRA) on the open weights with self-hosted inference is the viable option.
What's the difference between input and output token pricing? Input tokens cost $0.27/M because they're only consumed during prefill (no generation). Output tokens cost $1.10/M because the model must generate each one sequentially. A 10K-token prompt with a 2K-token output costs $0.0049; a 1K-token prompt with the same output costs $0.0025 — trimming prompts cuts bills even though output carries the higher per-token rate.
Is there a free tier? Yes. DeepSeek offers 1M free tokens per month for new accounts. Enough for development and light testing. Ideal for prototyping before committing to production inference.
Can I stream responses to reduce latency? Yes. DeepSeek API supports streaming responses via Server-Sent Events (SSE). Streaming does not change token pricing. Tokens are counted the same whether returned in full or streamed.
How does pricing scale for large teams? No published large-scale discounts. For 10B+ monthly tokens, contact DeepSeek directly for custom pricing. Expect 10-20% discounts for commitments, or pivot to self-hosting.
What's the latency of API calls? Official API: 100-500ms per request (includes network latency). Streaming adds ~50ms. Together.AI: similar. Self-hosted: <50ms local, varies by network.
Related Resources
- DeepSeek Model Documentation
- All LLM Models and Pricing
- LLM Cost Comparison Calculator
- Self-Hosting LLMs on Cloud GPU
- DeepSeek R1 vs GPT-4 Comparison