DeepSeek V3 Pricing: API Costs, Hosting Options, and Real-World Scenarios

Deploybase · March 21, 2026 · LLM Pricing


DeepSeek V3 Pricing: Overview

DeepSeek V3 pricing starts at $0.27 per million input tokens and $1.10 per million output tokens through the official API as of March 2026. That's significantly cheaper than GPT-4 Turbo and Claude Opus while matching or beating both on reasoning benchmarks. The pricing spread across providers is extreme: the official DeepSeek API sits at the low end, while third-party hosts (Together, Fireworks, OpenRouter) add roughly 1.2-3.5x markup. Self-hosting on cloud infrastructure costs more per token than the official API but offers unlimited throughput without rate limits. For teams below roughly 500B tokens monthly, the official API wins; above that, self-hosting starts to break even.


Official DeepSeek API Pricing

DeepSeek's official API pricing is simple: pay per million tokens, no subscription.

| Model | Input ($/1M) | Output ($/1M) | Context | Max Output | Use Case |
|---|---|---|---|---|---|
| DeepSeek V3.1 | $0.27 | $1.10 | 256K | 8K | General tasks, standard reasoning |
| DeepSeek R1 | $0.55 | $2.19 | 256K | 8K | Complex reasoning, R1 thinking |
| DeepSeek V2.5 | $0.10 | $0.20 | 64K | 4K | Legacy tier |

Data from official API docs, March 21, 2026.

No rate limits. Calls scale to handle massive throughput. Send 1B input tokens in an hour and pay $270 for input plus whatever the output tokens total. No throttling, no quota surprises, no pricing tiers. Teams pay for consumption, nothing else.

No minimum commitment. Call the API 10 times a month or 10 million times. Pricing doesn't change. Startups and large-scale teams pay the same rate.
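For quick estimates, the per-request arithmetic is a one-liner. A minimal sketch using the official March 2026 rates from the table above (the dictionary keys are illustrative labels, not official API model identifiers):

```python
# Official DeepSeek rates, March 2026: $ per 1M tokens (input, output).
RATES = {
    "deepseek-v3.1": (0.27, 1.10),
    "deepseek-r1":   (0.55, 2.19),
    "deepseek-v2.5": (0.10, 0.20),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call: tokens times rate, scaled down by 1M."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# 1B input tokens on V3.1 comes to about $270.
print(f"${request_cost('deepseek-v3.1', 1_000_000_000, 0):,.2f}")
```

The same function drives every scenario below: multiply tokens by the per-million rate, sum the two directions.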


Third-Party Provider Pricing

Hosting DeepSeek through intermediaries adds convenience and offloads operational overhead, at a markup. Markups vary widely.

Fireworks.AI

Fireworks hosts DeepSeek V3 but doesn't publish explicit per-token rates. Standard Fireworks model: pay for deployment capacity (vCPU-hours, GPU-hours). Pricing structure is usage-based but opaque without a calculator. Estimated markup: 30-50% over official API based on provider margins.

Together.AI

Together.AI charges $1.25 per million tokens for DeepSeek V3.1 input/output combined pricing. That's roughly 1.8x the official API average rate ($0.69/M combined on official). Together's premium covers:

  • Managed inference infrastructure
  • Uptime SLA (99.5%)
  • Rate limiting and throttling
  • Fallback routing (multiple providers for redundancy)

Worth it if teams need operational simplicity and don't want to manage provider infrastructure directly.

Azure AI Foundry

Microsoft integrates DeepSeek models. Pricing anchored to official DeepSeek rates but with markup for Azure's integration layer. Details at Azure AI Foundry pricing page. Typical markup: 20-35%. Useful if the infrastructure is already Azure-locked.

OpenRouter.AI

OpenRouter aggregates multiple providers including DeepSeek. Charges per token, typically $0.30-0.50 per million input tokens, roughly 1.1-1.9x the official input rate. Useful for prototyping or avoiding direct API key management, but the markup adds up in production.

Anthropic (Claude) SDK integration

Anthropic has released routing capabilities to DeepSeek through their SDK. No separate pricing; uses official DeepSeek API rates. Useful if the stack is Anthropic-first but teams want cost-efficient alternatives.


Cost Per Task Analysis (10+ Scenarios)

Using official DeepSeek V3.1 API pricing ($0.27 input / $1.10 output per 1M tokens):

Scenario 1: Single Text Completion

Prompt: 500 tokens. Response: 200 tokens.

Cost: (500 × $0.27 + 200 × $1.10) / 1M = $0.000355

Cost per request: $0.000355

Running 1,000 of these daily: 1,000 × 30 × $0.000355 = $10.65/month


Scenario 2: Long Document Analysis

Prompt: 5K tokens (full research paper). Response: 500 tokens.

Cost: (5K × $0.27 + 500 × $1.10) / 1M = $0.00190

Cost per request: $0.00190

Processing 100 research papers monthly: 100 × $0.00190 = $0.19/month


Scenario 3: Code Review and Refactoring Suggestions

Prompt: 8K tokens (code file + context). Response: 3K tokens (suggestions).

Cost: (8K × $0.27 + 3K × $1.10) / 1M = $0.00546

Cost per request: $0.00546

Running 50 code reviews daily (1,500 monthly): 1,500 × $0.00546 = $8.19/month


Scenario 4: Batch Inference (10K requests)

Tokens per request: 1K prompt + 300 output = 1.3K total.

Total tokens: 10K requests × 1.3K = 13M tokens.

Cost: (10M × $0.27 + 3M × $1.10) / 1M = $6.00

Cost per request: $0.00060

Processing 10K classification tasks: $6.00 total

At this scale, infrastructure costs (orchestration, storage) can exceed the API bill itself.


Scenario 5: Real-Time Customer Support Bot

Configuration: 100 QPS (queries per second), 8 hours/day operation.

Prompts: Average 300 tokens (customer message + context). Responses: Average 200 tokens (support reply).

Daily calculation:

  • Requests per day: 100 QPS × 3,600 sec × 8 hours = 2.88M requests/day
  • Input tokens: 2.88M × 300 = 864M tokens/day
  • Output tokens: 2.88M × 200 = 576M tokens/day
  • Daily cost: (864M × $0.27 + 576M × $1.10) / 1M = $866.88/day
  • Monthly (22 business days): $19,071/month
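The daily arithmetic generalizes to any QPS and token profile. A sketch (the token averages and 22-business-day month are this scenario's assumptions):

```python
def bot_monthly_cost(qps: float, hours_per_day: float, days: int,
                     in_tok: int, out_tok: int,
                     rate_in: float, rate_out: float) -> float:
    """Monthly bill in dollars for a bot serving `qps` queries/sec."""
    requests_per_day = qps * 3_600 * hours_per_day
    daily_usd = (requests_per_day * in_tok * rate_in +
                 requests_per_day * out_tok * rate_out) / 1_000_000
    return daily_usd * days

# 100 QPS, 8h/day, 22 business days, 300-in/200-out tokens at V3.1 rates.
monthly = bot_monthly_cost(100, 8, 22, 300, 200, 0.27, 1.10)
print(f"${monthly:,.2f}/month")
```

Swapping in another provider's per-million rates gives an apples-to-apples comparison for the same workload.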

Equivalent on Claude Opus ($5 input / $25 output):

  • Daily cost: (864M × $5 + 576M × $25) / 1M = $18,720/day
  • Monthly (22 business days): $411,840/month

DeepSeek is roughly 22x cheaper than Claude for this workload. Practical impact: a small SaaS can afford 100 QPS on DeepSeek. Claude pricing forces either downsizing or passing costs to users.


Scenario 6: Long-Context Document Summarization

Prompt: 50K tokens (full technical specification). Response: 2K tokens (summary).

Cost: (50K × $0.27 + 2K × $1.10) / 1M = $0.01570

Cost per request: $0.01570

Summarizing 1,000 documents monthly: 1,000 × $0.01570 = $15.70/month


Scenario 7: Multi-Turn Conversation (Chat Session)

A user interacts with the bot over 10 turns (5 back-and-forth exchanges).

Turn 1: 200 tokens input, 100 output
Turn 2: 400 tokens input (history + new), 150 output
…
Turn 10: 1,200 tokens input (full history), 200 output

(average ≈700 tokens input, 150 output per turn)

Total per session: 7K input tokens, 1.5K output tokens.

Cost: (7K × $0.27 + 1.5K × $1.10) / 1M = $0.00354

Cost per 10-turn session: $0.00354

Running 100 concurrent users with average 5 sessions/day: 500 sessions/day × 30 days × $0.00354 = $53.10/month
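Because each turn resends the conversation history, input tokens dominate long chats. A sketch of the per-session cost, using this scenario's per-turn averages:

```python
def session_cost(turns, rate_in=0.27, rate_out=1.10):
    """turns: list of (input_tokens, output_tokens) pairs, with history
    already counted inside each turn's input. Returns dollars per session."""
    total_in = sum(i for i, _ in turns)
    total_out = sum(o for _, o in turns)
    return (total_in * rate_in + total_out * rate_out) / 1_000_000

# Ten turns averaging 700 input / 150 output tokens: ~7K in, 1.5K out total.
ten_turn = session_cost([(700, 150)] * 10)
print(f"${ten_turn:.5f} per 10-turn session")
```

Feeding in a list of growing turn sizes (200, 400, …, 1,200 input tokens) models the history growth explicitly instead of averaging it.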


Scenario 8: Training Data Tagging at Scale

Prompt: 500 tokens (unlabeled text). Response: 50 tokens (tags).

Tagging 100,000 documents:

  • Total input: 100K × 500 = 50M tokens
  • Total output: 100K × 50 = 5M tokens
  • Cost: (50M × $0.27 + 5M × $1.10) / 1M = $19.00

Cost per document: $0.00019

A 10M-document tagging project: $1,900 in API costs

Compare to hiring annotators: 10M documents × $0.10/document (annotator cost) = $1M. DeepSeek saves 99.8% vs human labor. The economics are staggering for data labeling.
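The labeling economics are easy to sanity-check programmatically. A sketch with this scenario's per-document token counts and the assumed $0.10/document annotator rate:

```python
def labeling_costs(docs: int, in_tok: int = 500, out_tok: int = 50,
                   rate_in: float = 0.27, rate_out: float = 1.10,
                   human_per_doc: float = 0.10):
    """Return (API cost, human cost) in dollars for tagging `docs` documents."""
    api = docs * (in_tok * rate_in + out_tok * rate_out) / 1_000_000
    return api, docs * human_per_doc

api, human = labeling_costs(10_000_000)
print(f"API ${api:,.0f} vs human ${human:,.0f} "
      f"({100 * (1 - api / human):.1f}% saved)")
```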


Scenario 9: Recommendation Engine Query

Prompt: 2K tokens (user profile, history, product catalog excerpt). Response: 500 tokens (ranked recommendations).

Running 10M queries/month:

  • Cost: (10M × 2K × $0.27 + 10M × 500 × $1.10) / 1M = $10,900/month

Cost per query: $0.00109

Handling 10M queries monthly on Claude Opus: (10M × 2K × $5 + 10M × 500 × $25) / 1M = $225,000/month

DeepSeek is roughly 21x cheaper. A startup can build a personalization engine on DeepSeek that would be unaffordable on Claude.


Scenario 10: Realtime Code Completion in IDE

Prompt: 1K tokens (surrounding code context). Response: 100 tokens (suggestion).

Running 1,000 completions/day per developer:

  • Daily input: 1,000 × 1K = 1M tokens; daily output: 1,000 × 100 = 100K tokens
  • Daily cost: (1M × $0.27 + 0.1M × $1.10) / 1M = $0.38/day per developer
  • Annual (365 days): ≈$139/year per developer; a 100-developer team runs ≈$38/day, ≈$13,870/year

Cheap enough for large-scale adoption. VS Code with a DeepSeek backend becomes feasible.


Scenario 11: Fact-Checking and Source Retrieval

Prompt: 3K tokens (claim + retrieved context). Response: 500 tokens (fact-check result).

Processing 100,000 claims:

  • Cost: (100K × 3K × $0.27 + 100K × 500 × $1.10) / 1M = $136.00

Cost per claim: $0.00136

Building a fact-checking service for news sites: $136 per 100K claims checked


Scenario 12: Translation Pipeline

Prompt: 4K tokens (foreign language text). Response: 4.5K tokens (English translation).

Translating 500K documents:

  • Cost: (500K × 4K × $0.27 + 500K × 4.5K × $1.10) / 1M = $3,015.00

Cost per document: $0.00603

Translating the same corpus (≈2B source tokens, roughly 1.5B words) with human translators at $0.05/word would run about $75M. DeepSeek cuts the bill by more than 99.9%.


Monthly Volume Breakdown

At what volumes does each hosting option win?

| Monthly Tokens | Official API | Together | Azure (est.) | Cheapest |
|---|---|---|---|---|
| 100M | $69 | $125 | $95 | Official |
| 500M | $345 | $625 | $475 | Official |
| 1B | $690 | $1,250 | $950 | Official (1.8x) |
| 10B | $6,900 | $12,500 | $9,500 | Official (1.8x) |
| 50B | $34,500 | $62,500 | $47,500 | Official (1.8x) |
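The breakdown above can be reproduced from the blended per-million rates (the Azure figure is this article's estimate, not a published rate):

```python
# Blended $/1M-token rates assumed in the table above.
BLENDED = {"Official": 0.69, "Together": 1.25, "Azure (est.)": 0.95}

def monthly_bill(million_tokens: float) -> dict:
    """Map provider name -> monthly dollars for a given volume."""
    return {name: million_tokens * rate for name, rate in BLENDED.items()}

for vol in (100, 500, 1_000, 10_000, 50_000):   # 100M through 50B tokens
    bill = monthly_bill(vol)
    cheapest = min(bill, key=bill.get)
    row = "  ".join(f"{name} ${usd:,.0f}" for name, usd in bill.items())
    print(f"{vol:>7,}M: {row}  -> {cheapest}")
```

Because every provider bills linearly per token, the ranking never flips with volume; only fixed-cost options (self-hosting) can overtake at scale.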

The math is clear: official DeepSeek API is cheapest at every scale. Third-party providers only make sense if teams need:

  • Fallback redundancy across multiple providers
  • Unified billing across multiple LLM APIs
  • Managed SLA guarantees (Together: 99.5% uptime)
  • Simplified operational complexity

For cost-conscious teams, official API is always the answer.


Comparison to GPT-4.1 and Claude

| Model | Input ($/1M) | Output ($/1M) | Combined (avg) | Cost Ratio vs DeepSeek |
|---|---|---|---|---|
| DeepSeek V3.1 | $0.27 | $1.10 | $0.69 | 1.0x |
| GPT-5 Mini | $0.25 | $2.00 | $1.13 | 1.6x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 | 13.0x |
| GPT-4.1 | $2.00 | $8.00 | $5.00 | 7.2x |
| Claude Opus 4.6 | $5.00 | $25.00 | $15.00 | 21.7x |
| GPT-4o | $2.50 | $10.00 | $6.25 | 9.1x |

DeepSeek V3.1 undercuts every major closed-source model significantly. Only open-source models hosted cheaply approach this pricing.

Performance comparison: DeepSeek V3.1 scores within 5-10% of GPT-4 Turbo on most benchmarks at a fraction of the cost. For reasoning tasks, DeepSeek R1 ($0.55/$2.19) beats GPT-4 Turbo on complex reasoning while still costing far less.

The cost-per-capability ratio is now unambiguously in DeepSeek's favor. Claude and GPT-4 are premium products for edge cases (maximum reasoning, specific domains). For most applications, DeepSeek is the better choice.


Self-Hosting on Cloud GPU

Teams can host DeepSeek V3 on their own infrastructure using vLLM, SGLang, or NVIDIA Triton.

Single H100 (80GB)

  • Cloud rental: $2.69/hr on RunPod (H100 SXM)
  • Monthly cost (730 hrs): $1,964/month
  • Throughput: 75-100 tokens/second sustained
  • Monthly tokens: 75 tokens/sec × 86,400 sec × 30 days = 194.4M tokens/month
  • Cost per token: $1,964 / 194.4M = $0.0000101/token ($10.10/M)

Comparison to official API:

Official API: $0.69/M tokens (avg input/output at $0.27/$1.10). Self-hosted H100: $10.10/M tokens even at 100% utilization, roughly 15x the official rate. And full utilization never happens. In practice:

  • H100 sits idle 40-50% of the time (bursty demand)
  • Effective throughput: 40 tokens/sec
  • Effective tokens/month: 103.7M
  • Effective $/token: $1,964 / 103.7M = $0.0000189/token or $18.9/M

Self-hosted H100 at 50% utilization: $18.9/M. Official API: $0.69/M (avg). Official wins by 27x.

Note that these throughput figures assume single-stream decoding; batched serving (vLLM, SGLang) can raise effective throughput by an order of magnitude or more, which is the only regime where self-hosting approaches break-even.
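The utilization sensitivity fits in one helper. A sketch using the single-stream throughput estimates above (real batched-serving throughput would be far higher):

```python
def effective_rate(monthly_usd: float, tokens_per_sec: float,
                   utilization: float) -> float:
    """Effective $ per 1M tokens for a rented GPU at a given utilization."""
    monthly_tokens = tokens_per_sec * utilization * 86_400 * 30
    return monthly_usd / monthly_tokens * 1_000_000

h100_full = effective_rate(1_964, 75, 1.00)  # sustained 75 tok/s, never idle
h100_real = effective_rate(1_964, 40, 1.00)  # ~40 tok/s effective (bursty load)
print(round(h100_full, 2), round(h100_real, 2))
```

Dividing either figure by the official blended rate ($0.69/M) gives the multiplier quoted in the text.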

A100 (80GB) - Cheaper Alternative

  • Cloud rental: $1.19/hr on RunPod
  • Monthly cost (730 hrs): $869/month
  • Throughput: 50 tokens/second
  • Effective (50% util): 25 tokens/sec × 86,400 × 30 = 64.8M tokens/month
  • Cost per token (50% util): $869 / 64.8M = $0.0000134/token or $13.4/M

Still roughly 19x more expensive than the official API at 50% utilization, and about 10x even at full utilization.

8x H100 Cluster - High Volume

  • CoreWeave cost: $49.24/hr = $35,945/month
  • Throughput: 600+ tokens/second sustained
  • Monthly tokens (100% util): 600 × 86,400 × 30 = 1.555B tokens/month
  • Cost per token: $35,945 / 1.555B = $0.0000231/token or $23.1/M

Still more expensive than official API ($0.69/M avg). But at this scale, teams are supporting massive internal demand. The infrastructure enables custom fine-tuning, local inference (no latency), and compliance-grade data isolation. Trade-offs beyond pure per-token cost.

Break-even analysis:

Self-hosting makes sense when:

  1. Teams are processing 500B+ tokens/month (roughly $345K/month on the official API, enough to fund dedicated, heavily batched infrastructure)
  2. Teams need custom fine-tuning (official API doesn't support it)
  3. Teams require on-premises deployment (compliance, data residency)
  4. Teams are building a commercial LLM product (pass-through hosting)

For most applications below 500B tokens/month, official API is cheaper.


Context Window and Hidden Costs

DeepSeek V3.1 ships with a 256K token context window.

Context costs at scale

Small prompt (5K tokens): (5K × $0.27) / 1M = $0.00135

Full context (256K tokens): (256K × $0.27) / 1M = $0.069

The difference: using the full context window costs 51x more per query than a small prompt.

For a document analysis application processing 50K-token documents repeatedly:

  • Single document: ~$0.018 per query ($0.0135 input plus a few-K-token summary)
  • 1,000 queries: $18.00
  • 10,000 queries: $180.00

Hidden cost: longer prompts directly inflate bills. Teams often don't realize they're including context they don't need. Optimization strategy: use vector databases (Pinecone, Weaviate) to retrieve only relevant snippets before passing to DeepSeek. Reduces input tokens 50-70%, cuts costs proportionally.
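A quick way to see the retrieval payoff (a sketch; the 70% trim is the upper end of the range above):

```python
def input_cost(tokens: int, rate_in: float = 0.27) -> float:
    """Input-side cost in dollars for one prompt at V3.1 rates."""
    return tokens * rate_in / 1_000_000

full = input_cost(50_000)      # send the whole 50K-token document
trimmed = input_cost(15_000)   # send only retrieved snippets (~70% trim)
saving = 1 - trimmed / full
print(f"${full:.4f} -> ${trimmed:.4f} per query, {saving:.0%} input savings")
```

Because input billing is linear in tokens, the cost reduction tracks the trim ratio exactly; the engineering work is keeping answer quality intact with less context.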


Hosting Decision Framework

Official DeepSeek API if:

  • <500B tokens/month
  • No custom fine-tuning required
  • No on-premises requirement
  • Cost is primary concern

Third-party provider (Together, Fireworks) if:

  • Need redundancy across multiple providers
  • Prefer unified billing across model APIs
  • Want managed SLA guarantees
  • Can tolerate a 1.2-3.5x markup for operational simplicity

Self-hosted cloud GPU if:

  • 500B+ tokens/month (economics break even)
  • Custom fine-tuning on proprietary data
  • On-premises compliance requirement
  • Building commercial LLM product

Self-hosted on-premises if:

  • 1T+ tokens/month
  • Strict data residency
  • Compliance requirements (HIPAA, SOC 2)
  • Long-term deployment (>2 years amortizes hardware)

FAQ

How does DeepSeek V3 pricing compare to using a local model? Running Llama 3 70B on a rented H100 costs roughly $969/month ($2.69/hr × 12 hours/day × 30 days). DeepSeek V3.1 at $0.27 input costs $0.000378 per typical query (1.4K tokens). You'd need roughly 2.5M queries monthly before local inference becomes cost-equivalent. Below that, the DeepSeek API is cheaper and eliminates ops overhead.

Does DeepSeek offer volume discounts? Not officially as of March 2026. Pricing is uniform for all customers. Teams with 10B+ tokens/month are better served by self-hosting on cloud infrastructure or negotiating with Together.AI for custom rates.

How much does it cost to fine-tune DeepSeek V3? Fine-tuning services are not published by DeepSeek as of March 2026; they may arrive in the future. For now, parameter-efficient tuning (LoRA) on the open-source weights is the viable option.

What's the difference between input and output token pricing? Input tokens cost $0.27 because the model processes them in a single parallel prefill pass; output tokens cost $1.10 because each one requires a full generation step. A 10K-token prompt with 2K-token output costs more than a 1K-token prompt producing the same output.

Is there a free tier? Yes. DeepSeek offers 1M free tokens per month for new accounts. Enough for development and light testing. Ideal for prototyping before committing to production inference.

Can I stream responses to reduce latency? Yes. DeepSeek API supports streaming responses via Server-Sent Events (SSE). Streaming does not change token pricing. Tokens are counted the same whether returned in full or streamed.

How does pricing scale for large teams? No published large-scale discounts. For 10B+ monthly tokens, contact DeepSeek directly for custom pricing. Expect 10-20% discounts for commitments, or pivot to self-hosting.

What's the latency of API calls? Official API: 100-500ms per request (includes network latency). Streaming adds ~50ms. Together.AI: similar. Self-hosted: <50ms local, varies by network.


