Contents
- Mistral Pricing: Overview
- Mistral API Pricing
- Model Details
- Self-Hosted Open-Source Options
- Pricing Per Use Case
- European Data Residency
- Cost vs OpenAI and Anthropic
- Context Windows and Limits
- Performance Benchmarks
- FAQ
- Related Resources
- Sources
Mistral Pricing: Overview
Mistral pricing covers three API tiers as of March 2026: Mistral Small (lightweight inference), Mistral Medium (balanced), and Mistral Large (frontier capability). Pricing ranges from $0.10 per million input tokens (Small) to $6.00 per million output tokens (Large). All models are accessed via API (mistral.ai) with OpenAI-compatible endpoints. Mistral also publishes open-source weights for Small and Medium, allowing teams to self-host and avoid per-token API costs entirely. European data residency compliance and competitive pricing have driven adoption among European companies and cost-conscious teams.
Mistral API Pricing
| Model | Input $/M | Output $/M | Combined $/M | Context | Best For |
|---|---|---|---|---|---|
| Small | $0.10 | $0.30 | $0.40 | 8K | Classification, embeddings, lightweight tasks |
| Medium | $0.27 | $0.81 | $1.08 | 32K | Balanced, multi-turn chat, RAG |
| Large | $2.00 | $6.00 | $8.00 | 128K | Complex reasoning, code, long context |
Data as of March 2026. All prices in USD per million tokens. Context windows: Small 8K tokens, Medium 32K tokens, Large 128K tokens. Output tokens cost 3x input tokens across all tiers, reflecting the higher computational cost of autoregressive generation compared with prompt processing.
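These rates turn into a one-line cost function. A minimal sketch in Python (rates hard-coded from the table above; `api_cost` is an illustrative helper, not part of any Mistral SDK):

```python
# Per-million-token rates from the pricing table (USD, March 2026).
RATES = {
    "small":  {"input": 0.10, "output": 0.30},
    "medium": {"input": 0.27, "output": 0.81},
    "large":  {"input": 2.00, "output": 6.00},
}

def api_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly API cost in USD for input_m / output_m million tokens."""
    r = RATES[model]
    return round(input_m * r["input"] + output_m * r["output"], 2)

# 100M input + 50M output tokens on each tier:
for model in RATES:
    print(model, api_cost(model, 100, 50))
# small 25.0, medium 67.5, large 500.0
```

Note that cost is computed from the input and output volumes separately; multiplying total tokens by the "Combined $/M" column overstates the bill.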
Model Details
Mistral Small
Small is a 7B parameter model optimized for lightweight inference and cost efficiency. 8K context window. $0.10/M input tokens, $0.30/M output tokens = $0.40 per million combined tokens. Fastest inference (~100 tok/s on single A100), lowest latency among Mistral tiers.
Capability: comparable to OpenAI GPT-4o Mini in general-purpose chat and reasoning. Performs well on classification, sentiment analysis, entity extraction. Weaker on complex reasoning (math, complex code generation, multi-step logic).
Use case: question-answering over documents (RAG), text classification, lightweight chatbots, customer support classification, content tagging, email filtering. Not suitable for complex reasoning or code generation.
Monthly cost estimate (1M input + 1M output tokens): $0.40. Scaling to 100M input + 50M output: 100M × $0.10 + 50M × $0.30 = $25/month. Extremely cost-effective for high-volume simple tasks.
Example: a customer support chatbot processing 10,000 daily messages (100 tokens input, 50 output) generates 1.5M daily tokens, or 45M/month (30M input + 15M output) = $7.50/month in API cost.
Mistral Medium
Medium is a 30-40B parameter model (exact size not disclosed by Mistral). 32K context window. $0.27/M input, $0.81/M output = $1.08 per million combined tokens. Balanced performance and cost.
Capability: comparable to OpenAI GPT-4 Turbo on general tasks and multi-turn conversation. Slightly lower than GPT-4 Turbo on reasoning benchmarks (MATH: 58% vs 68%). Faster inference than GPT-4 Turbo due to smaller model size (40B vs 100B+ estimated).
Use case: multi-turn conversation, longer context requirements (RAG with 20K+ context), code assistance, content generation, summarization, data extraction from documents.
Monthly cost (100M input + 50M output tokens): 100M × $0.27 + 50M × $0.81 = $67.50. Competitive against OpenAI GPT-4 Turbo ($200 input + $400 output on the same token mix = $600/month, roughly 9x more expensive).
Latency: ~30-50ms per output token (20-30 tok/s). Suitable for real-time chat (roughly 3-5 seconds end-to-end for a 100-token output).
Mistral Large
Large is a 123B parameter model. 128K context window. $2.00/M input, $6.00/M output = $8.00 per million combined tokens. Frontier capability.
Trained on extended context (128K tokens natively supported without performance degradation). Strengths: reasoning, code generation, long-document analysis, multi-step logical problems.
Capability: comparable to OpenAI o1-mini on reasoning and coding benchmarks. Weaker than o1 (OpenAI's reasoning-optimized model) but faster. Competitive with Anthropic Claude Opus on multi-turn chat but cheaper per output token ($6.00 vs $25 on Claude Opus).
MATH benchmark: Mistral Large 67%, Claude Opus 68%, OpenAI o1-mini 72%. Reasoning gap: 1-5 percentage points. Acceptable for most production reasoning tasks.
Code generation: LeetCode Hard problems, Mistral Large 55% pass rate, Claude Opus 62%, OpenAI o1-mini 75%. Mistral is slightly weaker but useful for typical coding tasks (not frontier algorithmic problems).
Monthly cost (100M input + 50M output tokens): 100M × $2.00 + 50M × $6.00 = $500. Production-level workload. The same mix on Claude Sonnet (100M × $3.00 + 50M × $15.00 = $1,050/month) is 2.1x more expensive than Mistral Large.
Latency: ~50-100ms per output token (10-20 tok/s). Acceptable for interactive use (roughly 5-10 seconds for a 100-token output). Slower than Medium due to model size, but accuracy is higher.
Self-Hosted Open-Source Options
Mistral publishes open-source weights for Small and Medium models. Teams can self-host using cloud GPUs, eliminating per-token API costs entirely. Breakeven analysis determines when self-hosting is worthwhile.
Mistral Small (7B)
Open-source weights available on Hugging Face (mistralai/Mistral-7B). A single A100 PCIe ($1.19/hr on RunPod spot) can serve ~100 tok/s.
Monthly cost (24/7): $1.19/hr × 730 hrs = $869/month. Annual: $10,425.
Throughput: 100 tok/s × 2,592,000 seconds/month = 259.2M tokens/month. Cost-per-token: $869 / 259.2M = $3.35 per million tokens.
API cost for same tokens: 259.2M × $0.40 = $103.70/month (an upper bound, since $0.40 is the combined input+output rate).
Even at full utilization, a dedicated A100 ($3.35/M tokens) costs more per token than the API ($0.40/M at most), so self-hosting Small does not win on price at this volume. It makes sense mainly for data privacy, air-gapped deployment, or serving fine-tuned custom weights.
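The self-hosting arithmetic generalizes: effective $/M tokens = (hourly rate × 730 hours) / monthly token capacity. A minimal sketch under the throughput assumptions above (the helper name is illustrative):

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = 2_592_000  # 30 days

def self_host_cost_per_million(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """Effective $/M tokens for a dedicated GPU at full utilization."""
    monthly_cost = hourly_rate_usd * HOURS_PER_MONTH
    monthly_tokens_m = tokens_per_sec * SECONDS_PER_MONTH / 1_000_000
    return round(monthly_cost / monthly_tokens_m, 2)

# Single A100 serving Mistral Small at ~100 tok/s:
print(self_host_cost_per_million(1.19, 100))   # ~3.35 $/M
# Single H100 serving Mistral Medium at ~90 tok/s:
print(self_host_cost_per_million(1.99, 90))    # ~6.23 $/M
```

Comparing this figure against the per-million API rate gives the cost case for or against self-hosting at a glance.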
Setup: vLLM, Text Generation WebUI, or ollama on A100. Time investment: 4-8 hours (Docker, model download, configuration, testing). Fine-tuning: LoRA adapters on Small are feasible (7B model trains quickly).
Mistral Medium (30-40B)
Open-source weights available. Requires H100 PCIe or H100 SXM for acceptable inference speed (~80-100 tok/s). Multi-GPU medium deployments use 2x H100 ($5.38/hr on RunPod spot) for higher throughput.
Monthly cost (single H100, $1.99/hr): $1,453/month. Annual: $17,436.
Throughput (single H100): 90 tok/s × 2,592,000 sec/month = 233M tokens/month. Cost-per-token: $1,453 / 233M = $6.23 per million tokens.
API cost for same tokens: 233M × $1.08 = $251.60/month (an upper bound, since $1.08 is the combined input+output rate).
As with Small, the API ($1.08/M at most) undercuts a dedicated H100 ($6.23/M at full utilization), so self-hosting Medium is driven by data control, custom fine-tunes, or latency guarantees rather than raw cost.
Multi-GPU Medium (2x H100): $5.38/hr × 730 = $3,927/month. Throughput: 180 tok/s = 466M tokens/month. Cost-per-token: $3,927 / 466M = $8.43 per million tokens (higher per token, but more capacity). Use if you need parallel request handling or low-latency serving for multiple simultaneous users.
Mistral Large (123B)
No open-source weights released by Mistral (as of March 2026). Teams wanting to self-host Large must use alternative 100B+ models: Llama 3 405B, DeepSeek 671B, or fine-tuned variants.
Cost of self-hosting 123B model equivalent: 4x H100 cluster ($10.76/hr on RunPod spot for multi-GPU) = $7,855/month (or 8x A100 SXM cluster on RunPod = $11.12/hr = $8,118/month).
Throughput (4x H100): 400-500 tok/s. Monthly tokens: 400 × 2,592,000 = 1.04B tokens. Cost-per-token: $7,855 / 1.04B = $7.55/M tokens.
API cost for same tokens at $8.00/M combined: 1.04B × $8.00 = $8,320/month.
Self-hosting a 123B-class model saves $8,320 - $7,855 = $465/month, assuming sustained full utilization. At this scale self-hosting becomes cost-effective, unlike the smaller tiers where the API is cheaper.
Pricing Per Use Case
Customer Support Chatbot (Multi-turn)
Scenario: 1,000 daily conversations, 5 turns per conversation, 200 input tokens + 150 output tokens per turn.
Daily: 1,000 × 5 × (200 + 150) = 1.75M tokens. Monthly: 52.5M tokens (30M input + 22.5M output), a reasonable volume for a support team.
Mistral Small (API):
- Input: 30M × $0.10 = $3.00
- Output: 22.5M × $0.30 = $6.75
- Total: $9.75/month
Mistral Medium (API):
- Input: 30M × $0.27 = $8.10
- Output: 22.5M × $0.81 = $18.23
- Total: $26.33/month
Mistral Medium (Self-Hosted, H100):
- Cost: $1,453/month (fixed)
- One month of H100 rental equals roughly 55 months of API spend at this volume ($1,453 / $26.33)
- Not cost-effective at this scale
Decision: use the API for Small/Medium workloads at this volume. Consider self-hosting Medium only when sustained volume approaches a GPU's full capacity (hundreds of millions of tokens per month) and non-price factors such as data control or fine-tuning apply.
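The decision above reduces to a single comparison: self-host only if projected monthly API spend exceeds the fixed GPU bill. A rough sketch (the function name is illustrative; using the combined $/M rate as an upper bound on API spend makes an "api" verdict conservative):

```python
def cheaper_option(monthly_tokens_m: float, api_rate_per_m: float,
                   gpu_monthly_cost: float) -> str:
    """Compare projected API spend against a fixed self-hosting bill."""
    api_spend = monthly_tokens_m * api_rate_per_m
    return "self-host" if api_spend > gpu_monthly_cost else "api"

# Support chatbot: 52.5M tokens/month on Medium vs a $1,453/month H100:
print(cheaper_option(52.5, 1.08, 1453))  # api
```

In practice the comparison should also weigh utilization (a GPU bills 24/7 whether or not it is busy) and non-price factors like data residency.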
Document Analysis Pipeline (RAG)
Scenario: Process 10,000 documents daily, 1,000 tokens each (input), 200 tokens output per document (summary/extraction).
Daily: 10,000 × (1,000 + 200) = 12M tokens. Monthly: 360M tokens (300M input + 60M output), a large-scale production pipeline.
Mistral Small (API):
- Input: 300M × $0.10 = $30
- Output: 60M × $0.30 = $18
- Total: $48/month (extremely cheap)
Mistral Medium (API):
- Input: 300M × $0.27 = $81
- Output: 60M × $0.81 = $48.60
- Total: $129.60/month
Mistral Medium (Self-Hosted, 2x H100):
- Cost: $5.38/hr × 730 = $3,927/month
- Throughput: 180 tok/s = 466M tokens/month
- Under-utilized at 360M tokens/month (77% of capacity)
- Cost-per-token: $3,927 / 360M = $10.91 per million tokens
- API cost-per-token (Medium): $129.60 / 360M = $0.36 per million tokens
API wins decisively. Self-hosting 2x H100 is overkill for this workload.
Mistral Small (Self-Hosted, A100):
- Cost: $869/month (single A100)
- Throughput: 100 tok/s = 259M tokens/month, short of the 360M required (a second A100 would be needed)
- Cost-per-token: $869 / 259M = $3.35 per million tokens at full utilization
- API cost-per-token: $48 / 360M = $0.13 per million tokens
The API is roughly 25x cheaper per token. Use the Mistral Small API for this workload.
Code Generation (IDE Assistant)
Scenario: 100 developers, 50 code completions per developer per day, 100 input tokens + 50 output tokens per completion.
Daily: 100 × 50 × (100 + 50) = 750K tokens. Monthly: 22.5M tokens (15M input + 7.5M output), a reasonable developer-productivity load.
Mistral Small (API):
- Total: 15M × $0.10 + 7.5M × $0.30 = $3.75/month
Mistral Medium (API):
- Total: 15M × $0.27 + 7.5M × $0.81 = $10.13/month
Mistral Large (API):
- Total: 15M × $2.00 + 7.5M × $6.00 = $75/month
For the IDE use case, volume is low (<100M tokens/month), so the API is cost-effective. Use Mistral Medium (~$10/month) for better reasoning than Small on complex code problems. Upgrade to Large (~$75/month) if the team works on reasoning-heavy problems (algorithm design, architectural decisions).
Synthetic Data Generation
Scenario: Generate 1M synthetic training examples, 500 input tokens prompt + 200 output tokens per example.
Total: 1M × (500 + 200) = 700M tokens (500M input + 200M output), a one-time generation cost.
Mistral Small:
- Total: 500M × $0.10 + 200M × $0.30 = $110
Mistral Medium:
- Total: 500M × $0.27 + 200M × $0.81 = $297
Mistral Large:
- Total: 500M × $2.00 + 200M × $6.00 = $2,200
For one-time generation, Small is sufficient for basic synthetic data (simple text augmentation). Medium if quality/diversity matters. Large if generating complex reasoning or code samples.
Cost comparison: OpenAI GPT-4 Turbo would cost 500M × $2.00 + 200M × $8.00 = $2,600 on the same mix (roughly 24x the Small cost). DeepSeek-V3 would cost 500M × $0.14 + 200M × $0.28 = $126 (comparable to Mistral Small).
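Priced with input and output tokens rated separately, the same one-time job can be compared across providers. A sketch using rates from the API pricing comparison table later in this article (`job_cost` is a hypothetical helper):

```python
# ($/M input, $/M output) per provider, from the comparison table.
PROVIDER_RATES = {
    "mistral-small":  (0.10, 0.30),
    "mistral-medium": (0.27, 0.81),
    "mistral-large":  (2.00, 6.00),
    "gpt-4-turbo":    (2.00, 8.00),
    "deepseek-v3":    (0.14, 0.28),
}

def job_cost(model: str, input_m: float, output_m: float) -> float:
    """One-time job cost in USD for input_m / output_m million tokens."""
    inp, out = PROVIDER_RATES[model]
    return round(input_m * inp + output_m * out, 2)

# 1M examples x (500 input + 200 output) tokens = 500M input, 200M output:
for model in PROVIDER_RATES:
    print(f"{model}: ${job_cost(model, 500, 200)}")
```

Swapping the token mix (e.g. longer prompts, shorter outputs) changes the ranking less than the headline rates suggest, since input tokens dominate this workload.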
European Data Residency
Mistral offers EU data residency: customer prompts and responses remain on Mistral's EU-based servers (Frankfurt, Paris data centers). No data transfer to US. Critical for:
- GDPR compliance (personal data cannot leave EU)
- Healthcare companies (patient data, strict regulations)
- Government contracts (ITAR, EAR export restrictions)
- Enterprises with data sovereignty requirements
OpenAI and Anthropic (US-based) do not guarantee EU residency by default. Azure OpenAI in Europe offers EU data residency at higher cost ($20+ per 1M tokens vs $10-15 via OpenAI API).
Mistral's EU compliance is a competitive advantage for European teams. EU pricing (as of March 2026) matches global rates: there is no premium for EU data residency, which is built in rather than an add-on.
Implication: European companies can adopt the Mistral API with a simplified compliance path, since prompts and responses never leave the EU.
Cost vs OpenAI and Anthropic
API Pricing Comparison
| Model | Input $/M | Output $/M | Combined $/M | Use Case |
|---|---|---|---|---|
| Mistral Small | $0.10 | $0.30 | $0.40 | Lightweight, classification |
| Mistral Medium | $0.27 | $0.81 | $1.08 | Balanced, multi-turn |
| Mistral Large | $2.00 | $6.00 | $8.00 | Complex reasoning |
| OpenAI GPT-4o Mini | $0.15 | $0.60 | $0.75 | Lightweight alternative |
| OpenAI GPT-4o | $2.50 | $10.00 | $12.50 | High-capability chat |
| OpenAI GPT-4 Turbo | $2.00 | $8.00 | $10.00 | Complex reasoning (older) |
| OpenAI o1 Mini | $3.00 | $12.00 | $15.00 | Reasoning-optimized |
| Anthropic Claude Opus | $5.00 | $25.00 | $30.00 | Frontier reasoning |
| Anthropic Claude Sonnet | $3.00 | $15.00 | $18.00 | Balanced capability |
| DeepSeek-V3 | $0.14 | $0.28 | $0.42 | Cost-optimized frontier |
Mistral Small is 47% cheaper than GPT-4o Mini ($0.40 vs $0.75 combined). Mistral Large ($8.00 combined) undercuts OpenAI GPT-4 Turbo ($10.00 combined) by 20%. Mistral competes aggressively on price across all tiers.
DeepSeek-V3 ($0.42/M combined) is priced on par with Mistral Small ($0.40/M combined) while being positioned as a cost-optimized frontier model, making it Mistral's most aggressive price competitor; a comparable DeepSeek deployment would likely undercut Mistral Large ($8.00 combined) as well.
Capability Comparison
Mistral Small ≈ GPT-4o Mini (general chat, light reasoning)
Mistral Medium ≈ GPT-4 Turbo or Claude Sonnet (strong multi-turn, code, reasoning)
Mistral Large ≈ Claude Opus (complex reasoning) but cheaper
Mistral's pricing undercuts OpenAI/Anthropic on cost-per-capability. The drawback: slightly lower reasoning scores on specialized benchmarks (math, complex code generation). But the gaps are under 10 percentage points (MATH: 67% vs 68% for Claude Opus, negligible).
Context Windows and Limits
| Model | Input Context | Output Limit | Requests/min | Tokens/min |
|---|---|---|---|---|
| Small | 8K | 4K | 10 | 100K |
| Medium | 32K | 32K | 50 | 500K |
| Large | 128K | 128K | 100 | 1M |
Small: suitable for short conversations (chatbot, classification). Medium/Large: suitable for RAG with long document context.
Rate limits: Standard tier allows 10 req/min (Small), 50 req/min (Medium), 100 req/min (Large). Production tier: custom limits (contact sales). Token-per-minute limits are more important for batch processing (1M tokens/min Large tier allows 833 parallel completions at 1,200 tok/completion).
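The parallelism figure above falls out of the token-per-minute budget. A quick sketch (hypothetical helper):

```python
def max_parallel_completions(tokens_per_min_limit: int, tokens_per_completion: int) -> int:
    """How many completions of a given size fit in one minute's token budget."""
    return tokens_per_min_limit // tokens_per_completion

# Large tier: 1M tokens/min budget, 1,200 tokens per completion:
print(max_parallel_completions(1_000_000, 1_200))  # 833
```

For batch pipelines, this number (not the requests-per-minute cap) is usually the binding constraint.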
Performance Benchmarks
LLM Benchmarks (Published by Mistral)
MATH (complex mathematical reasoning):
- Mistral Small: 16%
- Mistral Medium: 58%
- Mistral Large: 67%
- Claude Opus: 68%
- OpenAI o1-mini: 72%
Mistral Large is competitive with Claude Opus (one point behind) and trails o1-mini by 5 points.
HumanEval (code generation):
- Mistral Small: 26%
- Mistral Medium: 64%
- Mistral Large: 75%
- Claude Opus: 78%
- OpenAI o1-mini: 85%
Mistral Large is strong, trailing Claude Opus by 3 points and o1-mini by 10.
MMLU (general knowledge):
- Mistral Small: 68%
- Mistral Medium: 84%
- Mistral Large: 88%
- Claude Opus: 88%
- OpenAI o1-mini: 92%
Mistral Large matches Claude Opus on MMLU.
Latency and Throughput
Time-to-first-token (TTFT):
- Mistral Small: 50-100ms (fastest)
- Mistral Medium: 100-200ms
- Mistral Large: 200-500ms
Inter-token latency (ITL):
- Mistral Small: 20-30ms per token
- Mistral Medium: 30-50ms per token
- Mistral Large: 50-100ms per token
Mistral Small is suitable for real-time chat (<500ms roundtrip acceptable). Medium is suitable for conversational AI. Large is slower but acceptable for non-interactive use cases (email, batch processing).
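These figures compose into an end-to-end estimate: total time ≈ TTFT + ITL × (output tokens − 1). A quick sketch (hypothetical helper, using the slow-end figures for Mistral Large):

```python
def response_time_s(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Estimated generation time: time-to-first-token plus per-token latency."""
    return round((ttft_ms + itl_ms * (output_tokens - 1)) / 1000, 1)

# Mistral Large, 100-token response at the slow end (500ms TTFT, 100ms ITL):
print(response_time_s(500, 100, 100))  # 10.4 (seconds)
```

Running the same estimate with Small's figures (100ms TTFT, 30ms ITL) shows why it suits real-time chat: a 10-token reply lands well under half a second.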
FAQ
What is the difference between Mistral Small, Medium, and Large?
Small (7B): lightweight, $0.40/M combined. Medium (30-40B): balanced, $1.08/M combined. Large (123B): frontier reasoning, $8.00/M combined ($2.00 input / $6.00 output). Use Small for classification/embeddings, Medium for chat/RAG, Large for complex reasoning/code.
Can I use Mistral's open-source weights commercially?
Yes. Small and Medium weights are Apache 2.0 licensed (permissive). Can fine-tune, redistribute, and modify commercially. Large weights are not released by Mistral (use alternative open-source models or API).
How do I self-host Mistral?
Download weights from Hugging Face (mistralai/Mistral-7B, mistralai/Mistral-30B-v0.3). Use an inference engine (vLLM, Text Generation WebUI, ollama). Deploy on a cloud GPU (A100 for Small, H100 for Medium). Setup time: typically 4-8 hours (download, install dependencies, configure, test).
Is Mistral cheaper than OpenAI?
Yes. Mistral Small ($0.40/M combined) is 47% cheaper than GPT-4o Mini ($0.75/M). Mistral Large ($8/M combined) undercuts GPT-4 Turbo ($10/M). Capability gaps are small (under 10 percentage points on reasoning benchmarks).
Does Mistral support function calling and structured output?
Yes. Mistral API supports both function calling (tool use) and structured JSON output. Compatible with OpenAI function calling API (minimal changes required for migration).
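Because the endpoints follow OpenAI chat-completions conventions, a tool-calling request body has the familiar shape. A minimal sketch of constructing (not sending) such a payload; the model name and `get_invoice_total` tool are illustrative, not part of Mistral's catalog:

```python
import json

def build_tool_call_request(model: str, user_message: str) -> str:
    """Build an OpenAI-style chat request body with one function tool attached."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_invoice_total",  # hypothetical tool
                "description": "Return the total of an invoice by ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"invoice_id": {"type": "string"}},
                    "required": ["invoice_id"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
    return json.dumps(payload)

body = build_tool_call_request("mistral-large-latest", "Total for invoice A-123?")
```

The same payload shape is what makes migration from OpenAI's function calling API low-effort: typically only the base URL, API key, and model name change.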
What is Mistral's API rate limit?
Standard tier: 10 requests/minute (Small), 50 req/min (Medium), 100 req/min (Large). Token-per-minute: 100K (Small), 500K (Medium), 1M (Large). Production tier: custom limits (contact Mistral sales).
Can I use Mistral offline (no internet)?
Download weights and self-host. Requires A100/H100 GPU and inference engine (vLLM). No internet required. Perfect for air-gapped environments.
How does Mistral Large compare to Claude Opus for reasoning?
Mistral Large scores 67% on the MATH benchmark vs 68% for Claude Opus, a negligible one-point gap. Mistral Large ($2.00/M input, $6.00/M output) is significantly cheaper than Claude Opus ($5.00/M input, $25.00/M output).
Does Mistral offer fine-tuning?
No managed fine-tuning (as of March 2026). OpenAI and Anthropic offer fine-tuning. Mistral's solution: self-host weights and fine-tune locally. Mistral team is working on managed fine-tuning (expected mid-2026).
What are Mistral's SLAs and uptime guarantees?
99.9% uptime SLA for the professional tier; enterprise customers can negotiate custom SLAs. No guaranteed latency (depends on load). Typical latency: 100-500ms to first token, 20-100ms per token thereafter.
Can I access Mistral API through Azure?
Not directly. Mistral is independent. Azure offers OpenAI integration. Consider: use Mistral directly via API, or use Azure OpenAI for integration convenience. No Azure-Mistral offering exists (as of March 2026).
What about rate limiting and quota management?
Mistral API uses token budgets (monthly quota). Starter tier: 1M tokens/month. Professional: 100M tokens/month. Enterprise: custom. Overage pricing: available but expensive. Budgets reset monthly.
How does Mistral handle safety and moderation?
Mistral applies content filtering (explicit content, violence, illegal activity). Models are trained to decline harmful requests. Safety level: comparable to OpenAI/Anthropic. Customization: production accounts can request custom safety policies.
Related Resources
- Mistral AI Official Docs
- OpenAI API Pricing
- Anthropic Claude Pricing
- DeepSeek API Pricing
- LLM Pricing Comparison Dashboard
Sources
- Mistral AI Pricing
- Mistral API Documentation
- Mistral Open-Source Models
- DeployBase LLM Pricing Tracker (March 2026 observations)