Mistral API Pricing: Complete Breakdown with Cost Optimization

Deploybase · August 4, 2025 · LLM Pricing

Mistral API Pricing: Overview

Mistral AI's pricing model is straightforward: pay per token for inference, with optional batch discounts and self-hosting options. As of March 2026, Mistral offers a range of models (Mistral 7B, Mistral Small, Mistral Medium, Mistral Large, and Mistral Large 2) at different price points. The API is publicly accessible and does not require production contracts unless you are pursuing large-scale commitments. This guide breaks down exact pricing for each model, cost projections for common workloads, and when self-hosting becomes economically viable.
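Because billing is purely per token, request cost can be tracked straight from the token counts a response reports. A minimal sketch: the `request_cost` helper and the `usage` dict shape (with `prompt_tokens`/`completion_tokens` fields, following the common OpenAI-style convention) are illustrative assumptions here; verify field names against Mistral's actual response schema.

```python
# Estimate the dollar cost of a single API response from its token usage.
# Rates are USD per million tokens, taken from the pricing table in this guide.
PRICES = {
    "mistral-small": (0.14, 0.42),    # (input $/MTok, output $/MTok)
    "mistral-medium": (0.45, 1.35),
    "mistral-large": (2.00, 6.00),
}

def request_cost(model: str, usage: dict) -> float:
    """Cost in USD for one request, given the usage block of a response."""
    in_price, out_price = PRICES[model]
    return (usage["prompt_tokens"] * in_price
            + usage["completion_tokens"] * out_price) / 1_000_000

# Example: a 2,100-token prompt with a 200-token completion on Mistral Small.
usage = {"prompt_tokens": 2100, "completion_tokens": 200}
cost = request_cost("mistral-small", usage)  # ~ $0.000378
```

Logging this per request makes later cost projections a matter of summing a column rather than guessing.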


Mistral API Models

Available Models (as of March 2026)

Mistral 7B (open weights):

  • Context window: 32K tokens
  • Pricing: Free or self-hosted only (not available on public Mistral API)
  • Use case: Base for fine-tuning, local deployment

Mistral Small:

  • Context window: 32K tokens
  • Parameters: ~7B (quantized variant of Mistral 7B)
  • Strength: Fast, low-latency inference
  • Best for: Real-time applications, edge deployments

Mistral Medium:

  • Context window: 32K tokens
  • Parameters: ~26B (proprietary, between Small and Large)
  • Strength: Balanced performance and cost
  • Best for: General-purpose API use, most workloads

Mistral Large:

  • Context window: 128K tokens
  • Parameters: ~123B
  • Strength: Highest accuracy and reasoning
  • Best for: Complex reasoning, code generation, research

Mistral Large 2:

  • Context window: 128K tokens
  • Parameters: ~123B
  • Strength: Improved reasoning and long-context handling
  • Best for: Long-document analysis, multi-turn conversations

Pricing by Model

Token Pricing (USD per Million Tokens)

Model            Input $/MTok   Output $/MTok   Notes
Mistral Small    $0.14          $0.42           Fastest, cheapest
Mistral Medium   $0.45          $1.35           Balanced option
Mistral Large    $2.00          $6.00           ~123B parameters
Mistral Large 2  $2.00          $6.00           128K context

Data from Mistral AI's official pricing page, verified March 22, 2026.

Cost Per Request Examples

Scenario 1: Summarize a 2000-token article

Input: 2,000 tokens (article) + 100 tokens (prompt) = 2,100 tokens
Output: 200 tokens (summary)

Cost per request:

  • Mistral Small: (2,100 × $0.14 + 200 × $0.42) / 1M = $0.000378 (about 0.04 cents)
  • Mistral Medium: (2,100 × $0.45 + 200 × $1.35) / 1M = $0.001215 (about 0.12 cents)
  • Mistral Large: (2,100 × $2.00 + 200 × $6.00) / 1M = $0.005400 (about 0.54 cents)
  • Mistral Large 2: (2,100 × $2.00 + 200 × $6.00) / 1M = $0.005400 (about 0.54 cents)

At 1000 requests/month (about 33 per day):

  • Mistral Small: $0.38/month
  • Mistral Medium: $1.22/month
  • Mistral Large: $5.40/month
  • Mistral Large 2: $5.40/month

All are negligible for small volumes.

Scenario 2: Generate code (5000 token context, 1000 token output)

Input: 5,000 tokens
Output: 1,000 tokens

Cost per request:

  • Mistral Small: (5,000 × $0.14 + 1,000 × $0.42) / 1M = $0.001120 (0.11 cents)
  • Mistral Medium: (5,000 × $0.45 + 1,000 × $1.35) / 1M = $0.003600 (0.36 cents)
  • Mistral Large: (5,000 × $2.00 + 1,000 × $6.00) / 1M = $0.016000 (1.60 cents)

At 100 requests/month:

  • Mistral Small: $0.11/month
  • Mistral Medium: $0.36/month
  • Mistral Large: $1.60/month

For typical API volumes below 10M tokens/month on Small or Medium, costs stay under $10.

High-Volume Pricing (1B tokens/month equivalent)

1 billion tokens = 1000 million tokens.

Input: 700M tokens, Output: 300M tokens (typical distribution: 70% input, 30% output)

  • Mistral Small: (700M × $0.14 + 300M × $0.42) = $224/month
  • Mistral Medium: (700M × $0.45 + 300M × $1.35) = $720/month
  • Mistral Large: (700M × $2.00 + 300M × $6.00) = $3,200/month

Even at a billion tokens per month, Small and Medium stay below the cost of a dedicated GPU cluster. At Large-class rates, however, API spend approaches the cost of rented hardware, and companies at this scale should evaluate local deployment.
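Under a fixed input/output split, monthly spend is a one-line linear formula in token volume. A quick sketch using the listed $/MTok rates and the 70/30 split above (rates should be re-checked against the current pricing page):

```python
# Monthly API cost for a given token volume, assuming a 70/30 input/output split.
def monthly_cost_usd(total_tokens: float, in_price: float, out_price: float,
                     input_share: float = 0.7) -> float:
    in_tok = total_tokens * input_share
    out_tok = total_tokens * (1 - input_share)
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

BILLION = 1_000_000_000
print(round(monthly_cost_usd(BILLION, 0.14, 0.42), 2))  # Mistral Small  -> 224.0
print(round(monthly_cost_usd(BILLION, 0.45, 1.35), 2))  # Mistral Medium -> 720.0
print(round(monthly_cost_usd(BILLION, 2.00, 6.00), 2))  # Mistral Large  -> 3200.0
```

Because the formula is linear, doubling volume doubles cost; the only lever besides volume is the blended rate.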


Batch Processing Discounts

Mistral Batch API

Mistral offers a batch API for non-urgent requests with 50% discounts compared to standard API pricing.

Batch API Pricing:

Model            Input $/MTok   Output $/MTok   Discount
Mistral Small    $0.07          $0.21           50%
Mistral Medium   $0.225         $0.675          50%
Mistral Large    $1.00          $3.00           50%
Mistral Large 2  $1.00          $3.00           50%

Batch processing is ideal for non-real-time workloads. Requests are queued and processed during low-demand periods, typically with 24-hour turnaround.

Example: Processing 1M documents for batch classification

Assume 500 tokens per document, 50 token output (classification label).

Input: 1M documents × 500 tokens = 500M tokens
Output: 1M documents × 50 tokens = 50M tokens

Standard API cost (Mistral Medium): (500M × $0.45 + 50M × $1.35) = $292.50
Batch API cost: (500M × $0.225 + 50M × $0.675) = $146.25

Savings: 50% ($146.25 per run)

Batch API makes sense if processing can wait 24 hours.
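The standard-vs-batch comparison above generalizes to any document workload. A minimal sketch using the listed rates (Mistral Medium standard vs its 50%-discounted batch rates; plug in your own per-document token counts):

```python
# Standard vs batch cost for a per-document workload, in USD.
def workload_cost(docs: int, in_tok_per_doc: int, out_tok_per_doc: int,
                  in_price: float, out_price: float) -> float:
    in_total = docs * in_tok_per_doc      # total input tokens
    out_total = docs * out_tok_per_doc    # total output tokens
    return (in_total * in_price + out_total * out_price) / 1_000_000

DOCS = 1_000_000
standard = workload_cost(DOCS, 500, 50, 0.45, 1.35)   # Mistral Medium, standard
batch = workload_cost(DOCS, 500, 50, 0.225, 0.675)    # 50% batch discount
print(round(standard, 2), round(batch, 2))  # 292.5 146.25
```

Since batch rates are exactly half, the saving is always 50% of the standard bill regardless of the input/output mix.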


Cost Projections

Small Team Setup (Startup Building a Chatbot)

Traffic: 50,000 requests/month
Average tokens per request: 2,000 input, 400 output
Total tokens: 50,000 × (2,000 + 400) = 120M tokens/month (100M input, 20M output)

Monthly cost by model:

  • Mistral Small: (100M × $0.14 + 20M × $0.42) = $22.40/month
  • Mistral Medium: (100M × $0.45 + 20M × $1.35) = $72/month

For a startup with 50k requests/month, even Mistral Medium costs less than $100/month, a negligible line item. At this scale there is no economic case for self-hosting: a single continuously rented GPU (roughly $250/month) would cost more than the entire API bill.

Mid-Market SaaS (10M tokens/month)

Input: 7M tokens, Output: 3M tokens

  • Mistral Small: (7M × $0.14 + 3M × $0.42) = $2.24/month
  • Mistral Medium: (7M × $0.45 + 3M × $1.35) = $7.20/month
  • Mistral Large: (7M × $2.00 + 3M × $6.00) = $32/month

At 10M tokens/month, all three tiers are inexpensive: Small is $2.24/month, Medium $7.20, and even Large only $32. At this volume, model choice should be driven by quality requirements rather than cost; Large is justified only when workloads need its reasoning.

Production AI Agent (100M tokens/month)

Input: 70M tokens, Output: 30M tokens

  • Mistral Small: $22.40/month
  • Mistral Medium: $72/month
  • Mistral Large: $320/month

At 100M tokens/month, Small and Medium remain far below infrastructure cost. Mistral Large at $320/month is in the same range as a single rented RTX 4090 (~$248/month), so Large-class workloads at this volume are the first place where evaluating self-hosting makes sense.


Self-Hosted vs Cloud API

Self-Hosting Mistral 7B

Infrastructure cost (cloud GPU rental, RunPod):

RunPod's Mistral-optimized GPUs:

  • 1x NVIDIA RTX 4090 (24GB): $0.34/hr, handles ~5000 tokens/second inference

Monthly cost for continuous inference (730 hours):

  • 1x RTX 4090: 1 × $0.34 × 730 = $248/month

Comparison to cloud API:

  • 120M tokens/month (~46 tokens/second sustained):
    • API cost (Mistral Small, 100M input / 20M output): $22.40/month
    • GPU cost (1x RTX 4090, rented continuously): $248/month
    • At this volume, the API is roughly 11x cheaper

Self-hosting pays off only once the GPU is well utilized: at ~5,000 tokens/second, a single RTX 4090 can process on the order of 13B tokens/month, a volume that would cost roughly $2,900 at Mistral Small rates. The breakeven against Small-class pricing sits near 1.1B tokens/month.

Infrastructure cost (high-throughput):

For 500M tokens/month (~190 tokens/second sustained):

  • 2x NVIDIA H100 PCIe (RunPod): 2 × $1.99 × 730 = $2,905/month
  • API cost (Mistral Small): (350M × $0.14 + 150M × $0.42) = $112/month

At this volume the API is still far cheaper than a dedicated H100 cluster; hardware of this class pays for itself only against Large-tier rates ($3,200 per 1B tokens) or at multi-billion-token volumes on the cheaper tiers.
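Converting a monthly token budget into sustained throughput (and a GPU count) is a common sizing step. A sketch assuming the ~5,000 tok/s per-RTX-4090 figure quoted above and a 730-hour month:

```python
import math

SECONDS_PER_MONTH = 730 * 3600  # 730 hours, matching the rental math above

def sustained_tps(tokens_per_month: float) -> float:
    """Average tokens/second implied by a monthly volume."""
    return tokens_per_month / SECONDS_PER_MONTH

def gpus_needed(tokens_per_month: float, tps_per_gpu: float = 5000) -> int:
    """Whole GPUs required to sustain the volume at the given throughput."""
    return math.ceil(sustained_tps(tokens_per_month) / tps_per_gpu)

print(round(sustained_tps(500_000_000)))  # 500M tokens/month -> 190 tok/s
print(gpus_needed(500_000_000))           # -> 1 (a single card has ample headroom)
```

Note that real traffic is bursty, so peak throughput, not the monthly average, should drive fleet size.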

Trade-offs: Self-Hosted vs API

Self-Hosted (Mistral 7B on GPU):

  • Pros: Ultra-low marginal cost, no per-token fees, complete control
  • Cons: Upfront infrastructure cost, ops overhead, scaling complexity, limited model variety
  • Best for: High-volume, predictable workloads (roughly 300M+ tokens/month, depending on model tier)

Cloud API (Mistral official):

  • Pros: No infrastructure management, instant scale, flexible model choice, monthly billing
  • Cons: Per-token fees add up at scale, vendor lock-in, public API rate limits
  • Best for: Startups, variable workloads, volumes below the self-hosting breakeven

Breakeven: Depends on model tier and hardware. Against a single RTX 4090 at ~$248/month, it falls around 345M tokens/month at Mistral Medium rates and about 1.1B tokens/month at Mistral Small rates.
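The breakeven is just the GPU's monthly bill divided by the blended API rate. A sketch assuming the 70/30 input/output split used throughout this guide:

```python
# Breakeven monthly token volume where a dedicated GPU matches API spend.
def blended_rate(in_price: float, out_price: float,
                 input_share: float = 0.7) -> float:
    """Effective $/MTok for a given input/output mix."""
    return input_share * in_price + (1 - input_share) * out_price

def breakeven_tokens(gpu_monthly_usd: float, in_price: float,
                     out_price: float) -> float:
    """Tokens/month at which API cost equals the GPU rental bill."""
    return gpu_monthly_usd / blended_rate(in_price, out_price) * 1_000_000

# Single RTX 4090 at ~$248/month vs the public rates:
print(f"{breakeven_tokens(248, 0.14, 0.42):.2e}")  # vs Small  ~ 1.1e9 tokens/month
print(f"{breakeven_tokens(248, 0.45, 1.35):.2e}")  # vs Medium ~ 3.4e8 tokens/month
```

A more output-heavy mix raises the blended rate and pulls the breakeven lower.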


Cost Optimization Strategies

1. Use Smaller Models When Possible

Mistral Small is 70% cheaper than Mistral Medium and handles most classification, summarization, and basic generation tasks. Evaluate Small first; upgrade only if quality metrics require it.

For example: customer support ticket routing (Mistral Small) vs complex reasoning (Mistral Large).

2. Batch Processing for Non-Real-Time Workloads

Batch API offers 50% discounts. If turnaround can be 24 hours, batch cuts costs in half.

Example: Processing 100M documents overnight at half cost.

3. Compress Prompts

Fewer input tokens = lower cost. Use prompt compression or templating.

Example: Instead of passing full customer conversation history (5000 tokens), summarize to recent context (500 tokens). Cost reduction: 10x.
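One simple way to implement this is to keep only the most recent turns that fit a token budget. A minimal sketch: the `len(text) // 4` count is a rough ~4-characters-per-token heuristic for English, not the model's real tokenizer, and the helper names are illustrative.

```python
# Keep only the most recent conversation turns within a token budget.
def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token in English text."""
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int = 500) -> list[str]:
    """Return the longest suffix of `messages` that fits in `budget` tokens."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["old turn " * 100] * 20 + ["user: latest question"]
trimmed = trim_history(history, budget=500)
assert trimmed[-1] == "user: latest question"  # the latest turn is kept
```

In production, replace the heuristic with the model's tokenizer, and consider summarizing dropped turns instead of discarding them outright.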

4. Use Context Caching (Mistral API Feature)

Mistral caches long prompts or system prompts. First request pays full price; subsequent requests with same cached content pay 90% less for the cached portion.

Ideal for: Multi-turn chatbots, where system prompt is reused 100+ times.

Example: 1000-token system prompt cached, used in 1000 conversations per day.

  • Without caching: 1000 × 1000 = 1M token cost/day
  • With caching: 1000 tokens (first request) + 999,000 × 10% (cached) = 100,900 tokens
  • Savings: ~90%
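The caching arithmetic above generalizes: the first use pays full price, and each reuse pays the stated 10% on the cached span. A small sketch (the 10% cached-read rate is taken from the description above; confirm it against Mistral's current caching terms):

```python
# Effective billed input tokens for a cached prompt reused many times,
# assuming cached reads are billed at 10% of the normal input rate.
def cached_billed_tokens(prompt_tokens: int, uses: int,
                         cached_rate: float = 0.10) -> float:
    first = prompt_tokens                            # first request pays in full
    rest = (uses - 1) * prompt_tokens * cached_rate  # reuses pay the cached rate
    return first + rest

# 1,000-token system prompt used in 1,000 conversations per day:
print(round(cached_billed_tokens(1000, 1000)))  # -> 100900 billed-token equivalents
```

The savings approach 90% asymptotically as the reuse count grows.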

5. Model Quantization and Pruning (Self-Hosted)

If self-hosting, use quantized models (4-bit, 8-bit) to reduce memory and improve throughput. Mistral 7B 4-bit quantization fits on a single RTX 4090 with higher throughput.

Example: Quantized Mistral 7B processes 8000 tokens/second on RTX 4090 vs 5000 tokens/second for full-precision. Cost per token: 37.5% lower.
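The per-token figure behind that comparison falls out of the hourly rate and sustained throughput. A sketch using the $0.34/hr RTX 4090 rate and throughput numbers quoted above:

```python
# Cost per million tokens for a rented GPU at a given sustained throughput.
def gpu_cost_per_mtok(hourly_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000

full = gpu_cost_per_mtok(0.34, 5000)    # full precision on an RTX 4090
quant = gpu_cost_per_mtok(0.34, 8000)   # 4-bit quantized, higher throughput
print(round(full, 4), round(quant, 4))  # 0.0189 0.0118
print(round(1 - quant / full, 3))       # 0.375 -> 37.5% cheaper per token
```

Since the hourly rate is fixed, per-token cost scales inversely with throughput, which is why quantization gains translate directly into savings.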

6. Reserved Capacity (Large Contracts)

If committing to $100k+/month, Mistral offers production pricing (discounts 10-30% off public rates). Negotiate based on volume.


Competitive Pricing Comparison

Mistral vs OpenAI (as of March 2026)

Model               Input $/MTok   Output $/MTok   Ranking
Mistral Small       $0.14          $0.42           Cheapest
OpenAI GPT-4o mini  $0.15          $0.60           Slightly more
Mistral Medium      $0.45          $1.35           Mid-tier
OpenAI GPT-4o       $2.50          $10.00          Expensive
Mistral Large       $2.00          $6.00           Premium
OpenAI GPT-5.4      $2.50          $15.00          Very expensive

Mistral Small is roughly 15-25x cheaper per token than OpenAI's flagship models, and even Mistral Large undercuts GPT-4o on both input and output.

Mistral vs Anthropic (Claude)

Model           Input $/MTok   Output $/MTok
Mistral Small   $0.14          $0.42
Claude Haiku    $1.00          $5.00
Mistral Medium  $0.45          $1.35
Claude Sonnet   $3.00          $15.00
Mistral Large   $2.00          $6.00
Claude Opus     $5.00          $25.00

Mistral is significantly cheaper across all model tiers. Claude is more capable for complex reasoning; Mistral is better for cost-conscious teams.

Mistral vs Meta Llama (Open-Source)

Llama models are free to download but require self-hosting. Infrastructure cost (GPU rental) is the only expense.

  • Llama 3.1 8B on a rented GPU: ~$250-500/month (depends on throughput)
  • Mistral Small API at comparable volumes: ~$20-225/month (100M-1B tokens/month)

Self-hosted Llama wins once the GPU is kept busy, roughly 1B+ tokens/month. Below that, the pay-as-you-go Mistral API bill is smaller than even a single GPU rental. Llama also requires DevOps expertise, while the Mistral API requires zero infrastructure.

For teams without MLOps expertise, Mistral API is cheaper than managing self-hosted Llama.

Real-World Cost Scenarios

Scenario A: Startup Building a Classification Service

A startup classifies customer documents into categories. 10,000 documents/day, 500 tokens/document, 50 tokens output.

Monthly volume: 10,000 × 30 = 300,000 requests = 150M tokens input, 15M tokens output

Using Mistral Small:

  • (150M × $0.14 + 15M × $0.42) = $27.30/month
  • Cost per document: ~$0.00009 (about 0.01 cents)
  • Model cost is negligible against almost any per-document revenue

Using batch API (50% discount):

  • (150M × $0.07 + 15M × $0.21) = $13.65/month
  • Cost per document: ~$0.000046
  • If classification can wait 24 hours, this halves an already small cost

Using self-hosted Mistral 7B (RunPod):

  • 1x RTX 4090: $248/month if rented continuously
  • Throughput: ~5,000 tokens/second
  • Processing 150M tokens at 5,000 tok/s = 30,000 seconds ≈ 8.3 hours/month
  • Rented on demand for those hours (~$3/month), cost per document is ~$0.00001

Verdict: With on-demand GPU rental, self-hosting is roughly 10x cheaper than the API; a continuously rented GPU, however, would cost more than the API at this volume. Either way, per-document cost is a fraction of a cent.
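The on-demand figure comes from converting the token volume into GPU-hours. A sketch assuming the ~5,000 tok/s and $0.34/hr RTX 4090 numbers used in this scenario:

```python
# On-demand GPU hours (and cost) needed to process a batch workload.
def on_demand_gpu_cost(total_tokens: float, tps: float = 5000,
                       hourly_usd: float = 0.34) -> tuple[float, float]:
    """Return (hours, cost_usd) at the given throughput and hourly rate."""
    hours = total_tokens / tps / 3600
    return hours, hours * hourly_usd

hours, cost = on_demand_gpu_cost(150_000_000)   # Scenario A's monthly volume
print(round(hours, 1), round(cost, 2))          # 8.3 2.83
```

This only works for workloads that tolerate batch scheduling; latency-sensitive traffic needs the GPU resident, which is where the $248/month continuous figure applies.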

Scenario B: SaaS Chatbot with Real-Time Interaction

An AI chatbot SaaS provider serves 1000 users, 50 requests/user/day = 50,000 requests/day.

Average request: 1000 input tokens (conversation history), 200 output tokens (response).

Monthly volume: 50,000 × 30 = 1.5M requests = 1.5B input tokens, 300M output tokens.

Using Mistral Medium:

  • (1.5B × $0.45 + 300M × $1.35) = $1,080/month
  • Cost per user per month: $1.08
  • Any plan priced above a few dollars per user comfortably covers model cost

Using Mistral Small:

  • (1.5B × $0.14 + 300M × $0.42) = $336/month
  • Cost per user per month: $0.34

Verdict: Model cost is a small fraction of typical SaaS pricing at this scale. Mistral Medium runs about $1 per user per month, and Small about a third of that.

Scenario C: Production Search Assistant

A corporation builds an internal search tool for 5,000 employees. 100 searches/employee/month = 500,000 queries.

Average: 2000 input tokens (document context), 500 output tokens (answer).

Monthly volume: 500,000 × (2000 + 500) = 1.25B tokens = 1.0B input, 250M output.

Using Mistral Medium:

  • (1.0B × $0.45 + 250M × $1.35) = $787.50/month
  • Cost per employee per month: ~$0.16

Using self-hosted (2x H100 cluster):

  • GPU cost: 2 × $1.99 × 730 = $2,905/month
  • Can handle 1.25B tokens easily (provides excess capacity)
  • Cost per employee per month: $0.58

Verdict: At this volume the API is roughly 4x cheaper than a dedicated two-H100 cluster. Self-hosting pays off only at several times this volume, or where other requirements (data control, latency) mandate it.


FAQ

Which Mistral model should I use?

Start with Mistral Small. It's the cheapest and handles 80% of tasks (classification, summarization, simple generation). Upgrade to Medium if quality metrics (BLEU, accuracy) fall short. Most teams use Small or Medium in production.

Is Mistral cheaper than OpenAI?

Yes. Mistral Small ($0.14/$0.42) is significantly cheaper than GPT-4o ($2.50/$10.00) for lightweight workloads. Mistral Large ($2/$6) is cheaper than GPT-4o on both input and output costs. OpenAI is more capable for complex reasoning; Mistral is better for cost-sensitive workloads.

When should I self-host instead of using the API?

Break-even depends on model tier and hardware: roughly 345M tokens/month against Mistral Medium rates and about 1.1B against Mistral Small, assuming a single RTX 4090 at ~$248/month. Below those volumes the API is cheaper; well above them, a fully utilized GPU can cut per-token cost by 10x or more.

Does Mistral have rate limits?

Mistral API enforces rate limits (tokens/minute, requests/minute) based on plan tier. Standard tier: 1,000 requests/minute, 1M tokens/minute. Business plans raise or remove these limits.

What's the difference between Mistral Large and Large 2?

Both offer a 128K context window; Large 2 improves reasoning and long-context quality at the same price. Use Large for standard tasks, Large 2 for long-document analysis and harder reasoning.

Can I fine-tune Mistral models?

Yes. A fine-tuning API is available for Mistral Small and Medium at roughly a 10-15% premium on training tokens. Fine-tuned models can cut inference token cost by 2-4x by allowing much shorter prompts.

Does Mistral offer volume discounts?

Volume contracts (>$100k/month) can negotiate 10-30% discounts. The standard public API has no volume discounts, only the 50% batch discount.

What's included in the Mistral API free tier?

Mistral offers a free tier with limited quota (~1M tokens/month, rate-limited). After that, pay-as-you-go pricing applies.


