Azure OpenAI Pricing: PTU vs On-Demand Comparison

Deploybase · February 3, 2026 · LLM Pricing

Overview

Azure OpenAI has two pricing modes: PTU (Provisioned Throughput Units) and pay-as-you-go (on-demand).

PTU requires an upfront commitment but saves 30-50% per token at high volume. Pay-as-you-go has zero commitment and suits variable loads. Both serve the same models as the OpenAI API (GPT-4, GPT-4 Turbo, GPT-4o).

Production workloads with stable throughput? PTU wins. Experimental or bursty? Pay-as-you-go. Azure's per-token rates track the OpenAI API closely (within about 5% by region), but compliance, data residency, and bulk pricing matter for regulated industries.


Azure OpenAI Pricing Models

Pay-as-You-Go (On-Demand)

Same as OpenAI API pricing but hosted on Azure infrastructure.

Model | Input $/M | Output $/M | Context | Max Output
GPT-4 Turbo | $10.00 | $30.00 | 128K | 4K
GPT-4o | $2.50 | $10.00 | 128K | 16K
GPT-4o Mini | $0.15 | $0.60 | 128K | 16K
GPT-4.1 | $2.00 | $8.00 | 1.05M | 32K
GPT-4.1 Mini | $0.40 | $1.60 | 1.05M | 32K

No minimums. Pay per token (prompt and completion metered separately).

Monthly cost example: 100M tokens (60M prompt, 40M completion).

  • GPT-4o: (60 × $2.50) + (40 × $10.00) = $150 + $400 = $550/month
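As a sketch, the per-token arithmetic can be wrapped in a small helper. The rate table below hard-codes the article's $/M figures; treat them as illustrative, not live Azure prices.

```python
# On-demand cost arithmetic, using the $/M-token rates from the table above
# (illustrative figures, not live Azure prices).
RATES = {  # model: (input $/M tokens, output $/M tokens)
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def on_demand_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly cost for input_m / output_m million tokens."""
    in_rate, out_rate = RATES[model]
    return input_m * in_rate + output_m * out_rate

print(on_demand_cost("gpt-4o", 60, 40))  # 60M prompt + 40M completion -> 550.0
```

The same function reproduces the other worked examples in this article by swapping the model key and token volumes.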

Provisioned Throughput Units (PTU)

Reserve capacity upfront. Lower per-token rate in exchange for minimum commitment.

PTU: Unit of throughput. 1 PTU roughly = 1,000 tokens per minute (TPM).

Model | Cost per PTU-Hour | Throughput per PTU | Cost per PTU-Month (730 hrs)
GPT-4 Turbo | $100 | 1K TPM | $73,000
GPT-4o | $40 | 1K TPM | $29,200
GPT-4o Mini | $10 | 1K TPM | $7,300
GPT-4.1 | $80 | 1K TPM | $58,400
GPT-4.1 Mini | $20 | 1K TPM | $14,600

PTU pricing is per hour, not per token: a flat rate regardless of actual usage. Reserve only what you need; unused PTU capacity is pure waste.

Minimum commitment: 1 hour (a deployment can be deleted after 1 hour, though hourly churn is rare in production).


PTU Costs

Understanding Throughput

1 PTU = ~1,000 tokens per minute (TPM) sustained throughput.

Achievable throughput per PTU depends on workload shape:

  • Batch inference (summarization, classification): high throughput, variable latency
  • Interactive chat (low batch size): lower throughput, strict latency SLA

Throughput calculation:

  • Tokens/minute = average request size × requests/minute
  • Example: 500 token requests, 100 requests/minute = 50,000 TPM = 50 PTUs
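The sizing rule above can be sketched as a function, assuming the article's 1 PTU ≈ 1,000 TPM and rounding up to whole PTUs (with a 1-PTU floor):

```python
import math

TPM_PER_PTU = 1_000  # the article's working assumption: 1 PTU ~= 1,000 TPM

def ptus_needed(avg_request_tokens: int, requests_per_minute: int) -> int:
    """Convert sustained load into whole PTUs, rounding up (minimum 1)."""
    tpm = avg_request_tokens * requests_per_minute
    return max(1, math.ceil(tpm / TPM_PER_PTU))

print(ptus_needed(500, 100))  # 50,000 TPM -> 50 PTUs
```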

Cost Structure

PTU pricing includes:

  1. Base hourly rate (varies by model)
  2. No per-token overage (use as much as reserved capacity allows)
  3. Elastic capacity (scale to higher TPM by adding PTUs)

Once you exceed the PTU reservation, requests are throttled (delayed) rather than rejected. You can add PTUs at any time (instant, no downtime).

PTU Sizing

Use Case | Throughput | PTUs Needed | Cost/Month
Low (one user, async) | 100 TPM | 0.1 | $2,920
Medium (100 users, async) | 10K TPM | 10 | $292,000
High (1,000 users, real-time) | 500K TPM | 500 | $14.6M
Extreme (multi-region, massive) | 2M TPM | 2,000 | $58.4M

Costs assume GPT-4o's $40 per PTU-hour.

PTU costs scale linearly with reserved capacity. At very high sustained throughput, the flat PTU rate undercuts pay-as-you-go, which would otherwise run into the millions.


Pay-as-You-Go Costs

Per-Token Pricing

Based on OpenAI API rates, with minor regional variation (Azure might be ±5%).

Example Workload: Document Classification (550M tokens/month)

Task: Classify 1M documents (500 token input, 50 token output).

  • Inputs: 1M × 500 = 500M tokens
  • Outputs: 1M × 50 = 50M tokens

Cost on Azure OpenAI (GPT-4o):

  • Input: 500M × $2.50/M = $1,250
  • Output: 50M × $10.00/M = $500
  • Total: $1,750/month

Cost on OpenAI API (GPT-4o):

  • Same rate
  • Total: $1,750/month

Azure and the OpenAI API cost the same for pay-as-you-go. The Azure advantage lies in PTU bulk pricing and compliance features.

Overages and Limits

No overage charges, but Azure enforces rate limits (TPM quotas). If you exceed your quota, requests are throttled:

  • Quota: 40,000 TPM by default
  • Exceed: Requests queued, delayed, or rejected

Raise the limit by requesting a quota increase through Azure support, or by moving to PTU.
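A common way to live with TPM throttling is client-side retry with exponential backoff. The sketch below is generic: `send_request` is a placeholder callable, and `RuntimeError` stands in for whatever throttling error your client library raises (Azure signals throttling with HTTP 429).

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a throttled call with exponential backoff.

    `send_request` is any zero-arg callable. The RuntimeError check is a
    stand-in for a real 429/throttle exception -- adapt it to your client.
    """
    for attempt in range(max_retries):
        try:
            return send_request()
        except RuntimeError:  # placeholder for a throttling error
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Backoff smooths brief quota spikes; it does not substitute for a quota increase if you are persistently over the limit.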


PTU vs On-Demand

Break-Even Analysis

PTU becomes cheaper when monthly token usage × on-demand rate exceeds PTU cost.

Example: GPT-4o

Monthly Tokens | On-Demand Cost | PTU Cost ($40/hr) | Winner
10M | $200 | $29,200 | On-Demand
100M | $2,000 | $29,200 | On-Demand
500M | $10,000 | $29,200 | On-Demand
1B | $20,000 | $29,200 | On-Demand
2B | $40,000 | $29,200 | PTU
5B | $100,000 | $29,200 | PTU

PTU break-even: between 1B and 2B tokens/month for GPT-4o at the table's implied ~$20/M blended rate ($29,200 ÷ $20/M ≈ 1.46B).

For teams processing roughly 1.5B+ tokens monthly, PTU saves money. Below that, on-demand is cheaper.
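The break-even arithmetic reduces to one division. The sketch takes the table's $29,200/month flat PTU cost and its implied ~$20/M blended on-demand rate as given:

```python
def break_even_tokens_m(ptu_monthly_cost: float, blended_rate_per_m: float) -> float:
    """Monthly token volume (in millions) where flat PTU cost equals
    on-demand spend at the given blended $/M rate."""
    return ptu_monthly_cost / blended_rate_per_m

# GPT-4o: $29,200/month flat PTU vs the table's ~$20/M blended rate
print(break_even_tokens_m(29_200, 20.0))  # -> 1460.0, i.e. ~1.46B tokens/month
```

Re-run with your own blended rate (your actual input/output mix) before committing; the crossover moves substantially as the mix changes.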

Workload Types

On-Demand Best For:

  • R&D and experimentation (unpredictable volume)
  • Bursty loads (high traffic for 1-2 hours, then idle)
  • Low-volume production (<500M tokens/month)
  • Cost-sensitive early-stage startups
  • One-off projects

PTU Best For:

  • Predictable, high-volume production (2B+ tokens/month)
  • SaaS platforms serving thousands of users
  • Content generation at scale
  • 24/7 always-on services
  • Teams with forecasted budget

Switching Between Models

PTU reservations are per model. If you need both GPT-4o and GPT-4o Mini:

  • Separate PTUs: One for GPT-4o ($40/hr), one for GPT-4o Mini ($10/hr)
  • Total cost: $50/hr

Alternatively, use on-demand for one and PTU for the other (hybrid).


Data Residency

Azure Advantage

Azure OpenAI stores data in the region you choose, unlike the OpenAI API's US-centric default.

Availability by Region (as of early 2026):

  • East US, West US: United States
  • West Europe: European Union (GDPR-compliant)
  • Southeast Asia: Singapore
  • Japan East: Japan
  • UK South: United Kingdom

Data stays in the specified region. OpenAI API defaults to US, with limited options.

Compliance Benefits

GDPR (EU): If processing EU resident data, Azure EU regions are preferred (Microsoft guarantees compliance).

HIPAA (Healthcare): Azure OpenAI supports HIPAA business associate agreements (BAA). OpenAI API does not.

SOC 2 Type II: Azure certified. OpenAI API has SOC 2, but Azure adds stronger guarantees.

FedRAMP (US Government): Azure Government Cloud available. OpenAI API not available for government use.

Cost Impact of Residency

Regional pricing is similar (within 5%). EU regions may cost slightly more.

Region | GPT-4o Input $/M | GPT-4o Output $/M
East US | $2.50 | $10.00
West Europe | $2.70 | $10.80
Southeast Asia | $2.50 | $10.00

The compliance/residency benefit is worth the cost for regulated industries (healthcare, finance, government).


Cost Optimization

Strategy 1: Right-Size PTU Reservation

Avoid over-provisioning. Monitor actual usage (Azure Portal, PTU utilization metrics). Right-size quarterly.

Current: Provisioned 50 PTUs
Actual peak usage: 30 PTUs (60% utilization)
Waste: 20 PTUs × $40/hr × 730 hrs = $584,000/month

Action: Reduce to 35 PTUs (peak plus headroom), saving 15 PTUs × $40/hr × 730 hrs = $438,000/month
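A minimal right-sizing sketch based on the numbers above; the 15% headroom default is an illustrative choice, not an Azure recommendation:

```python
import math

HOURS_PER_MONTH = 730  # the article's 730-hour month

def right_size(provisioned: int, peak_used: int, rate_per_hour: float,
               headroom: float = 0.15):
    """Return (monthly $ of idle capacity, suggested PTU count).

    `headroom` pads the observed peak so transient spikes still fit.
    """
    waste = (provisioned - peak_used) * rate_per_hour * HOURS_PER_MONTH
    suggested = math.ceil(peak_used * (1 + headroom))
    return waste, suggested

waste, target = right_size(provisioned=50, peak_used=30, rate_per_hour=40.0)
print(waste, target)  # 584000.0 35
```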

Strategy 2: Use Cheaper Models When Possible

GPT-4o Mini is roughly 90% cheaper than GPT-4 Turbo on most tasks. Benchmark both:

  • Classification, summarization, simple Q&A: Use Mini. Save 90% cost.
  • Math, complex reasoning, creative writing: Use Turbo or GPT-4.1. Worth the cost.

Example: Classify 1M documents (500 input tokens, 50 output tokens each).

  • Using GPT-4 Turbo: 500M inputs × $10/M + 50M outputs × $30/M = $6,500
  • Using GPT-4o Mini: 500M inputs × $0.15/M + 50M outputs × $0.60/M = $105

Mini saves $6,395/month (98% cheaper).
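Recomputing the comparison directly from the pricing table's rates:

```python
def workload_cost(in_m, out_m, in_rate, out_rate):
    """Cost of in_m/out_m million tokens at the given $/M rates."""
    return in_m * in_rate + out_m * out_rate

turbo = workload_cost(500, 50, 10.00, 30.00)  # GPT-4 Turbo rates
mini = workload_cost(500, 50, 0.15, 0.60)     # GPT-4o Mini rates
print(turbo, mini, round(100 * (1 - mini / turbo)))  # 6500.0 105.0 98
```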

Strategy 3: Batch Processing

Azure OpenAI Batch API (preview) offers 50% discounts for non-real-time processing.

  • Submit: 1M token requests in batch
  • Wait: Up to 24 hours for results
  • Save: 50% per token

Cost: 100M input tokens × $2.50/M × 50% = $125 (vs $250 on-demand, GPT-4o).

Caveat: Only for workloads that can tolerate 24-hour latency (reports, analysis, not chat).

Strategy 4: Hybrid Model Mix

Use different models for different tasks:

Task | Model | Cost (Input / Output $/M)
Chat | GPT-4o Mini | $0.15 / $0.60
Classification | GPT-4o Mini | $0.15 / $0.60
Creative writing | GPT-4o | $2.50 / $10.00
Math/reasoning | GPT-4.1 | $2.00 / $8.00

Most workloads can use Mini; only 10-20% need premium models. Overall: 60-70% cheaper than running everything on GPT-4 Turbo.

Strategy 5: Caching (Preview)

Azure OpenAI Prompt Caching reduces costs for repeated prompts.

  • First request: Full cost
  • Subsequent requests (same prompt prefix): 90% discount on cached portion

Example: System prompt (1K tokens) + user prompt (100 tokens).

  • First request: 1100 tokens @ normal rate
  • Requests 2-100: 100 tokens @ normal rate + 1K tokens @ 10% cost (cached)

Savings compound for repetitive workloads (summarization, classification with fixed instructions).
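The caching math above can be sketched as a function, assuming the article's model: full price on request 1, then the shared prefix billed at 10% of the normal rate.

```python
def cached_workload_cost(prefix_tokens, variable_tokens, n_requests,
                         rate_per_m, cached_discount=0.90):
    """Total input cost when a shared prompt prefix is cached after request 1."""
    rate = rate_per_m / 1_000_000  # per-token rate
    first = (prefix_tokens + variable_tokens) * rate
    rest = (n_requests - 1) * (
        variable_tokens * rate
        + prefix_tokens * rate * (1 - cached_discount)  # cached prefix at 10%
    )
    return first + rest

# 1K-token system prompt + 100-token user prompt, 100 requests, GPT-4o input rate
with_cache = cached_workload_cost(1_000, 100, 100, 2.50)
without = (1_000 + 100) * 100 * 2.50 / 1_000_000
print(with_cache, without)
```

For this shape of workload the cached cost works out to roughly a fifth of the uncached cost; the larger the fixed prefix relative to the variable part, the bigger the win.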


Real-World Examples

Example 1: SaaS Content Generation Platform

A writing assistant generating summaries and rewrites for 10K daily active users.

Load:

  • 10K users × 5 requests/day × 200 tokens avg = 10M tokens/day = 300M tokens/month
  • Split: 200M input, 100M output (GPT-4o)

On-Demand Cost:

  • Input: 200M × $2.50/M = $500
  • Output: 100M × $10.00/M = $1,000
  • Total: $1,500/month

PTU Cost:

  • Need: 10M tokens/day ÷ 1440 min = 6,944 TPM ≈ 7 PTUs
  • Cost: 7 × $40/hr × 730 = $204,400/month
  • Total: $204,400/month

Winner: On-demand. PTU is overkill. Consider PTU only if you scale to 50K+ users or shift to batch processing.

Example 2: Legal Document Review

A law firm reviewing 100K documents/month (500 tokens each, 50-token extraction).

Load:

  • 100K × 500 = 50M input tokens
  • 100K × 50 = 5M output tokens = 55M tokens total/month

On-Demand Cost (GPT-4o Mini):

  • Mini: 50M × $0.15/M + 5M × $0.60/M = $7.50 + $3.00 = $10.50
  • Total: ~$10.50/month

PTU Cost:

  • 55M tokens ÷ 30 days ÷ 1,440 min ≈ 1,273 TPM ≈ 1.3 PTUs, rounded up to 2
  • 2 PTUs × $10/hr (Mini rate) × 730 = $14,600/month
  • Total: $14,600/month

Alternative: Batch Processing

  • 50M inputs × $0.15/M × 50% (batch discount) = $3.75
  • 5M outputs × $0.60/M × 50% = $1.50
  • Total: ~$5.25/month (latency: up to 24 hours)

Winner: Batch processing, at roughly half the on-demand cost, if a 24-hour turnaround is acceptable. At this volume either bill is trivial; the real lesson is that PTU's hourly minimum is orders of magnitude oversized for the workload.

Example 3: Multilingual Customer Support

A support platform handling 1M customer queries/month in 10 languages. Task: translate + classify intent.

Load:

  • 1M queries × 200 tokens = 200M input
  • Classification + 50 token response = 50M output
  • Total: 250M tokens/month

Costs:

  • On-demand GPT-4o Mini: (200M × $0.15/M) + (50M × $0.60/M) = $30 + $30 = $60/month
  • PTU: 250M ÷ 30 days ÷ 1,440 min ≈ 5,787 TPM ≈ 6 PTUs; at Mini's $10/hr: 6 × $10 × 730 = $43,800/month

Winner: On-demand, by a wide margin. At 250M tokens/month on Mini, reserved capacity cannot pay for itself. Revisit PTU only if volume grows by orders of magnitude.


FAQ

How much can I save with PTU?

30-50% per token at high, sustained volume. Below the break-even point (very roughly 1.5-2B tokens/month for GPT-4o), on-demand is cheaper because it carries no commitment.

Is Azure OpenAI slower than OpenAI API?

No. Same models, same latency. Azure hosting may vary slightly by region (EU regions might add 5-10ms), but negligible.

Can I use Azure OpenAI with on-demand billing?

Yes. Pay-as-you-go is the default. PTU is optional, only for high-volume production.

What happens if I exceed my PTU allocation?

Requests are throttled (delayed) until you add more PTU capacity or usage decreases. No overage charges. No hard rejection.

Can I cancel PTU anytime?

PTU requires 1-hour minimum commitment. After 1 hour, you can delete it anytime. Azure doesn't bill for unused PTU hours if deleted within the same hour.

Do I need to use Azure infrastructure elsewhere?

No. Azure OpenAI is standalone. You don't need to migrate other services to Azure to use Azure OpenAI (though integration is easier if you already use Azure).

What about RBAC and API key management?

Azure OpenAI integrates with Azure Active Directory (AD). You can assign roles (admin, user, reader) to team members. Finer-grained access control than OpenAI API's simple key model.

Can I use multiple regions for failover?

Yes. Set up API endpoints in multiple regions. Requests route to primary, fail over to secondary if primary is down. Adds latency but improves reliability.
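A minimal failover wrapper might look like the sketch below; the endpoint URLs are hypothetical resource names, and `send` is whatever request function your client uses.

```python
def with_failover(endpoints, send):
    """Try each regional endpoint in order; return the first success.

    `send` is a callable (endpoint -> response) that raises on failure.
    """
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint)
        except Exception as exc:  # sketch: any failure triggers failover
            last_error = exc
    raise last_error if last_error else RuntimeError("no endpoints configured")

# Hypothetical resource names -- substitute your own deployments.
REGIONS = [
    "https://myapp-eastus.openai.azure.com",
    "https://myapp-westeurope.openai.azure.com",
]
```

Ordering the list by proximity keeps the added latency to the failover case only.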

How does data residency cost compare to OpenAI API?

Similar per-token rates. Azure advantage is compliance guarantees (GDPR, HIPAA), not raw cost. For regulated data, Azure is standard. For general use, OpenAI API is competitive.

Can I mix on-demand and PTU?

Yes. Use on-demand for bursty loads, PTU for predictable base load. Some teams run both simultaneously (hybrid).


