Contents
- Executive Summary
- GPU Cloud Pricing by Type
- Provider Market Share
- Trend Analysis
- Regional Variations
- FAQ
- Related Resources
- Sources
Executive Summary
The GPU cloud market remains highly competitive as of March 2026. Spot pricing continues its aggressive decline, while on-demand rates have stabilized following the late-2025 consolidation wave.
Key findings:
- NVIDIA H100 on-demand rates from $2.69/hour (RunPod) to $3.78/hour (Lambda SXM)
- Spot pricing reaches $1.05/hour on Vast.AI, making batch workloads viable at marginal costs
- AMD MI350X emerged as a viable alternative but commands a significant premium over H100 on-demand pricing due to limited availability
- New entrants (Fireworks AI, Together AI) offer improved per-token LLM pricing
- European pricing 12-15% lower than US equivalents due to lower power costs
Market consolidation has accelerated. The top four providers (RunPod, Lambda Labs, CoreWeave, Vast.AI) control 78% of volume. Smaller providers increasingly differentiate through specialty GPUs or regional presence.
GPU Cloud Pricing by Type
NVIDIA H100 SXM
Market baseline for production workloads. All prices are in US dollars per GPU-hour.
| Provider | On-Demand | 6-Month Reserve | Annual Reserve | Spot Price |
|---|---|---|---|---|
| RunPod | $2.69 | $2.15 | $1.90 | $1.05 |
| Lambda | $3.78 (SXM) / $2.86 (PCIe) | $2.30 | $2.00 | N/A |
| CoreWeave | $6.155/GPU (8x cluster) | varies | varies | $1.10 |
| Vast.AI | $2.80-$3.20 | N/A | N/A | $1.05 |
RunPod maintains leadership through aggressive spot pricing strategy. Vast.AI spot pricing reaches market floor ($1.05/hour) but lacks guaranteed uptime.
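As a rough illustration of how these hourly rates compound over a month, the table's figures can be plugged into a simple cost calculation (provider names and prices come from the table above; the Vast.AI on-demand figure uses the midpoint of the quoted $2.80-$3.20 range):

```python
# Monthly cost of one H100 at the March 2026 rates quoted above ($/GPU-hour).
H100_RATES = {
    "RunPod":    {"on_demand": 2.69, "spot": 1.05},
    "CoreWeave": {"on_demand": 6.155, "spot": 1.10},
    "Vast.AI":   {"on_demand": 3.00, "spot": 1.05},  # midpoint of $2.80-$3.20
}

def monthly_cost(rate_per_hour: float, hours: int = 730) -> float:
    """Cost of one GPU running for a full month (~730 hours)."""
    return rate_per_hour * hours

for provider, rates in H100_RATES.items():
    od = monthly_cost(rates["on_demand"])
    sp = monthly_cost(rates["spot"])
    print(f"{provider}: on-demand ${od:,.0f}/mo, spot ${sp:,.0f}/mo "
          f"({1 - sp / od:.0%} saving)")
```

The spread is stark at the on-demand tier but nearly disappears at the spot tier, which is consistent with spot prices converging toward a common floor.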
NVIDIA H200 SXM
Mid-tier option for larger models. The H200 pairs H100-class compute with 141 GB of HBM3e (versus 80 GB on the H100); on-demand rates run roughly 20-35% above comparable H100 pricing at most providers.
| Provider | On-Demand | Spot |
|---|---|---|
| RunPod | $3.59 | $1.53 |
| Lambda | $4.50 | $2.15 |
| CoreWeave | $3.95 | $1.58 |
H200 adoption accelerates for models exceeding 100B parameters. The price premium is justified for specific use cases but not universally.
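One way to sanity-check when the H200's larger memory (141 GB vs 80 GB on the H100) pays off is a back-of-envelope weight-memory calculation. This sketch counts model weights only and ignores KV cache, activations, and framework overhead, which add meaningfully on top:

```python
import math

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (decimal)."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9

def min_gpus(params_billions: float, bytes_per_param: float,
             gpu_mem_gb: float) -> int:
    """Minimum GPU count needed just to hold the weights."""
    return math.ceil(weights_gb(params_billions, bytes_per_param) / gpu_mem_gb)

# A 70B-parameter model at FP16 (2 bytes/param) needs 140 GB of weights:
print(min_gpus(70, 2, 80))   # H100, 80 GB  -> 2 GPUs
print(min_gpus(70, 2, 141))  # H200, 141 GB -> 1 GPU
```

Halving the GPU count for a given model is what makes the per-GPU premium worthwhile in memory-bound deployments.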
NVIDIA B200 (Blackwell)
Latest generation with 192 GB of memory. Pricing carries a scarcity premium while supply remains constrained.
| Provider | On-Demand | Spot |
|---|---|---|
| RunPod | $5.98 | $3.89 |
| Lambda | $7.45 | $4.50 |
| CoreWeave | $6.80 | $4.20 |
The premium over H100 (roughly double on-demand rates) is justified only for multi-model deployments or single models above 175B parameters. Slower adoption is expected until supply increases.
AMD MI350X
AMD's competitive entry gained meaningful market share (12% of spot volume).
| Provider | On-Demand | Spot |
|---|---|---|
| CoreWeave | $7.20 | $4.30 |
| Lambda | $7.95 | $5.10 |
| Vast.AI | N/A | $4.80 |
The pricing premium reflects constrained availability, while the software ecosystem and driver/optimization stack still lag NVIDIA's. However, improved performance per watt appeals to cost-conscious operators.
Provider Market Share
Market volume estimated from billing data and provider announcements:
| Provider | Market Share | Key Strength | Weakness |
|---|---|---|---|
| RunPod | 28% | Spot pricing, H100/H200/B200 | API complexity |
| Vast.AI | 22% | Lowest pricing, global supply | No SLA guarantees |
| Lambda Labs | 18% | API simplicity, reliability | Premium pricing |
| CoreWeave | 16% | European presence, AMD support | US availability gaps |
| OVH Cloud | 8% | GDPR compliance, low cost | Limited GPU selection |
| Smaller providers | 8% | Regional specialization | Limited resources |
RunPod gains market share through aggressive pricing and superior spot price mechanics. Lambda Labs maintains premium positioning for reliability-conscious customers. CoreWeave capitalizes on European demand and AMD early adoption.
Consolidation trend continues. Smaller providers without differentiation face increasing pressure to exit or merge as top-tier players compete aggressively on pricing.
Trend Analysis
Spot Pricing Deflation
Spot prices declined 22% annually as supply increased. Providers are diversifying their customer mix to raise utilization. Prices are approaching the floor set by electricity and cooling costs.
Spot pricing now viable for production batch workloads (model training, re-ranking, synthetic data generation) rather than only development. This shifts customer economics significantly.
Regional Price Convergence
US-Europe price gap narrowed from 22% (2024) to 12% (March 2026) as European providers gained scale. Africa and South America still command 40% premiums due to limited supply.
Expect continued convergence. Geographic arbitrage opportunities diminishing.
Model Architecture Efficiency Gains
LLM inference becomes more efficient despite larger model sizes:
- Mistral 7B achieves 45% higher tokens/second than Llama 2 7B
- Llama 3.1 405B with grouped query attention approaches Llama 3 70B throughput
- Quantization advances (FP8) enable larger models with similar memory footprint
These improvements reduce the GPU hours required per million tokens, lowering effective per-token cost even where hourly GPU prices hold steady.
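The economics of such throughput gains can be expressed as cost per million tokens. The throughput figures below are illustrative assumptions, not benchmarks from this report:

```python
def cost_per_million_tokens(gpu_rate_hr: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens for a single GPU
    at a sustained throughput."""
    hours_per_million = 1e6 / (tokens_per_sec * 3600)
    return gpu_rate_hr * hours_per_million

# At the $2.69/hr H100 rate quoted above, a 45% throughput gain cuts
# per-token cost by ~31% (1 - 1/1.45) even with flat GPU pricing.
base = cost_per_million_tokens(2.69, 1_000)    # assumed 1,000 tok/s
faster = cost_per_million_tokens(2.69, 1_450)  # +45% throughput
print(f"${base:.3f} -> ${faster:.3f} per 1M tokens")
```

This is why architecture efficiency gains act like a price cut from the customer's perspective.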
Reserved Instance Adoption
Multi-year commitments grow as customers optimize for cost. Annual commitments now represent 35% of volume (up from 12% in 2023).
This indicates market maturation and confidence in current pricing levels. Customers are willing to lock in rates, reducing near-term pricing pressure.
Regional Variations
United States
Mature supply chains and favorable power rates (particularly in Texas) keep absolute pricing low. H100 spot pricing reaches $1.05/hour in off-peak windows.
Texas data centers benefit from abundant renewable energy and deregulated electricity markets. West Coast pricing 8-12% higher due to power costs and land scarcity.
Europe
12-15% cheaper than US due to lower electricity costs and lower real estate prices in Eastern Europe (Poland, Hungary). CoreWeave pricing leadership in Frankfurt/Amsterdam reflects this advantage.
GDPR compliance premium: 2-5% markup for providers maintaining strict data residency. Negligible for most workloads but matters for regulated industries.
Asia-Pacific
Paradoxically expensive despite the region's manufacturing hubs. NVIDIA allocates limited high-end GPU supply to Asia-Pacific, and the resulting constraints push pricing 35-45% above US levels.
Singapore and Hong Kong serve regional demand but sourcing limitations persist. China remains largely disconnected from global cloud GPU markets.
FAQ
Q: Which provider offers the best overall value? RunPod for cost-sensitive production workloads (aggressive spot pricing); Lambda Labs for reliability-sensitive applications despite premium pricing; CoreWeave for European customers or AMD workloads.
Q: Are spot prices reliable for production? It varies by provider. Vast.AI spot instances can be reclaimed with 10 minutes' notice. RunPod spot instances are generally stable for 2-4+ hours. CoreWeave's longer-duration spots (8+ hours) suit batch jobs. Use spot only for fault-tolerant workloads.
Q: What's the GPU cloud pricing trend? Downward for H100 (8% annual decline). B200 premium persists as supply-constrained. AMD gaining volume share but still price-premium. Expect continued spot price deflation as supply increases.
Q: Should I commit to reserved instances? Yes, if monthly usage exceeds 300 hours. Annual commitments save 20-30% versus on-demand. RunPod H100 SXM on-demand is $2.69/hour; reserved rates vary by provider. Lower risk if workload stable. Reserve only if confident in usage levels.
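The 300-hour rule of thumb can be cross-checked with a simple model. This sketch assumes the reservation bills for every hour of the month whether used or not (billing models differ by provider), using the RunPod rates from the H100 table above:

```python
def breakeven_hours(on_demand_rate: float, reserved_rate: float,
                    month_hours: int = 730) -> float:
    """Monthly usage above which an always-billed reservation beats
    paying on-demand for only the hours actually used."""
    # reserved_rate * month_hours < on_demand_rate * used_hours
    return reserved_rate * month_hours / on_demand_rate

# RunPod H100 SXM: $2.69 on-demand vs $1.90 annual reserve
print(round(breakeven_hours(2.69, 1.90)))  # ≈ 516 hours/month
```

Under this conservative model the breakeven lands well above 300 hours; a lower threshold makes sense where reservations can be sized to committed capacity or where reliability is valued over spot interruptions.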
Q: How do per-token LLM API prices compare to cloud GPUs? API inference typically 40-60% more expensive per token than self-managed GPUs. But includes no operational overhead. Breakeven around 10-50M tokens monthly depending on provider and model.
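The breakeven can be sketched as the monthly volume at which API spend equals the cost of one always-on GPU. Both prices below are assumptions for illustration, not figures from this report:

```python
def breakeven_million_tokens(api_price_per_m: float, gpu_rate_hr: float,
                             month_hours: int = 730) -> float:
    """Monthly volume (millions of tokens) where per-token API spend
    equals the cost of running one GPU around the clock."""
    return gpu_rate_hr * month_hours / api_price_per_m

# Assumed: $40 per 1M tokens via API vs a $2.69/hr self-managed H100
print(f"{breakeven_million_tokens(40.0, 2.69):.0f}M tokens/month")  # ≈ 49M
```

Cheaper API tiers or spot GPUs shift the crossover substantially, which is why the quoted range spans 10-50M tokens.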
Q: What's the electricity cost component of GPU pricing? An H100 consumes about 700W. At $0.12/kWh (US average), electricity costs roughly $2.02/day (about $61/month) per H100. Factoring in whole-server power, cooling, and facility overhead, electricity represents roughly 10-15% of spot rental prices; the remainder covers hardware amortization, infrastructure, and margin.
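The per-hour arithmetic behind that estimate, with an optional PUE (power usage effectiveness) factor to fold in cooling overhead (the 1.3 PUE below is an assumed value):

```python
def electricity_cost_hr(watts: float, price_per_kwh: float,
                        pue: float = 1.0) -> float:
    """Electricity cost per hour for one GPU; pue > 1 adds cooling
    and facility overhead on top of the GPU's own draw."""
    return watts / 1000 * price_per_kwh * pue

gpu_only = electricity_cost_hr(700, 0.12)       # GPU draw alone
with_pue = electricity_cost_hr(700, 0.12, 1.3)  # assumed PUE of 1.3
print(f"${gpu_only * 24:.2f}/day, ${gpu_only * 730:.0f}/month")  # $2.02/day, $61/month
print(f"{with_pue / 1.05:.0%} of the $1.05/hr spot floor")
```

Counting the rest of the server (CPU, memory, networking) alongside PUE pushes the share toward the upper end of the 10-15% range.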
Related Resources
- Complete GPU Cloud Pricing Guide
- NVIDIA H100 GPU Pricing
- NVIDIA H200 GPU Pricing
- NVIDIA B200 GPU Pricing
- RunPod GPU Pricing
- Lambda Labs GPU Pricing
- CoreWeave GPU Pricing
- Vast.ai GPU Pricing
Sources
- GPU cloud provider pricing aggregation (March 2026)
- DeployBase market analysis dataset
- Public provider billing reports
- Industry analyst estimates (Mercury Research, Gartner)
- Regional electricity cost analysis