State of GPU Cloud Pricing: Monthly Market Report

Deploybase · March 1, 2026 · Market Analysis

Executive Summary

The GPU cloud market remains highly competitive as of March 2026. Spot pricing continues its aggressive decline, while on-demand rates have stabilized following the late-2025 consolidation wave.

Key findings:

  • NVIDIA H100 on-demand rates range from $2.69/hour (RunPod) to $3.78/hour (Lambda SXM)
  • Spot pricing reaches $1.05/hour on Vast.AI, making batch workloads viable at marginal cost
  • AMD MI350X has emerged as a viable alternative but commands a significant premium over H100 on-demand pricing due to limited availability
  • New entrants (Fireworks AI, Together AI) offer improved per-token LLM pricing
  • European pricing runs 12-15% lower than US equivalents due to lower power costs

Market consolidation accelerated. The top four providers (RunPod, Vast.AI, Lambda Labs, CoreWeave) control 84% of volume. Smaller providers increasingly differentiate through specialty GPUs or regional presence.

GPU Cloud Pricing by Type

NVIDIA H100 SXM

Market baseline for production workloads.

| Provider | On-Demand ($/hr) | 6-Month Reserve | Annual Reserve | Spot |
|---|---|---|---|---|
| RunPod | $2.69 | $2.15 | $1.90 | $1.05 |
| Lambda | $3.78 (SXM) / $2.86 (PCIe) | $2.30 | $2.00 | N/A |
| CoreWeave | $6.155/GPU (8x cluster) | varies | varies | $1.10 |
| Vast.AI | $2.80-$3.20 | N/A | N/A | $1.05 |

RunPod maintains leadership through aggressive spot pricing strategy. Vast.AI spot pricing reaches market floor ($1.05/hour) but lacks guaranteed uptime.
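At these rates, the gap between tiers compounds quickly over a month of continuous use. A quick sketch using the RunPod column above (730 hours approximates one month):

```python
# Monthly cost of a single H100 SXM at RunPod's published rates (from the
# table above); 730 hours approximates one month of continuous use.
HOURS_PER_MONTH = 730

runpod_h100 = {  # $/hour, per the H100 SXM pricing table
    "on_demand": 2.69,
    "reserve_6mo": 2.15,
    "reserve_annual": 1.90,
    "spot": 1.05,
}

def monthly_cost(rate_per_hour: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost in dollars for `hours` of continuous GPU time."""
    return rate_per_hour * hours

for tier, rate in runpod_h100.items():
    saving = 1 - rate / runpod_h100["on_demand"]
    print(f"{tier:>14}: ${monthly_cost(rate):>8.2f}/mo ({saving:.0%} vs on-demand)")
```

Running a single H100 around the clock costs roughly $1,964/month on-demand versus about $767/month on spot, which is why interruption-tolerant workloads migrate to spot first.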

NVIDIA H200 SXM

Mid-tier option for larger models. On-demand pricing carries a roughly 20-35% premium over the H100, reflecting the H200's larger memory (141 GB vs 80 GB of HBM).

| Provider | On-Demand ($/hr) | Spot |
|---|---|---|
| RunPod | $3.59 | $1.53 |
| Lambda | $4.50 | $2.15 |
| CoreWeave | $3.95 | $1.58 |

H200 adoption is accelerating for models exceeding 100B parameters. The price premium is justified for specific use cases but not universally.

NVIDIA B200 (Blackwell)

Latest generation with 192GB memory. Scarcity premium reflects limited availability.

| Provider | On-Demand ($/hr) | Spot |
|---|---|---|
| RunPod | $5.98 | $3.89 |
| Lambda | $7.45 | $4.50 |
| CoreWeave | $6.80 | $4.20 |

The roughly 2x price premium over the H100 (about 100%, varying by provider) is justified only for multi-model deployments or 175B+ single models. Slower adoption is expected until supply increases.
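One way to weigh these premiums is to normalize hourly rates by memory capacity, since memory is often the binding constraint for large models. A rough sketch using the RunPod on-demand rates from the tables above and published HBM capacities (80 GB for H100 SXM, 141 GB for H200, 192 GB for B200 per this section):

```python
# Normalizing RunPod on-demand rates (from the tables above) by HBM capacity
# gives a rough $/GB-hour figure -- one lens on whether the newer parts'
# premiums pay off for memory-bound workloads.
gpus = {
    # name: (on-demand $/hr, HBM in GB)
    "H100": (2.69, 80),    # H100 SXM, 80 GB HBM3
    "H200": (3.59, 141),   # 141 GB HBM3e
    "B200": (5.98, 192),   # 192 GB, per the section above
}

for name, (rate, mem_gb) in gpus.items():
    print(f"{name}: ${rate / mem_gb:.4f} per GB-hour")
```

By this metric the H200 is actually the cheapest memory per hour, which is consistent with its accelerating adoption for 100B+ models.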

AMD MI350X

AMD's competitive entry gained meaningful market share (12% of spot volume).

| Provider | On-Demand ($/hr) | Spot |
|---|---|---|
| CoreWeave | $7.20 | $4.30 |
| Lambda | $7.95 | $5.10 |
| Vast.AI | N/A | $4.80 |

The pricing premium reflects ecosystem immaturity and a driver/optimization lag relative to NVIDIA. However, improved performance per watt appeals to cost-conscious operators.

Provider Market Share

Market volume estimated from billing data and provider announcements:

| Provider | Market Share | Key Strength | Weakness |
|---|---|---|---|
| RunPod | 28% | Spot pricing, H100/H200/B200 | API complexity |
| Vast.AI | 22% | Lowest pricing, global supply | No SLA guarantees |
| Lambda Labs | 18% | API simplicity, reliability | Premium pricing |
| CoreWeave | 16% | European presence, AMD support | US availability gaps |
| OVH Cloud | 8% | GDPR compliance, low cost | Limited GPU selection |
| Smaller providers | 8% | Regional specialization | Limited resources |

RunPod gains market share through aggressive pricing and superior spot price mechanics. Lambda Labs maintains premium positioning for reliability-conscious customers. CoreWeave capitalizes on European demand and AMD early adoption.

Consolidation trend continues. Smaller providers without differentiation face increasing pressure to exit or merge as top-tier players compete aggressively on pricing.

Trend Analysis

Spot Pricing Deflation

Spot prices declined 22% year over year as supply increased. Providers are diversifying their customer mix to boost utilization. Prices are approaching the floor set by electricity and cooling costs.

Spot pricing now viable for production batch workloads (model training, re-ranking, synthetic data generation) rather than only development. This shifts customer economics significantly.
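Making a batch job spot-safe mostly comes down to checkpointing, so a reclaimed instance resumes where it left off rather than restarting. A minimal sketch of that pattern; the `process` callable and checkpoint filename are placeholders, not any provider's API:

```python
# Minimal pattern for spot-tolerant batch work: record completed items so a
# reclaimed instance resumes instead of restarting from scratch.
import json
import os

CHECKPOINT = "progress.json"  # placeholder path; use durable storage in practice

def load_done() -> set:
    """Return the set of already-completed items, if a checkpoint exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def save_done(done: set) -> None:
    """Write the checkpoint atomically so an interruption can't corrupt it."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, CHECKPOINT)  # atomic rename: no torn checkpoint files

def run(items, process):
    """Process each item once, skipping work finished before an interruption."""
    done = load_done()
    for item in items:
        if item in done:
            continue
        process(item)
        done.add(item)
        save_done(done)  # persist after every completed unit of work
```

In production the checkpoint should live on storage that outlives the instance (object storage, a database); per-item writes can be batched if items are cheap.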

Regional Price Convergence

US-Europe price gap narrowed from 22% (2024) to 12% (March 2026) as European providers gained scale. Africa and South America still command 40% premiums due to limited supply.

Expect continued convergence. Geographic arbitrage opportunities diminishing.

Model Architecture Efficiency Gains

LLM inference becomes more efficient despite larger model sizes:

  • Mistral 7B achieves 45% higher tokens/second than Llama 2 7B
  • Llama 3.1 405B with grouped query attention approaches Llama 3 70B throughput
  • Quantization advances (FP8) enable larger models with similar memory footprint

These improvements reduce the GPU hours required per million tokens, lowering effective inference costs even where hourly rates hold steady.
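The arithmetic behind this offset is simple: cost per million tokens scales inversely with throughput. A sketch with illustrative throughput figures (not benchmarks):

```python
# Cost per million tokens at a given single-GPU throughput. The throughput
# numbers below are illustrative assumptions, not measured benchmarks.
def gpu_hours_per_mtok(tokens_per_second: float) -> float:
    """GPU-hours needed to generate one million tokens."""
    return 1_000_000 / (tokens_per_second * 3600)

def cost_per_mtok(tokens_per_second: float, rate_per_hour: float) -> float:
    """Dollar cost per million tokens at a given hourly GPU rate."""
    return gpu_hours_per_mtok(tokens_per_second) * rate_per_hour

base = cost_per_mtok(1000, 2.69)    # e.g. 1,000 tok/s on an H100 at $2.69/hr
faster = cost_per_mtok(1450, 2.69)  # same GPU and rate, 45% higher throughput
print(f"${base:.3f} vs ${faster:.3f} per 1M tokens")
```

A 45% throughput gain cuts per-token cost by about 31% at an unchanged hourly rate, which is why architecture efficiency compounds with (or substitutes for) price deflation.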

Reserved Instance Adoption

Multi-year commitments grow as customers optimize for cost. Annual commitments now represent 35% of volume (up from 12% in 2023).

This indicates market maturation and confidence in current pricing levels. Customers are willing to lock in rates, reducing near-term pricing pressure.
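For a customer weighing a commitment, the breakeven utilization is easy to compute, assuming the reserved rate bills for every hour of the term (contract structures differ; check your provider):

```python
# Breakeven utilization for a reserved instance, assuming the reserved rate
# bills for every hour of the term (an assumption -- contracts vary).
HOURS_PER_MONTH = 730

def breakeven_hours(on_demand: float, reserved: float) -> float:
    """Monthly on-demand hours above which reserving becomes cheaper."""
    return reserved * HOURS_PER_MONTH / on_demand

# RunPod H100 SXM rates from the pricing table above
print(f"{breakeven_hours(2.69, 1.90):.0f} hours/month")
```

Under this full-billing model the RunPod annual rate pays off above roughly 516 hours/month of equivalent on-demand use; partial commitments or upfront discounts lower the threshold, which is why simpler rules of thumb quote lower figures.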

Regional Variations

United States

Mature supply chains and favorable power rates (particularly in Texas) keep US pricing low, though Europe undercuts it by 12-15%. H100 spot pricing reaches $1.05/hour in off-peak windows.

Texas data centers benefit from abundant renewable energy and deregulated electricity markets. West Coast pricing 8-12% higher due to power costs and land scarcity.

Europe

12-15% cheaper than US due to lower electricity costs and lower real estate prices in Eastern Europe (Poland, Hungary). CoreWeave pricing leadership in Frankfurt/Amsterdam reflects this advantage.

GDPR compliance premium: 2-5% markup for providers maintaining strict data residency. Negligible for most workloads but matters for regulated industries.

Asia-Pacific

Paradoxically expensive despite the region's proximity to manufacturing hubs. NVIDIA allocates limited high-end GPU supply to the APJ region, and those constraints push pricing 35-45% above US levels.

Singapore and Hong Kong serve regional demand but sourcing limitations persist. China remains largely disconnected from global cloud GPU markets.

FAQ

Q: Which provider offers the best overall value? RunPod for cost-sensitive production workloads (aggressive spot pricing); Lambda Labs for reliability-sensitive applications despite premium pricing; CoreWeave for European customers or AMD workloads.

Q: Are spot prices reliable for production? It varies by provider. Vast.AI spot instances can be reclaimed with 10-minute notice. RunPod spots are generally stable for 2-4+ hours. CoreWeave's longer-duration spots (8+ hours) suit batch jobs. Use spot only for fault-tolerant workloads.

Q: What's the GPU cloud pricing trend? Downward: H100 on-demand rates declined about 8% annually, with spot falling faster (22%, per the trend analysis above). The B200 premium persists while supply remains constrained. AMD is gaining volume share but still carries a price premium. Expect continued spot-price deflation as supply increases.

Q: Should I commit to reserved instances? Yes, if monthly usage exceeds 300 hours. Annual commitments save 20-30% versus on-demand; RunPod's H100 SXM drops from $2.69/hour on-demand to $1.90/hour on an annual reserve, and other providers vary. Risk is lower if the workload is stable; reserve only if confident in your usage levels.

Q: How do per-token LLM API prices compare to cloud GPUs? API inference typically runs 40-60% more expensive per token than self-managed GPUs, but includes no operational overhead. Breakeven is around 10-50M tokens monthly, depending on provider and model.
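The breakeven depends on the per-token spread and on how much fixed operational overhead the self-hosted path carries. A sketch with purely illustrative figures (none are quotes from any provider):

```python
# Breakeven between per-token API pricing and a self-managed GPU path.
# All dollar figures below are illustrative assumptions, not provider quotes.
def breakeven_mtok(api_per_mtok: float, gpu_per_mtok: float,
                   fixed_ops_per_month: float) -> float:
    """Million tokens/month above which self-hosting becomes cheaper."""
    return fixed_ops_per_month / (api_per_mtok - gpu_per_mtok)

# e.g. API at $12/Mtok vs self-hosted at $8/Mtok (a 50% premium), with
# $100/month of fixed ops overhead attributed to the self-hosted path
print(f"{breakeven_mtok(12.0, 8.0, 100):.0f} Mtok/month")
```

With these assumed numbers the breakeven lands at 25M tokens/month; heavier ops overhead or a narrower price spread pushes it higher.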

Q: What's the electricity cost component of GPU pricing? An H100 draws roughly 700W. At $0.12/kWh (US average), direct electricity costs about $2.02/day, or roughly $61/month per GPU, before cooling overhead. Infrastructure, networking, and provider margin account for the bulk of the rental price; including cooling overhead, power represents roughly 10-15% of the spot floor and a smaller share of on-demand rates.
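These figures can be sanity-checked directly; the PUE (power usage effectiveness) value below is an assumed datacenter average covering cooling overhead, not a sourced number:

```python
# Sanity-check the electricity figures above. PUE of 1.4 is an assumed
# datacenter average for cooling overhead, not a sourced figure.
GPU_WATTS = 700          # H100 draw, per the FAQ above
PRICE_PER_KWH = 0.12     # US average, per the FAQ above
PUE = 1.4                # assumption

kwh_per_hour = GPU_WATTS / 1000 * PUE
cost_per_hour = kwh_per_hour * PRICE_PER_KWH
print(f"${cost_per_hour:.3f}/hour, ${cost_per_hour * 24:.2f}/day, "
      f"${cost_per_hour * 730:.2f}/month")
print(f"{cost_per_hour / 1.05:.0%} of the $1.05 spot floor")
```

At these assumptions, power (with cooling) comes to about $0.12/hour, around 11% of the $1.05/hour spot floor, which is consistent with spot prices approaching an electricity-and-cooling cost floor.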

Sources

  • GPU cloud provider pricing aggregation (March 2026)
  • Deploybase market analysis dataset
  • Public provider billing reports
  • Industry analyst estimates (Mercury Research, Gartner)
  • Regional electricity cost analysis