Contents
- Executive Summary
- GPU Cloud Pricing by Type
- Provider Market Share
- Trend Analysis
- Regional Variations
- FAQ
- Related Resources
- Sources
Executive Summary
The GPU cloud market remains highly competitive as of March 2026. Spot pricing continues its aggressive decline, while on-demand rates have stabilized following the late-2025 consolidation wave.
Key findings:
- NVIDIA H100 on-demand rates from $2.69/hour (RunPod) to $3.78/hour (Lambda SXM)
- Spot pricing reaches $1.05/hour on Vast.AI, making batch workloads viable at marginal costs
- AMD MI350X emerged as a viable alternative but commands a significant premium over H100 on-demand pricing due to limited availability
- New entrants (Fireworks AI, Together AI) offer improved per-token LLM pricing
- European pricing 12-15% lower than US equivalents due to lower power costs
Market consolidation has accelerated. The top four providers (RunPod, Lambda Labs, CoreWeave, Vast.AI) control 78% of volume. Smaller providers increasingly differentiate through specialty GPUs or regional presence.
GPU Cloud Pricing by Type
NVIDIA H100 SXM
Market baseline for production workloads. All prices are in US dollars per GPU-hour.
| Provider | On-Demand | 6-Month Reserve | Annual Reserve | Spot Price |
|---|---|---|---|---|
| RunPod | $2.69 | $2.15 | $1.90 | $1.05 |
| Lambda | $3.78 (SXM) / $2.86 (PCIe) | $2.30 | $2.00 | N/A |
| CoreWeave | $6.155/GPU (8x cluster) | varies | varies | $1.10 |
| Vast.AI | $2.80-$3.20 | N/A | N/A | $1.05 |
RunPod maintains leadership through aggressive spot pricing strategy. Vast.AI spot pricing reaches market floor ($1.05/hour) but lacks guaranteed uptime.
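As a rough illustration of how these hourly rates compound over a month, the table's figures can be plugged into a simple cost calculation (provider names and prices come from the table above; the Vast.AI on-demand figure uses the midpoint of the quoted $2.80-$3.20 range):

```python
# Monthly cost of one H100 at the March 2026 rates quoted above ($/GPU-hour).
H100_RATES = {
    "RunPod":    {"on_demand": 2.69, "spot": 1.05},
    "CoreWeave": {"on_demand": 6.155, "spot": 1.10},
    "Vast.AI":   {"on_demand": 3.00, "spot": 1.05},  # midpoint of $2.80-$3.20
}

def monthly_cost(rate_per_hour: float, hours: int = 730) -> float:
    """Cost of one GPU running for a full month (~730 hours)."""
    return rate_per_hour * hours

for provider, rates in H100_RATES.items():
    od = monthly_cost(rates["on_demand"])
    sp = monthly_cost(rates["spot"])
    print(f"{provider}: on-demand ${od:,.0f}/mo, spot ${sp:,.0f}/mo "
          f"({1 - sp / od:.0%} saving)")
```

The spread is stark at the on-demand tier but nearly disappears at the spot tier, which is consistent with spot prices converging toward a common floor.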
NVIDIA H200 SXM
Mid-tier option for larger models. The H200 pairs H100-class compute with 141 GB of HBM3e (versus 80 GB on the H100); on-demand rates run roughly 20-35% above comparable H100 pricing at most providers.
| Provider | On-Demand | Spot |
|---|---|---|
| RunPod | $3.59 | $1.53 |
| Lambda | $4.50 | $2.15 |
| CoreWeave | $3.95 | $1.58 |
H200 adoption accelerates for models exceeding 100B parameters. The price premium is justified for specific use cases but not universally.
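One way to sanity-check when the H200's larger memory (141 GB vs 80 GB on the H100) pays off is a back-of-envelope weight-memory calculation. This sketch counts model weights only and ignores KV cache, activations, and framework overhead, which add meaningfully on top:

```python
import math

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (decimal)."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9

def min_gpus(params_billions: float, bytes_per_param: float,
             gpu_mem_gb: float) -> int:
    """Minimum GPU count needed just to hold the weights."""
    return math.ceil(weights_gb(params_billions, bytes_per_param) / gpu_mem_gb)

# A 70B-parameter model at FP16 (2 bytes/param) needs 140 GB of weights:
print(min_gpus(70, 2, 80))   # H100, 80 GB  -> 2 GPUs
print(min_gpus(70, 2, 141))  # H200, 141 GB -> 1 GPU
```

Halving the GPU count for a given model is what makes the per-GPU premium worthwhile in memory-bound deployments.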
NVIDIA B200 (Blackwell)
Latest generation with 192 GB of memory. Pricing carries a scarcity premium while supply remains constrained.
| Provider | On-Demand | Spot |
|---|---|---|
| RunPod | $5.98 | $3.89 |
| Lambda | $7.45 | $4.50 |
| CoreWeave | $6.80 | $4.20 |
The premium over H100 (roughly double on-demand rates) is justified only for multi-model deployments or single models above 175B parameters. Slower adoption is expected until supply increases.
AMD MI350X
AMD's competitive entry gained meaningful market share (12% of spot volume).
| Provider | On-Demand | Spot |
|---|---|---|
| CoreWeave | $7.20 | $4.30 |
| Lambda | $7.95 | $5.10 |
| Vast.AI | N/A | $4.80 |
The pricing premium reflects constrained availability, while the software ecosystem and driver/optimization stack still lag NVIDIA's. However, improved performance per watt appeals to cost-conscious operators.
Provider Market Share
Market volume estimated from billing data and provider announcements:
| Provider | Market Share | Key Strength | Weakness |
|---|---|---|---|
| RunPod | 28% | Spot pricing, H100/H200/B200 | API complexity |
| Vast.AI | 22% | Lowest pricing, global supply | No SLA guarantees |
| Lambda Labs | 18% | API simplicity, reliability | Premium pricing |
| CoreWeave | 16% | European presence, AMD support | US availability gaps |
| OVH Cloud | 8% | GDPR compliance, low cost | Limited GPU selection |
| Smaller providers | 8% | Regional specialization | Limited resources |
RunPod gains market share through aggressive pricing and superior spot price mechanics. Lambda Labs maintains premium positioning for reliability-conscious customers. CoreWeave capitalizes on European demand and AMD early adoption.
Consolidation trend continues. Smaller providers without differentiation face increasing pressure to exit or merge as top-tier players compete aggressively on pricing.
Trend Analysis
Spot Pricing Deflation
Spot prices declined 22% annually as supply increased. Providers are diversifying their customer mix to raise utilization. Prices are approaching the floor set by electricity and cooling costs.
Spot pricing now viable for production batch workloads (model training, re-ranking, synthetic data generation) rather than only development. This shifts customer economics significantly.
Regional Price Convergence
US-Europe price gap narrowed from 22% (2024) to 12% (March 2026) as European providers gained scale. Africa and South America still command 40% premiums due to limited supply.
Expect continued convergence. Geographic arbitrage opportunities diminishing.
Model Architecture Efficiency Gains
LLM inference becomes more efficient despite larger model sizes:
- Mistral 7B achieves 45% higher tokens/second than Llama 2 7B
- Llama 3.1 405B with grouped query attention approaches Llama 3 70B throughput
- Quantization advances (FP8) enable larger models with similar memory footprint
These improvements reduce the GPU hours required per million tokens, lowering effective per-token cost even where hourly GPU prices hold steady.
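The economics of such throughput gains can be expressed as cost per million tokens. The throughput figures below are illustrative assumptions, not benchmarks from this report:

```python
def cost_per_million_tokens(gpu_rate_hr: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens for a single GPU
    at a sustained throughput."""
    hours_per_million = 1e6 / (tokens_per_sec * 3600)
    return gpu_rate_hr * hours_per_million

# At the $2.69/hr H100 rate quoted above, a 45% throughput gain cuts
# per-token cost by ~31% (1 - 1/1.45) even with flat GPU pricing.
base = cost_per_million_tokens(2.69, 1_000)    # assumed 1,000 tok/s
faster = cost_per_million_tokens(2.69, 1_450)  # +45% throughput
print(f"${base:.3f} -> ${faster:.3f} per 1M tokens")
```

This is why architecture efficiency gains act like a price cut from the customer's perspective.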
Reserved Instance Adoption
Multi-year commitments grow as customers optimize for cost. Annual commitments now represent 35% of volume (up from 12% in 2023).
This indicates market maturation and confidence in current pricing levels. Customers are willing to lock in rates, reducing near-term pricing pressure.
Regional Variations
United States
Mature supply chains and favorable power rates (particularly in Texas) keep absolute pricing low. H100 spot pricing reaches $1.05/hour in off-peak windows.
Texas data centers benefit from abundant renewable energy and deregulated electricity markets. West Coast pricing 8-12% higher due to power costs and land scarcity.
Europe
12-15% cheaper than US due to lower electricity costs and lower real estate prices in Eastern Europe (Poland, Hungary). CoreWeave pricing leadership in Frankfurt/Amsterdam reflects this advantage.
GDPR compliance premium: 2-5% markup for providers maintaining strict data residency. Negligible for most workloads but matters for regulated industries.
Asia-Pacific
Paradoxically expensive despite the region's manufacturing hubs. NVIDIA allocates limited high-end GPU supply to Asia-Pacific, and the resulting constraints push pricing 35-45% above US levels.
Singapore and Hong Kong serve regional demand but sourcing limitations persist. China remains largely disconnected from global cloud GPU markets.
FAQ
Q: Which provider offers the best overall value? RunPod for cost-sensitive production workloads (aggressive spot pricing); Lambda Labs for reliability-sensitive applications despite premium pricing; CoreWeave for European customers or AMD workloads.
Q: Are spot prices reliable for production? It varies by provider. Vast.AI spot instances can be reclaimed with 10 minutes' notice. RunPod spot instances are generally stable for 2-4+ hours. CoreWeave's longer-duration spots (8+ hours) suit batch jobs. Use spot only for fault-tolerant workloads.
Q: What's the GPU cloud pricing trend? Downward for H100 (8% annual decline). B200 premium persists as supply-constrained. AMD gaining volume share but still price-premium. Expect continued spot price deflation as supply increases.
Q: Should I commit to reserved instances? Yes, if monthly usage exceeds 300 hours. Annual commitments save 20-30% versus on-demand. RunPod H100 SXM on-demand is $2.69/hour; reserved rates vary by provider. Lower risk if workload stable. Reserve only if confident in usage levels.
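The 300-hour rule of thumb can be cross-checked with a simple model. This sketch assumes the reservation bills for every hour of the month whether used or not (billing models differ by provider), using the RunPod rates from the H100 table above:

```python
def breakeven_hours(on_demand_rate: float, reserved_rate: float,
                    month_hours: int = 730) -> float:
    """Monthly usage above which an always-billed reservation beats
    paying on-demand for only the hours actually used."""
    # reserved_rate * month_hours < on_demand_rate * used_hours
    return reserved_rate * month_hours / on_demand_rate

# RunPod H100 SXM: $2.69 on-demand vs $1.90 annual reserve
print(round(breakeven_hours(2.69, 1.90)))  # ≈ 516 hours/month
```

Under this conservative model the breakeven lands well above 300 hours; a lower threshold makes sense where reservations can be sized to committed capacity or where reliability is valued over spot interruptions.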
Q: How do per-token LLM API prices compare to cloud GPUs? API inference typically 40-60% more expensive per token than self-managed GPUs. But includes no operational overhead. Breakeven around 10-50M tokens monthly depending on provider and model.
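The breakeven can be sketched as the monthly volume at which API spend equals the cost of one always-on GPU. Both prices below are assumptions for illustration, not figures from this report:

```python
def breakeven_million_tokens(api_price_per_m: float, gpu_rate_hr: float,
                             month_hours: int = 730) -> float:
    """Monthly volume (millions of tokens) where per-token API spend
    equals the cost of running one GPU around the clock."""
    return gpu_rate_hr * month_hours / api_price_per_m

# Assumed: $40 per 1M tokens via API vs a $2.69/hr self-managed H100
print(f"{breakeven_million_tokens(40.0, 2.69):.0f}M tokens/month")  # ≈ 49M
```

Cheaper API tiers or spot GPUs shift the crossover substantially, which is why the quoted range spans 10-50M tokens.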
Q: What's the electricity cost component of GPU pricing? An H100 consumes about 700W. At $0.12/kWh (US average), electricity costs roughly $2.02/day (about $61/month) per H100. Factoring in whole-server power, cooling, and facility overhead, electricity represents roughly 10-15% of spot rental prices; the remainder covers hardware amortization, infrastructure, and margin.
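The per-hour arithmetic behind that estimate, with an optional PUE (power usage effectiveness) factor to fold in cooling overhead (the 1.3 PUE below is an assumed value):

```python
def electricity_cost_hr(watts: float, price_per_kwh: float,
                        pue: float = 1.0) -> float:
    """Electricity cost per hour for one GPU; pue > 1 adds cooling
    and facility overhead on top of the GPU's own draw."""
    return watts / 1000 * price_per_kwh * pue

gpu_only = electricity_cost_hr(700, 0.12)       # GPU draw alone
with_pue = electricity_cost_hr(700, 0.12, 1.3)  # assumed PUE of 1.3
print(f"${gpu_only * 24:.2f}/day, ${gpu_only * 730:.0f}/month")  # $2.02/day, $61/month
print(f"{with_pue / 1.05:.0%} of the $1.05/hr spot floor")
```

Counting the rest of the server (CPU, memory, networking) alongside PUE pushes the share toward the upper end of the 10-15% range.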
Related Resources
- Complete GPU Cloud Pricing Guide
- NVIDIA H100 GPU Pricing
- NVIDIA H200 GPU Pricing
- NVIDIA B200 GPU Pricing
- RunPod GPU Pricing
- Lambda Labs GPU Pricing
- CoreWeave GPU Pricing
- Vast.ai GPU Pricing
Sources
- GPU cloud provider pricing aggregation (March 2026)
- DeployBase market analysis dataset
- Public provider billing reports
- Industry analyst estimates (Mercury Research, Gartner)
- Regional electricity cost analysis