Cheapest GPU Cloud: Ranking GPU Cloud Providers by Price (March 2026)
Cheapest GPU Cloud is the focus of this guide. Pricing varies wildly between providers, and the cheapest H100 provider may have uncompetitive A100 rates. This guide breaks pricing down by GPU type so developers can choose the right provider for each workload.
RTX 4090 Pricing Comparison
The RTX 4090 dominates consumer-grade AI workloads. Costs for this GPU type:
- RunPod: $0.34/hour - Best for consumer GPU category
- Lambda Labs: No RTX 4090 offering (enterprise-focused)
- CoreWeave: $0.42/hour
- Vast AI: $0.28-0.32/hour (peer-to-peer marketplace pricing, variable)
- AWS: $1.26/hour (p3.2xlarge alternative, significantly higher)
Winner for RTX 4090: Vast AI at the low end of its range; RunPod as the stable, supported option.
A100 SXM/PCIe Pricing
The A100 remains the workload standard for mid-to-large inference and training.
- RunPod (A100 SXM): $1.39/hour - Most stable pricing
- Lambda Labs (A100): $1.48/hour - Slightly premium, includes support
- CoreWeave (8x A100 SXM): $21.60/hour ($2.70/GPU) - Multi-GPU bundles only
- AWS (p3.8xlarge = 4x A100 at $24.48/hour): $6.12/hour per A100 - Prohibitively expensive
- Google Cloud (a2-highgpu-1g = 1x A100): $4.17/hour - Also expensive
Winner for A100 (single GPU): RunPod at $1.19/hour (PCIe) or $1.39/hour (SXM).
H100 Pricing
The H100 addresses higher-throughput inference and training.
- RunPod: $1.99/hour (PCIe) / $2.69/hour (SXM)
- Lambda Labs: $3.78/hour
- CoreWeave: $49.24/hour (8x H100 cluster only, ~$6.16/GPU) - No single H100 option
- AWS (p4de.24xlarge = 8x H100 at $98.688/hour): $12.34/hour per H100 - Far too expensive
- Vast AI: $2.20-2.60/hour (marketplace volatility)
Winner for H100 (single GPU): RunPod at $1.99/hour (PCIe) or $2.69/hour (SXM), Vast AI if developers accept marketplace volatility.
H200 Pricing
The H200 occupies the high-performance niche, less established across providers.
- RunPod: $3.59/hour
- Lambda Labs: Not yet available (as of March 2026)
- Verda: $3.39/hour (SXM)
- Civo: $3.49/hour (SXM)
- AWS/Google Cloud: No direct offerings
Winner for H200: Verda at $3.39/hour.
B200 Pricing (Next-Gen)
B200 availability is limited, with pricing still settling.
- Nebius: $5.50/hour
- RunPod: $5.98/hour
- Lambda Labs: $6.08/hour (SXM)
- Others: Limited availability
Winner for B200: Nebius at $5.50/hour.
L40S Pricing
The L40S serves inference and moderate training efficiently.
- RunPod: $0.79/hour
- CoreWeave: $0.71/hour
- Lambda Labs: $0.89/hour
- AWS: $1.25-1.50/hour range
Winner for L40S: CoreWeave at $0.71/hour.
Overall Provider Ranking by Category
Best for General Inference (RTX 4090 / A100 / L40S):
- CoreWeave - Consistently competitive across GPU types
- RunPod - Stable pricing, excellent reliability
- Vast AI - Cheapest spot pricing, higher volatility risk
Best for High-Performance Workloads (H100 / H200 / B200):
- CoreWeave - Strong pricing across all high-end GPUs
- RunPod - Reliable alternative with slight premium
- Lambda Labs - Premium pricing but strong support
Best for Consistency and Reliability:
- RunPod - Uptime and support reputation solid
- Lambda Labs - Production SLA options available
- CoreWeave - Growing reliability track record
Best for Price-Sensitive Applications:
- Vast AI - Lowest spot prices but with marketplace risk
- CoreWeave - Best fixed pricing across catalog
- RunPod - Good middle ground between price and reliability
Cost Analysis by Workload Type
Single-Token Inference (Latency-Critical):
Real-time chat applications require minimal batch sizes. The pricing advantage goes to providers with the lowest per-hour cost, since developers are paying for uptime more than throughput.
Winner: CoreWeave's competitive hourly rates. Vertical scaling (larger single GPU) often better than horizontal scaling (multiple smaller GPUs) for latency workloads.
Batch Inference (Throughput-Critical):
Processing thousands of requests overnight or during off-peak hours. Cost-per-token becomes the optimization metric.
- A100 on RunPod: $1.19/hour = $0.000331/second
- H100 on RunPod: $2.69/hour = $0.000747/second
- H100 delivers roughly 2.2x throughput vs A100, so per-token cost is roughly equal
For batch inference, throughput scaling becomes economically neutral once developers account for hardware differences.
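The per-token parity claim above can be checked with simple arithmetic. A minimal sketch, assuming an illustrative baseline throughput (the absolute tokens-per-hour figure is a placeholder; only the 2.2x ratio comes from the text, and you should measure both for your own models):

```python
# Compare effective cost per unit of work across GPU types.
def cost_per_unit(hourly_rate: float, units_per_hour: float) -> float:
    """Dollars per unit of work (e.g. per million tokens)."""
    return hourly_rate / units_per_hour

# Hypothetical throughput in millions of tokens per hour (assumption).
BASELINE_MTOK_PER_HOUR = 10.0

a100 = cost_per_unit(1.19, BASELINE_MTOK_PER_HOUR)        # A100 baseline
h100 = cost_per_unit(2.69, BASELINE_MTOK_PER_HOUR * 2.2)  # ~2.2x throughput

print(f"A100: ${a100:.4f}/M tokens, H100: ${h100:.4f}/M tokens")
```

With these rates the H100 lands within a few percent of the A100 per token, which is why the hourly premium roughly washes out for throughput-bound work.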
Training Workloads:
Extended compute-intensive jobs where total cost matters more than hourly rate.
A 10-day training run on RunPod's H100 ($2.69/hour):
- 10 days × 24 hours × $2.69 = $645.60
Same on Vast.AI (~$2.30/hour average):
- 10 days × 24 hours × $2.30 = $552.00
Savings: $93.60 (roughly 15% cost reduction, with marketplace variability). For large teams running continuous training, Vast AI savings can compound significantly (with reliability caveats).
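The training-run comparison above is a straight hours-times-rate calculation, sketched here with the same figures:

```python
# Total cost of a multi-day training run at a fixed hourly GPU rate.
def run_cost(days: float, hourly_rate: float) -> float:
    return days * 24 * hourly_rate

runpod = run_cost(10, 2.69)  # RunPod H100 SXM rate
vast = run_cost(10, 2.30)    # Vast AI approximate marketplace average

print(f"RunPod: ${runpod:.2f}, Vast AI: ${vast:.2f}, "
      f"savings: ${runpod - vast:.2f}")
```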
Reliability and Hidden Costs
Pricing ranking omits crucial reliability factors:
Vast AI (Peer-to-Peer): Lowest published prices but higher interruption rates (5-15% depending on market conditions). Interruptions incur rescheduling costs and potentially lost progress on training workloads. Effective cost is higher when accounting for reliability tax.
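The "reliability tax" can be folded into an effective hourly rate. A minimal sketch, assuming each interruption costs roughly one hour of rescheduling and lost progress (an assumption; measure this for your own jobs):

```python
# Fold expected interruption overhead into an hourly GPU rate.
def effective_hourly_rate(rate: float, interruption_rate: float,
                          overhead_hours_per_interruption: float = 1.0) -> float:
    # Each billed hour carries an expected extra cost from interruptions.
    return rate * (1 + interruption_rate * overhead_hours_per_interruption)

# Marketplace rate with 10% interruptions vs managed rate with 1%.
cheap_but_flaky = effective_hourly_rate(2.30, interruption_rate=0.10)
stable = effective_hourly_rate(2.69, interruption_rate=0.01)

print(f"flaky: ${cheap_but_flaky:.2f}/hr effective, "
      f"stable: ${stable:.2f}/hr effective")
```

Under these assumptions the marketplace option stays cheaper, but the gap narrows from $0.39/hour to under $0.19/hour; heavier per-interruption losses can erase it entirely.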
RunPod: Consistently available with <1% interruption rate. Their managed marketplace model provides stability. Support responsiveness is reliable.
CoreWeave: Strong infrastructure-level commitment with improving availability track record. As a managed provider rather than marketplace, availability is more predictable.
Lambda Labs: Premium pricing reflects production-grade SLA guarantees and support. For mission-critical workloads, the reliability premium justifies cost.
AWS/Google Cloud: Highest sticker prices but with production support contracts, committed use discounts, and integration benefits if developers are already in those ecosystems. Rarely the cheapest option but offer ecosystem lock-in benefits.
Volume Discounts and Commitments
All providers offer discounts for committed usage:
- RunPod: 20-25% discount for 3-month prepayment, 30-35% for annual commitments
- CoreWeave: 15-20% discount for monthly commitments, variable production contracts
- Lambda Labs: 25-30% discount for reserved capacity
- Vast AI: Limited discount structure (marketplace-driven pricing)
- AWS: 33-55% discount for 3-year commitments (but less valuable given higher base pricing)
For predictable 3+ month workloads, commit. For episodic or experimental work, spot pricing on RunPod or Vast AI is better.
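Whether a commitment pays off depends on utilization: a commitment bills around the clock at a discounted rate, while on-demand bills only hours actually used. A sketch with illustrative numbers (the 30% discount matches the annual tiers above; the rest are assumptions):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def committed_monthly(rate: float, discount: float) -> float:
    """Monthly cost of a commitment billed 24/7 at a discounted rate."""
    return rate * (1 - discount) * HOURS_PER_MONTH

def on_demand_monthly(rate: float, utilization: float) -> float:
    """Monthly cost paying full rate only for hours actually used."""
    return rate * utilization * HOURS_PER_MONTH

# With a 30% discount, the break-even utilization is 70%:
# below that, on-demand wins; above it, the commitment wins.
print(committed_monthly(2.69, 0.30) < on_demand_monthly(2.69, 0.75))  # True
```

The general rule this encodes: a commitment with discount d beats on-demand once sustained utilization exceeds (1 - d).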
Regional Pricing Variation
Pricing listed above assumes US-East regions. International availability varies:
- Europe: CoreWeave offers competitive EU-West pricing, sometimes cheaper than US due to regional competition
- Asia-Pacific: Limited GPU cloud availability; AWS and Google Cloud dominate but at premium pricing
- Other US regions: West Coast pricing sometimes 5-10% higher due to lower capacity
Consider geographic distribution of the workloads when selecting providers.
Practical Selection Framework
For Cost Optimization: CoreWeave, then RunPod, then evaluate Vast AI if interruption tolerance is high.
For Reliability: RunPod or Lambda Labs despite price premium.
For Flexibility: RunPod offers the broadest GPU selection with consistent pricing.
For Enterprise: Lambda Labs or AWS for SLA guarantees.
For Experimentation: Vast AI for lowest marginal cost, or RunPod free tier to start.
Monitoring Price Changes
Bookmark GPU pricing pages to track ongoing price changes. Providers adjust pricing monthly based on demand and cost structures. Quarterly reviews ensure developers are not overpaying relative to current market rates.
For hardware-level context, understand NVIDIA A100 pricing and NVIDIA H100 pricing to see cloud markups versus bare-metal costs.
Provider-Specific Strengths Beyond Pricing
CoreWeave Advantages Beyond Cost:
- Dedicated GPU cloud infrastructure (not general-purpose cloud)
- Better optimization for AI workloads than hyperscalers
- Growing ecosystem of integrations (Hugging Face, RunwayML, etc.)
- Expanding global data center presence improving latency
- Improving documentation and community support
RunPod Advantages Beyond Cost:
- Broadest GPU selection (consumer through production GPUs)
- Established community and extensive documentation
- Flexible provider ecosystem: choose the preferred service operator
- Excellent API and integration support
- Strong platform stability despite marketplace nature
Lambda Advantages Beyond Cost:
- Enterprise-grade SLA guarantees (99.95%)
- Priority support with guaranteed response times
- Simplified procurement for production customers
- Excellent integration with ML frameworks
- Unified billing and administration for teams
Vast AI Advantages Beyond Cost:
- Maximum flexibility for custom configurations
- P2P marketplace enabling unique hardware options
- Strong for development and experimentation phase
- Excellent for learning and prototyping
- Transparent pricing through auction model
Hidden Costs and True Cost of Ownership
Data Transfer Costs: AWS charges for data egress; a 100GB model pull costs $12 in transfer fees. CoreWeave and RunPod generally don't charge egress. Annual data transfer costs can be substantial at scale.
Setup and Configuration Time: Configuring instances varies. Vast AI requires more setup (higher operational burden). RunPod simplifies with templates. Lambda provides most standardization. Time cost: $500-2,000 per deployment.
Optimization Requirements: Cheaper providers require more optimization to achieve target performance. Engineering time tuning batch sizes, quantization, and configurations might cost $5K-20K for complex workloads. Premium providers often provide optimization consulting.
Monitoring and Reliability: Vast AI requires manual monitoring. Failure detection and failover require custom infrastructure. Premium providers include monitoring. Time cost: $500-2,000 monthly for reliable production operation.
Switching Costs: If developers later decide to move, redeploying to different infrastructure requires downtime and validation testing. Avoid lock-in by designing with multi-provider support from day one.
Reliability and Uptime Tracking
Provider reliability varies significantly:
RunPod: Generally 99.5% availability. Marketplace model introduces some variability but managed reputation system maintains quality.
CoreWeave: Improving reliability, currently around 99.2%. Dedicated infrastructure helps, but growing pains still present.
Lambda Labs: 99.95% SLA with guaranteed compensation for violations. Most reliable but at premium cost.
Vast AI: 95-97% availability. Peer-to-peer model introduces inherent variability. Acceptable for development, risky for production.
AWS/Google Cloud: 99.95%+ with formal SLAs. Most expensive but most reliable.
Calculate the reliability cost: if the operation costs $10K/month and downtime costs $1K per minute, even 0.1% unreliability (about 525 minutes per year) translates to over $500K in annual downtime cost. For mission-critical operations, premium providers are justified despite the higher price.
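The downtime arithmetic generalizes directly from an availability figure. A minimal sketch (the $1K/minute downtime cost is illustrative, as above):

```python
# Annualized downtime cost implied by an availability percentage.
MIN_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def annual_downtime_cost(availability: float, cost_per_minute: float) -> float:
    return (1 - availability) * MIN_PER_YEAR * cost_per_minute

# At $1K/minute, the gap between three nines and four nines is large.
three_nines = annual_downtime_cost(0.999, 1_000)   # ~525 min/yr of downtime
four_nines = annual_downtime_cost(0.9999, 1_000)   # ~52.5 min/yr

print(f"99.9%: ${three_nines:,.0f}/yr, 99.99%: ${four_nines:,.0f}/yr")
```

This is why a provider's stated availability (95-97% for Vast AI vs 99.95% for Lambda Labs) can matter far more than its hourly rate for revenue-bearing workloads.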
Geographical Considerations and Latency
Latency from users to GPU location is critical for interactive applications.
Global Distribution: RunPod has US-only availability. CoreWeave expanding to Europe. AWS/Google have global presence. For international users, latency might force premium providers.
Regional Pricing: US-East typically cheapest. West Coast 5-10% premium. Europe 10-20% premium. Asia-Pacific extremely limited availability.
Latency Impact: If model serving latency is 100ms in US-East but 300ms in Asia, user experience degrades. This might force dual-region deployment (expensive) or dedicated premium provider with local presence.
Seasonal Pricing Patterns
GPU prices fluctuate seasonally:
High-Demand Periods: Q4 (crypto mining, holiday ML projects), conference seasons see price increases 15-30%.
Low-Demand Periods: Q1, summer sees price decreases 10-20%.
Crypto Correlation: When crypto mining is profitable, GPU costs spike as miners demand capacity. Monitor crypto markets for indirect GPU price signals.
Device Availability: New GPU releases cause temporary price premiums for latest hardware. Waiting 3-6 months after release yields 20-30% discounts as capacity expands.
Timing capacity commitments around seasonal patterns yields 10-20% savings annually.
Advanced Optimization Strategies
Multi-Provider Load Balancing: Route requests to cheapest currently-available provider. Complexity added but 15-25% cost reduction possible at large scale.
Spot vs. Reserved Tradeoff: Reserve capacity for baseline load (guaranteed), burst to spot for peaks. Reduces cost while maintaining SLA guarantees.
Time-Shifting: If workloads can tolerate delays, shift to off-peak regions and times. Off-peak GPU costs 20-40% less. Overnight batch processing reduces costs significantly.
Model Optimization: Quantization, pruning, distillation reduce GPU requirements. 30-50% smaller models on cheaper hardware sometimes beat expensive hardware with standard models.
Consolidation: Combining multiple small jobs into single larger job improves utilization and reduces per-job overhead costs by 20-30%.
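The multi-provider load-balancing strategy above reduces, at its simplest, to picking the cheapest provider that currently has capacity. A sketch with hard-coded rates and availability standing in for a live pricing feed (both are illustrative assumptions; provider names follow the article's examples):

```python
# Route work to the cheapest provider that currently has capacity.
# offers maps provider name -> (hourly_rate, has_capacity).
def cheapest_available(offers: dict[str, tuple[float, bool]]) -> str:
    available = {name: rate for name, (rate, up) in offers.items() if up}
    if not available:
        raise RuntimeError("no provider currently has capacity")
    return min(available, key=available.get)

offers = {
    "vast_ai":   (2.30, False),  # cheapest, but no capacity right now
    "runpod":    (2.69, True),
    "coreweave": (2.85, True),
}

print(cheapest_available(offers))  # runpod
```

A production version would refresh rates from each provider's API and add hysteresis so jobs don't thrash between providers on small price moves.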
FAQ
Should I use the absolute cheapest provider even if less reliable?
Not unless interruptions are acceptable. Vast AI's 5-10% price advantage disappears if interruptions cause 10% overhead and lost work. Calculate true cost including resilience and retry expenses. For production workloads, RunPod or CoreWeave are worth the premium. For development, Vast AI is optimal.
How do I account for different GPU capabilities in pricing?
Compare cost-per-operation for your specific models through benchmarking. An H100 costs 2x an A100 hourly but delivers 2.2x throughput on LLMs, making per-token costs similar. Test with your actual workload, not theoretical specs. Measured performance is what matters.
Are there geographic regions with significantly cheaper GPUs?
Yes. US-West costs 10-15% more than US-East. Europe costs 15-25% more. Asia-Pacific is extremely expensive. However, latency to users might negate savings. Measure the latency impact: added latency costs user satisfaction and conversion, often exceeding regional price savings.
Is it worth committing annually to save 30-35%?
Only for workloads verified to be consistent for 12 months. Test utilization for 3-4 weeks before committing. If usage fluctuates, spot pricing provides necessary flexibility. Committing locks in costs that might decline through competitive pressure.
What about self-hosting versus cloud providers?
Self-hosting requires $5K-50K CapEx for hardware plus operational overhead. Cloud is optimal if you need <6 months deployment or <$500/month spend. Beyond that, self-hosting becomes competitive. At $5K+/month spend, self-hosting usually wins financially. Calculate break-even: (hardware cost + setup) / (monthly cloud cost - monthly self-hosted cost).
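The break-even formula from the answer above, sketched with illustrative numbers (the $30K hardware and $1K/month operating figures are assumptions within the ranges the answer gives):

```python
# Months until self-hosting recoups its upfront cost versus cloud.
def breakeven_months(hardware_cost: float, setup_cost: float,
                     monthly_cloud: float, monthly_self_hosted: float) -> float:
    monthly_savings = monthly_cloud - monthly_self_hosted
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays off
    return (hardware_cost + setup_cost) / monthly_savings

# $30K hardware + $5K setup vs $5K/mo cloud and $1K/mo self-hosted ops.
months = breakeven_months(30_000, 5_000, 5_000, 1_000)
print(f"break-even in {months:.1f} months")  # break-even in 8.8 months
```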
Can I burst across multiple providers for peak demand?
Yes, that's an advanced optimization strategy. Code once with cloud-agnostic APIs (Kubernetes, Ray), then burst to cheapest available capacity. Requires infrastructure complexity but yields 15-30% cost savings at large scale by avoiding premium providers for baseline capacity.
Related Resources
- GPU Pricing Tracker - Weekly price monitoring methodology
- Best GPU for LLM Inference - Hardware selection guide
- Best GPU for AI Training - Training workload optimization
- NVIDIA A100 Price - Hardware cost reference
- NVIDIA H100 Price - High-end GPU baseline
Sources
- RunPod, CoreWeave, Lambda Labs, Vast AI official pricing (as of March 2026)
- AWS and Google Cloud pricing calculators (as of March 2026)
- DeployBase.AI provider reliability database (as of March 2026)
- Community reports on provider uptime and interruption rates (2025-2026)
- Comparative cost analysis from infrastructure optimization studies