NVIDIA H100 Price: Cloud GPU Rental Rates Compared (2026)

DeployBase · June 10, 2025 · GPU Pricing

NVIDIA H100 Price Overview

The NVIDIA H100 price for cloud rentals runs anywhere from $1.38 to $11.68 per GPU-hour as of March 2026. That's an 8.5x spread across 28+ providers tracked on the DeployBase GPU dashboard.

Why so wide? Boutique providers like ThunderCompute and Latitude sit under $2.00/hr with minimal overhead. AWS charges $6.88. Google Cloud charges $11.68 for on-demand SXM. Same chip, wildly different price tags. The gap comes down to form factor, managed services, SLAs, and how much infrastructure the provider bundles in.

Buying outright: $25,000 to $40,000 depending on configuration. H200 and B200 inventory is growing and pushing H100 rates lower. Full specs on the H100 model page.


Cloud Provider Pricing

All prices below are on-demand per-GPU-hour rates tracked by DeployBase as of March 21, 2026. Sorted cheapest to most expensive.

Provider | $/GPU-hr | Form Factor | GPU Count | Source
ThunderCompute | $1.38 | H100 | 1x | thundercompute.com
Latitude | $1.66 | H100 | 1x | latitude.sh
Hyperstack | $1.95 | H100 | 1x | hyperstack.cloud
RunPod | $1.99 | H100 PCIe | 1x | runpod.io
Verda | $2.29 | H100 SXM | 8x | verda.com
Hyperstack | $2.40 | H100 SXM | 1x | hyperstack.cloud
Voltage Park | $2.49 | H100 SXM | 8x | voltagepark.com
Civo | $2.49 | H100 PCIe | 8x | civo.com
RunPod | $2.59 | H100 NVL | 1x | runpod.io
RunPod | $2.69 | H100 SXM | 1x | runpod.io
Lambda | $2.86 | H100 PCIe | 1x | lambda.ai
ORI | $2.90 | H100 PCIe | 1x | ori.co
ORI | $2.90 | H100 SXM | 1x | ori.co
Nebius | $2.95 | H100 | 1x | nebius.com
DigitalOcean | $2.99 | H100 | 8x | digitalocean.com
Vultr | $2.99 | H100 | 8x | vultr.com
Lambda | $3.78 | H100 SXM | 4x | lambda.ai
Crusoe | $3.90 | H100 | 1x | crusoe.ai
Paperspace | $5.95 | H100 | 1x | paperspace.com
CoreWeave | $6.16 | H100 | 8x | coreweave.com
AWS | $6.88 | H100 | 1x | aws.amazon.com
Azure | $6.98 | H100 NVL | 2x | azure.microsoft.com
Azure | $11.06 | H100 | 8x | azure.microsoft.com
Google Cloud | $11.68 | H100 SXM | 8x | cloud.google.com

The cheapest H100 rental is 8.5x cheaper than the most expensive. Provider selection matters more than almost any other cost lever.

A few notes on reading this table. Multi-GPU instances (8x) bundle CPU, RAM, networking, and NVLink interconnect into the total price, so the per-GPU cost runs higher than bare single-GPU rental. ThunderCompute's $1.38/hr uses a virtual GPU pricing model with shared infrastructure and lower uptime guarantees than dedicated providers. The hyperscaler rates (AWS, Azure, GCP) include managed service overhead, SLAs, compliance infrastructure, and integrated networking.

The mid-market sweet spot sits around $2.00 to $3.00/GPU-hour. Providers in that range (RunPod, Hyperstack, Lambda, ORI, Nebius, Civo) offer dedicated GPU access with reasonable SLAs and enough operational maturity for production workloads. Below $2.00, tradeoffs in uptime, support, and feature set start to appear. Above $5.00, teams are paying for managed infrastructure services rather than raw GPU compute.


Form Factors

Three H100 configurations ship. Pricing and performance gaps between them are large enough to matter for cost planning.

H100 PCIe (80GB)

The most common variant. 80GB HBM2e memory with 2.0 TB/s bandwidth at 350W TDP. Fits standard PCIe slots, works in mixed server builds, and requires no special chassis. That's why most boutique providers run PCIe: integration is straightforward and the power draw is manageable.

Cloud rates range from $1.38 to $6.88/GPU-hour. Best for inference and single-GPU workloads. The memory bandwidth ceiling (2.0 TB/s vs 3.35 TB/s on SXM) and the lack of NVLink interconnect become real bottlenecks when training across multiple GPUs.

H100 SXM5 (80GB)

This is what every hyperscaler training cluster runs. 80GB HBM3, 3.35 TB/s bandwidth, up to 700W TDP. The SXM form factor requires NVIDIA's DGX or HGX baseboard, but in return, eight GPUs connect via NVLink at 900 GB/s per GPU with up to 57.6 TB/s aggregate bandwidth across 256 GPUs using NVLink Switch.

Cloud rates: $2.29 to $11.68/GPU-hour, roughly 20-50% more than PCIe. That premium buys multi-GPU training throughput. For single-GPU inference, it's wasted money.

H100 NVL (94GB per die, 188GB paired)

Different animal. Two H100 dies paired with a 600 GB/s NVLink bridge, giving 188GB HBM3 combined and 3.9 TB/s memory bandwidth at 400W TDP per die.

Purpose-built for large model inference. Models that exceed 80GB VRAM run on a single NVL card without model parallelism overhead. RunPod lists H100 NVL at $2.59/GPU-hour, Azure at $6.98/GPU-hour. Availability is more limited than PCIe or SXM across most providers.


Instance Types

On-Demand

No commitment, no discount, maximum flexibility. Monthly cost at 730 hours continuous:

ThunderCompute at $1.38/hr runs $1,007/month. RunPod PCIe at $1.99/hr: $1,453/month. AWS at $6.88/hr: $5,022/month. That spread is over $4,000/month for the same GPU architecture. Compare rates side-by-side on the GPU pricing dashboard.
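
The arithmetic is simple enough to script for any provider list. A quick sketch using rates from the table above (730 hours approximates one month of continuous use):

# Monthly on-demand cost at continuous utilization, rates from the table above.
rates = {
    "ThunderCompute": 1.38,
    "RunPod PCIe": 1.99,
    "AWS": 6.88,
}
HOURS_PER_MONTH = 730  # 24 * 365 / 12

for provider, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{provider:>16}: ${rate * HOURS_PER_MONTH:>8,.0f}/month")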

Spot and Preemptible

Spot pricing typically runs 30-60% below on-demand. The tradeoff: instances can be reclaimed with short notice (usually 2 minutes on AWS). Workloads need checkpointing.

RunPod spot H100 PCIe: reported as low as $0.99/hr. AWS spot P5: $2.10 to $2.50/hr historically. GCP preemptible A3: roughly 60-91% discount off on-demand per Google's published discount structure.

Good for training with checkpoint support, batch preprocessing, and research experiments. Bad for production inference with uptime requirements. The key question is whether the workload can tolerate interruption. If it can checkpoint and restart cleanly, spot pricing is the obvious play. If it can't, the risk of lost compute time outweighs the hourly savings.
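
What "checkpoint and restart cleanly" looks like in practice, as a minimal sketch assuming PyTorch; the model, optimizer, and checkpoint interval are placeholders:

import os
import torch

CKPT_PATH = "checkpoint.pt"  # persist to durable storage, not instance-local disk
model = torch.nn.Linear(512, 512)            # placeholder model
opt = torch.optim.AdamW(model.parameters())  # placeholder optimizer
start_step = 0

# Resume if a previous spot instance saved state before being reclaimed.
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_step = state["step"] + 1

for step in range(start_step, 10_000):
    loss = model(torch.randn(32, 512)).pow(2).mean()  # stand-in for a real training step
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:  # checkpoint often enough that a 2-minute eviction loses little work
        torch.save({"model": model.state_dict(),
                    "opt": opt.state_dict(),
                    "step": step}, CKPT_PATH)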

Reserved and Committed

Locking in for 1 to 3 years drops rates 30-60% off on-demand.

CoreWeave offers up to 60% discount on committed usage. Google Cloud committed use discounts (CUDs) run 55-65% off on-demand depending on term and region. AWS reserved instances typically discount 30-40% for 1-year and 40-50% for 3-year terms.
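
Whether a commitment pays off comes down to utilization: committed capacity is billed around the clock, so it only wins when the on-demand hours it displaces would cost more. A sketch of that comparison, using AWS's on-demand rate from the table and an illustrative (assumed) 40% one-year discount:

on_demand = 6.88                     # $/GPU-hr, AWS on-demand from the table above
committed = on_demand * (1 - 0.40)   # assumed 40% one-year discount
HOURS_PER_MONTH = 730

# Commitment is billed every hour; on-demand only for hours actually used.
# The bills cross exactly where utilization equals (1 - discount).
breakeven_utilization = committed / on_demand
print(f"Commitment wins above {breakeven_utilization:.0%} utilization")

for utilization in (0.3, 0.6, 0.9):
    od_monthly = on_demand * HOURS_PER_MONTH * utilization
    ci_monthly = committed * HOURS_PER_MONTH
    print(f"{utilization:.0%} util: on-demand ${od_monthly:,.0f}/mo vs committed ${ci_monthly:,.0f}/mo")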

The risk: locking into 3-year H100 commitments while H200 and B200 pricing drops. A 3-year reserved instance signed in March 2026 runs through early 2029, by which point next-generation GPUs will likely offer better cost-per-FLOP at lower hourly rates. One-year commitments for steady workloads with on-demand for overflow is the safer structure until the next-gen pricing picture clarifies.


Purchase Costs

Individual GPUs

Config | Street Price | Bulk (10+ units)
H100 PCIe (80GB) | $25,000 to $30,000 | $24,000 to $28,000
H100 SXM (80GB) | $35,000 to $40,000 | $22,000+ at volume
H100 NVL (94GB per die) | ~$29,000 per die | varies

Volume discounts of 10-20% are typical at 8+ units through authorized NVIDIA resellers. SXM pricing has dropped more aggressively than PCIe as supply normalized through late 2025 and early 2026.

Complete Systems

The DGX H100 (8x SXM GPUs with NVLink, chassis, networking): $300,000 to $460,000 depending on configuration and reseller. Custom 8-GPU PCIe servers: $220,000 to $280,000 before power and cooling infrastructure.

Each SXM GPU pulls up to 700W. A full 8-GPU node draws 5.6 kW under load, which translates to roughly $5,900/year in electricity at $0.12/kWh running 24/7. Add $10,000 to $100,000 for power and cooling infrastructure depending on existing facility readiness. Liquid cooling is increasingly common for SXM deployments at this thermal density.
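
That electricity estimate is straightforward to recompute for local rates. A sketch (GPU draw only; host CPUs, fans, and facility cooling overhead add more):

GPU_TDP_KW = 0.700       # per SXM GPU under load
GPUS = 8
HOURS_PER_YEAR = 8760    # 24/7 operation
RATE = 0.12              # $/kWh

node_kw = GPU_TDP_KW * GPUS                    # 5.6 kW for the GPUs alone
annual_cost = node_kw * HOURS_PER_YEAR * RATE  # ~$5,900/year
print(f"{node_kw:.1f} kW -> ${annual_cost:,.0f}/year at ${RATE}/kWh")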


Buy vs Rent

The Math

At the median H100 SXM cloud rate of $2.69/hr (RunPod, from DeployBase data):

Monthly (730 hrs): $1,964. Annual: $23,564. Three years: roughly $70,700.

An H100 PCIe purchased at $27,000 plus five years of power and cooling totals roughly $36,000 to $46,000. (The 350W card alone at $0.12/kWh costs only about $370/year; the larger budget here assumes a full host server with cooling overhead, closer to $1,800/year, plus $2,000 to $10,000 in amortized cooling infrastructure.)

Breakeven lands around 13,500 to 17,000 hours of continuous use. At 24/7 operation, that's roughly 18 to 23 months. Add maintenance, staff time, and real-world overhead and practical breakeven pushes to 24 to 30 months.
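
The breakeven point moves with the cloud rate and total cost of ownership, so it is worth recomputing with your own numbers. A sketch using the figures above:

cloud_rate = 2.69                  # $/GPU-hr, median SXM rate used above
owned_cost_5yr = (36_000, 46_000)  # purchase + power + cooling range from above
HOURS_PER_MONTH = 730

for total in owned_cost_5yr:
    breakeven_hours = total / cloud_rate
    print(f"${total:,}: breakeven at {breakeven_hours:,.0f} hrs "
          f"(~{breakeven_hours / HOURS_PER_MONTH:.0f} months at 24/7)")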

When to Rent

Utilization under 60%. Projects shorter than 18 months. Spiky or seasonal demand. No existing datacenter. Capital constrained.

When to Buy

Continuous utilization above 70%. Three-year-plus deployment horizon. On-prem infrastructure already in place. Steady-state workload (training clusters, always-on inference).


Use Case Costs

LLM Training (7B parameters, 8x SXM GPUs, 7 days)

Provider | Per-GPU Rate | 8-GPU Total (168 hrs)
Verda SXM | $2.29/hr | $3,078
RunPod SXM | $2.69/hr | $3,615
Lambda SXM | $3.78/hr | $5,080

SXM form factor is required. NVLink interconnect (900 GB/s per GPU) is needed for model parallelism at this scale; PCIe cards, which communicate over the PCIe bus at roughly 64 GB/s per direction on Gen5 x16, cannot keep up.
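
Every total in these use-case tables comes from the same formula, rate × GPU count × hours. A small helper for quick estimates:

def job_cost(rate_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total on-demand cost for a fixed-length job."""
    return rate_per_gpu_hr * gpus * hours

# 7-day (168-hour) training run on 8x SXM, rates from the table above
for provider, rate in [("Verda", 2.29), ("RunPod", 2.69), ("Lambda", 3.78)]:
    print(f"{provider}: ${job_cost(rate, gpus=8, hours=168):,.0f}")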

Fine-Tuning (LoRA, single GPU, 15 hours)

Provider | Rate | Total
ThunderCompute | $1.38/hr | $20.70
RunPod PCIe | $1.99/hr | $29.85
Lambda PCIe | $2.86/hr | $42.90

LoRA and QLoRA drop memory needs from 80GB to 16-24GB. An A100 handles many fine-tuning jobs at lower hourly rates. The H100 finishes faster but costs more per hour.

Inference Serving (1M tokens/day, ~2 hrs GPU/day)

At RunPod's NVL rate ($2.59/hr): roughly $155/month. H100 NVL is the optimal form factor here. The 188GB combined memory and inference-tuned architecture handle LLM serving more efficiently than PCIe or SXM on a cost-per-token basis.
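
The "~2 hrs GPU/day" figure follows from an assumed serving throughput. A sketch of the derivation; the ~140 tokens/sec sustained rate is an illustrative assumption, not a benchmark:

TOKENS_PER_DAY = 1_000_000
TOKENS_PER_SEC = 140          # assumed sustained throughput; measure your own workload
RATE = 2.59                   # $/GPU-hr, RunPod H100 NVL from the table
DAYS_PER_MONTH = 30

gpu_hours_per_day = TOKENS_PER_DAY / TOKENS_PER_SEC / 3600   # ~2.0 GPU-hrs/day
monthly = gpu_hours_per_day * RATE * DAYS_PER_MONTH          # ~$155/month
print(f"{gpu_hours_per_day:.1f} GPU-hrs/day -> ${monthly:,.0f}/month")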


Cost Optimization

The same H100 SXM costs $2.29/GPU-hour on Verda and $11.68/GPU-hour on Google Cloud. Provider selection is the single biggest cost lever. Before tuning anything else, check whether a different provider offers the same GPU at half the price. Most of the time, the answer is yes.

Spot scheduling works well for fault-tolerant workloads. The pattern: run a baseline on reserved capacity, burst to spot for overflow. AWS spot H100s have historically run $2.10 to $2.50/hr, compared to $6.88 on-demand. Typical monthly savings: 30-50%. The catch is 2-minute eviction warnings, so checkpoint early and often.
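
On AWS, the two-minute warning is exposed through the EC2 instance metadata service. A minimal watcher sketch, assuming IMDSv1 is enabled (IMDSv2 additionally requires a session token):

import time
import urllib.request

# EC2 instance metadata endpoint: returns 200 with details once an
# interruption notice has been issued, 404 before that.
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_pending() -> bool:
    try:
        with urllib.request.urlopen(NOTICE_URL, timeout=1) as resp:
            return resp.status == 200
    except Exception:
        return False  # 404 (no notice yet) or not running on EC2

while True:
    if interruption_pending():
        # ~2 minutes remain: flush a final checkpoint, drain work, exit.
        break
    time.sleep(5)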

Then there's right-sizing. Not every job needs SXM. PCIe handles inference and single-GPU work at 20-50% less per hour. A team running inference on SXM at $2.69/hr could switch to PCIe at $1.99/hr and save 26% with zero performance loss on that workload. SXM only earns its premium when NVLink multi-GPU interconnect actually gets used.

Parameter-efficient fine-tuning cuts costs a different way: by reducing the GPU tier needed. LoRA and QLoRA drop memory requirements from 80GB to 16-24GB. Full fine-tune of Mistral 7B on an H100 at $2.69/hr for 20 hours costs $53.80. Same model via LoRA on a RunPod A100 at $1.39/hr for 20 hours: $27.80. Half the cost, similar output quality.

The commitment trap is real now. H200 (141GB HBM3e) and B200 (192GB HBM3e) are both available on cloud providers. Locking into 3-year H100 reserved instances means paying last-generation rates while next-generation hardware enters the market at competitive pricing. One-year terms on steady workloads, on-demand for everything else.


Pricing Outlook

AWS cut P5 on-demand pricing 44% in June 2025. Within 2 to 4 weeks, competitors matched. That domino effect is likely to repeat as newer GPUs gain cloud traction through 2026.

The H200 is already live. RunPod lists it at $3.59/GPU-hour with 141GB HBM3e memory, nearly double the H100's 80GB. As more providers roll out H200 inventory, H100 on-demand rates face direct downward pressure. Teams that need more than 80GB VRAM per GPU will shift to H200, freeing up H100 supply and pushing prices lower.

B200 availability is more limited. RunPod charges $5.98/GPU-hour, Lambda $6.08/GPU-hour for B200 SXM, and CoreWeave $8.60/GPU-hour. Still early. Volume availability is expected to ramp through Q3-Q4 2026, at which point B200 pricing will put additional pressure on both H100 and H200 rates.

No H100 supply shortage is expected through 2026. Lead times have stabilized at 2 to 4 weeks through most channels. The pricing floor for boutique H100 rentals sits around $1.38 to $2.00/GPU-hour. Below that, operational costs (power, cooling, depreciation, networking) make it hard for providers to offer lower rates and remain viable.

For teams planning GPU budgets: expect H100 on-demand rates to drift 10-20% lower by Q4 2026 as H200 and B200 inventory grows. Spot rates will compress further. Reserved commitments longer than 1 year carry increasing risk of overpaying relative to newer hardware.


FAQ

What is the cheapest H100 cloud rental? ThunderCompute at $1.38/GPU-hour and Latitude at $1.66/GPU-hour are the lowest on-demand rates tracked on DeployBase as of March 2026. RunPod PCIe at $1.99/hr is the best-known budget option. Availability and uptime SLAs vary across smaller providers.

Should teams buy or rent H100 GPUs? Rent if utilization stays below 60% or the project runs shorter than 18 months. Buy if continuous utilization exceeds 70% over 3+ years with existing datacenter infrastructure. Breakeven is roughly 18 to 23 months at 24/7 operation.

What is the difference between H100 PCIe and SXM? PCIe uses HBM2e at 2.0 TB/s bandwidth in standard server slots ($1.38 to $6.88/hr). SXM uses HBM3 at 3.35 TB/s with 900 GB/s NVLink for multi-GPU training ($2.29 to $11.68/hr). Use PCIe for inference. Use SXM for distributed training.

Why are hyperscaler H100 rates so much higher? AWS ($6.88/GPU-hr), Azure ($6.98 to $11.06), and Google Cloud ($11.68) bundle managed infrastructure, SLAs, networking, and compliance into their pricing. Boutique providers offer the bare GPU with minimal overhead. The 3x to 5x price gap reflects service level, not GPU quality.

How much does it cost to fine-tune a model on H100? LoRA fine-tuning a 7B parameter model takes roughly 15 hours on a single GPU. At $1.38 to $2.86/hr, that's $21 to $43. Full fine-tuning at 7B scale runs $1,000 to $3,000 on multi-GPU setups. Parameter-efficient methods cut costs 50-70%.


