NVIDIA B200 GPU Hourly Rental Price: Where to Rent

Deploybase · March 18, 2026 · GPU Pricing

NVIDIA B200 Overview

The NVIDIA B200 GPU represents the latest iteration in NVIDIA's data center accelerator lineup. Released in 2024, it delivers increased memory capacity (192GB HBM3e) and improved compute density compared to H100 and H200 variants.

Key specifications:

  • Memory: 192GB HBM3e (112GB more than H100)
  • Memory Bandwidth: 8.0 TB/s (2.4x H100's 3.35 TB/s)
  • Compute: ~9 PFLOPS (FP8)
  • Architecture: Blackwell (TSMC 4NP)
  • TDP: ~1,000W

The B200 targets applications with extreme memory requirements: large language model inference at massive scale, multi-model deployments, and complex scientific computing workloads.

Hourly Rental Pricing

Current Market Rates (as of March 2026):

RunPod leads in B200 pricing at $5.98/hour for on-demand instances. This represents a roughly 67% premium over the H200 ($3.59/hour), reflecting the B200's scarcity and advanced specifications.

Provider Pricing Summary:

Provider     | Price/Hour  | Min Commitment | Spot Discount
RunPod       | $5.98       | On-demand      | 35%
Lambda Labs  | $6.08       | On-demand      | N/A
CoreWeave    | $8.60/GPU   | On-demand      | N/A
Vast.AI      | $4.20-$6.50 | Per-job        | N/A

Lambda Labs charges $6.08/hour for B200 SXM with managed support. CoreWeave offers B200 in 8-GPU cluster configurations at $8.60/GPU ($68.80 for the full node), with strongest availability in Europe and growing US capacity.

Vast.AI spot pricing ranges widely based on supply. Off-peak instances drop to $4.20/hour but carry no uptime guarantees, making them suitable only for fault-tolerant batch work.
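As a rough planning aid, the rates above can be plugged into a simple cost estimate. This is a sketch: the hourly figures and the 35% RunPod spot discount come from the table above and will drift over time.

```python
# Sketch: estimate monthly B200 rental cost from this article's March 2026 rates.
ON_DEMAND_RATES = {
    "RunPod": 5.98,
    "Lambda Labs": 6.08,
    "CoreWeave": 8.60,  # per GPU in an 8-GPU cluster
}

def monthly_cost(provider: str, hours: float, spot_discount: float = 0.0) -> float:
    """Cost in dollars for `hours` of usage, optionally applying a spot
    discount expressed as a fraction (e.g. 0.35 for 35% off)."""
    rate = ON_DEMAND_RATES[provider]
    return round(rate * hours * (1.0 - spot_discount), 2)

# 200 hours/month on RunPod, on-demand vs. the quoted 35% spot discount:
print(monthly_cost("RunPod", 200))        # 1196.0
print(monthly_cost("RunPod", 200, 0.35))  # 777.4
```

At 200 hours a month, the spot discount saves roughly $420, which is why fault-tolerant batch workloads gravitate toward spot capacity despite the preemption risk.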

Provider Availability

RunPod

Widest B200 inventory as of March 2026, with availability in four global regions: US-East, US-West, Europe-West, and Asia-Southeast. Supports multi-GPU configurations up to 8x B200 per instance.

Provisioning: Immediate for on-demand. Spot instances deploy within 5-15 minutes during non-peak hours.

Web UI supports real-time utilization monitoring and automatic failover for multi-GPU setups.

Lambda Labs

Limited B200 capacity: 12 total instances across two US regions. Priority access for existing customers with monthly spend exceeding $500.

Provisioning: 2-4 hour lead time due to manual allocation. Better for development than production scaling.

Direct support team available for custom configurations and volume negotiations.

CoreWeave

Strong B200 presence in Frankfurt and Amsterdam data centers. Emerging US availability in Virginia.

Provisioning: Immediate on-demand. Predictable capacity for 24+ hour deployments.

Integration with Kubernetes enables programmatic instance management and auto-scaling.

Vast.AI

Community marketplace with intermittent B200 supply. Prices fluctuate hourly based on provider availability. Best rates occur overnight US time when supply exceeds demand.

Provisioning: Immediate upon booking. No commitment required, but instance can be reclaimed by provider with 10 minutes notice.

Suitable only for non-critical batch inference or model development.

Performance Advantages

B200 vs H100 Comparison

The B200's memory advantage proves significant for large-model inference. The H100's 80GB of VRAM forces pipeline parallelism for large models, which increases latency; the B200's 192GB eliminates pipeline overhead for models up to roughly 175B parameters, and scaling further to a 405B-parameter model requires minimal architectural changes.

Cost efficiency depends on workload:

  • Single-model inference (7B-70B): H100 more cost-effective at $2.69/hour (RunPod H100 SXM) versus B200's $5.98/hour
  • Multi-model deployment (multiple 70B models): B200 competitive per-model-served
  • 175B+ model inference: B200 mandatory; H100 insufficient VRAM

Token Throughput Characteristics

Real-world measurements on batch inference workloads:

B200 achieves 45-55% higher throughput than H100 for identical models, driven by memory bandwidth advantage. Running Llama 2 70B:

  • H100: 210 tokens/second (batch size 32)
  • B200: 310 tokens/second (batch size 32)

Cost per million tokens: $0.85 on the B200 versus $0.70 on the H100. The H100 remains more cost-efficient per token, but the B200 excels when memory becomes the limiting factor.

H100 vs H200 vs B200

Detailed Comparison Matrix

Metric           | H100 SXM   | H200 SXM    | B200
Memory           | 80GB HBM3  | 141GB HBM3e | 192GB HBM3e
Bandwidth        | 3.35 TB/s  | 4.8 TB/s    | 8.0 TB/s
Compute (FP8)    | ~1.8 PF    | ~3.6 PF     | ~9 PF
Price/Hour       | $2.69      | $3.59       | $5.98
Optimal Workload | Single 70B | Single 140B | Multi 70B+

Use Case Selection Guide

Choose H100 when:

  • Running single models up to 70B parameters
  • Cost per token is primary optimization
  • Inference latency below 500ms is required (serving fewer models per GPU keeps latency lower)

Choose H200 when:

  • Models between 70B and 175B parameters
  • Memory bandwidth is important but not extreme
  • A ~33% cost premium over H100 is acceptable

Choose B200 when:

  • Deploying multiple large models simultaneously
  • Memory requirements exceed 120GB
  • Cost per request matters more than cost per token

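The selection rules above can be condensed into a small helper. This is a sketch: the thresholds mirror the guide and should be adjusted for your quantization and batching choices.

```python
def pick_gpu(model_params_b: float, multi_model: bool = False) -> str:
    """Pick a GPU tier per the selection guide above.

    model_params_b: model size in billions of parameters.
    multi_model: True when deploying several large models on one GPU.
    """
    if multi_model or model_params_b > 175:
        return "B200"  # multi-model serving, or beyond the H200's reach
    if model_params_b > 70:
        return "H200"  # 70B-175B single-model range
    return "H100"      # single models up to 70B: best cost per token

print(pick_gpu(70))                     # H100
print(pick_gpu(140))                    # H200
print(pick_gpu(70, multi_model=True))   # B200
```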
See the complete GPU pricing comparison for all available options.

FAQ

Q: What workloads justify B200 over H100? Multi-model inference deployments and models exceeding 100B parameters. Single-model inference favors H100's superior cost-per-token ratio. Benchmark with actual models before committing.

Q: Can I rent B200 on a pay-as-you-go basis? Yes. RunPod, Lambda Labs, and CoreWeave all offer hourly on-demand pricing with no minimum commitment. Spot pricing on Vast.AI offers 30-35% discounts, with availability caveats.

Q: What's the current market price for B200 on RunPod? $5.98/hour on-demand as of March 2026. Lambda Labs charges $6.08/hour, CoreWeave $8.60/GPU in 8-GPU clusters. Spot pricing on RunPod reaches ~$3.89/hour during off-peak periods.

Q: How long to provision a B200 instance? RunPod: 2-5 minutes. CoreWeave: immediate in Europe, 10-15 minutes in US. Lambda: 2-4 hours. Vast.AI: immediate but with availability risk.

Q: Can I bundle multiple B200s for 200B+ model inference? Yes. Pipeline parallelism across 2x B200 enables 400B parameter models. RunPod supports multi-GPU configurations directly. Expect 15-20% efficiency loss from inter-GPU communication overhead.
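A back-of-the-envelope memory check for multi-GPU serving. This is a sketch that assumes weights dominate and reserves headroom for KV cache and activations; note that a 400B model at FP8 (~400GB of weights) does not fit in 2x192GB without quantizing below 8 bits.

```python
def fits(params_b: float, bytes_per_param: float, num_gpus: int,
         gpu_mem_gb: float = 192.0, headroom: float = 0.15) -> bool:
    """True if model weights fit across num_gpus B200-class GPUs,
    reserving `headroom` of each GPU's memory for KV cache/activations."""
    weights_gb = params_b * bytes_per_param            # 1e9 params * bytes ~= GB
    usable_gb = num_gpus * gpu_mem_gb * (1 - headroom)
    return weights_gb <= usable_gb

print(fits(400, 1.0, 2))  # FP8: 400GB > ~326GB usable -> False
print(fits(400, 0.5, 2))  # FP4: 200GB fits -> True
```

In practice, pairing 2x B200 with 4-bit weights (or adding a third GPU at FP8) is what makes the 400B-class deployments in the answer above workable.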

Q: Is B200 worth the cost premium over H200? Only if deploying multiple 70B models or 140B+ single models. Cost breakeven occurs around 250+ inference hours monthly. For lighter workloads, H200's lower hourly rate provides better value.

Sources

  • RunPod Official Pricing (March 2026)
  • Lambda Labs Rate Cards (March 2026)
  • CoreWeave Pricing Dashboard (March 2026)
  • NVIDIA Blackwell GPU Specifications
  • Third-Party GPU Benchmark Aggregation