Contents
- NVIDIA B200 Price: Overview
- B200 Pricing by Provider
- B200 vs H200 vs H100 Price
- Monthly Cost Projections
- B200 vs H100 Total Cost of Ownership (1 Year)
- Multi-GPU Cluster Costs
- Cost-Per-Task Analysis
- When B200 Makes Sense
- B200 Performance Gains vs H100
- Rental vs. Purchase Decision
- B200 vs H100/H200 Throughput Benchmarks
- Special Considerations for B200 Workloads
- Multi-Region B200 Deployments
- B200 Energy Efficiency
- B200 Availability by Workload Type
- FAQ
- Related Resources
- Sources
NVIDIA B200 Price: Overview
NVIDIA B200 cloud rental starts at $5.98 per GPU-hour on RunPod and $6.08/hour on Lambda (single-GPU). The 8-GPU CoreWeave cluster costs $68.80/hour total, or $8.60 per GPU. B200 launched in early 2026 with 192GB HBM3e memory and roughly 2-4x the tensor throughput of H100, depending on precision. It's the newest and fastest Blackwell-generation GPU in cloud rental as of March 2026.
B200 targets extreme-scale training and long-context inference workloads. The massive VRAM (192GB HBM3e) and high tensor throughput make it the choice for teams training 200B+ parameter models or serving ultra-long context windows. For most teams, B200 is not yet cost-optimal; H100 and H200 remain better value per dollar.
Compare B200 rates alongside other NVIDIA GPU prices on DeployBase. The premium reflects newest silicon (Blackwell architecture) and limited supply in cloud markets.
B200 Pricing by Provider
| Provider | Model | VRAM | Form Factor | $/GPU-hr | $/Month (730 hrs) |
|---|---|---|---|---|---|
| RunPod | NVIDIA B200 | 192GB | SXM | $5.98 | $4,365 |
| Lambda | NVIDIA B200 SXM | 192GB | SXM | $6.08 | $4,438 |
| CoreWeave | NVIDIA B200 (1x) | 192GB | SXM | - | - |
| CoreWeave | NVIDIA B200 (8x cluster) | 1,536GB | SXM | $68.80 | $50,224 |
Data from official provider pricing (March 21, 2026). Single-GPU B200 on CoreWeave is not publicly listed; cluster pricing begins at 8-GPU minimum.
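The per-month figures in the table are simply rate × 730 hours. A minimal sketch (rates copied from the table above; the dictionary keys are illustrative names, not provider API identifiers):

```python
# Monthly cost projection from hourly GPU rental rates.
# Rates are the March 21, 2026 figures from the table above.
RATES_PER_HR = {
    "runpod_b200": 5.98,         # single GPU
    "lambda_b200": 6.08,         # single GPU
    "coreweave_b200_8x": 68.80,  # whole 8-GPU cluster ($8.60/GPU)
}

def monthly_cost(rate_per_hr: float, hours: float = 730) -> float:
    """Cost of `hours` of rental; 730 hours approximates one month."""
    return round(rate_per_hr * hours, 2)

print(monthly_cost(RATES_PER_HR["runpod_b200"]))        # 4365.4
print(monthly_cost(RATES_PER_HR["coreweave_b200_8x"]))  # 50224.0
```

Swap in any provider's rate to reproduce the monthly columns in the tables that follow.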
B200 vs H200 vs H100 Price
| Metric | H100 | H200 | B200 |
|---|---|---|---|
| RunPod $/hr | $1.99 | $3.59 | $5.98 |
| Memory | 80GB | 141GB | 192GB |
| Bandwidth | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s |
| Peak TFLOPS (FP8) | 3,958 | 3,958 | 9,000+ |
| Price per GB VRAM | $0.025/hr | $0.025/hr | $0.031/hr |
B200 costs 3x more than H100 per hour but offers 2.4x the VRAM, so the cost per GB of memory is only modestly higher ($0.031 vs $0.025 per GB-hour). The performance gap is substantial (roughly 2-4x on low-precision tensor workloads), but supply constraints have kept prices high through Q1 2026.
Monthly Cost Projections
Light Usage (Research, Testing)
Scenario: 60 hours per month on RunPod B200
- Cost: 60 × $5.98 = $358.80/month
- Annual: $4,306
Realistic for teams prototyping Blackwell-based training or evaluating B200 performance before committing to larger deployments.
Medium Usage (Ongoing Production)
Scenario: 240 hours per month (8 hours per day, 5 days/week)
- Cost: 240 × $5.98 = $1,435.20/month
- Annual: $17,222
Covers continuous fine-tuning workloads, regular evaluation runs, or moderate inference serving on large models.
Heavy Usage (Continuous Operation)
Scenario: 24/7 on CoreWeave 8x B200 cluster (730 hours/month)
- Cost: 730 × $68.80 = $50,224/month
- Annual: $602,688
Justifiable only for teams training 200B+ parameter models continuously or serving extreme-scale inference. Break-even on B200 hardware purchase (~$480,000-$640,000 for 8 GPUs) occurs at roughly 7,000-9,300 cluster-hours (10-13 months of continuous operation for an 8-GPU cluster).
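The break-even arithmetic above can be sketched directly (the $480K-$640K hardware range is the article's estimate, not a quoted price):

```python
# Break-even: rental hours after which buying the hardware outright
# would have been cheaper than renting at the given hourly rate.
def breakeven_hours(hardware_cost: float, rental_rate_per_hr: float) -> float:
    return hardware_cost / rental_rate_per_hr

low = breakeven_hours(480_000, 68.80)   # ~6,977 cluster-hours
high = breakeven_hours(640_000, 68.80)  # ~9,302 cluster-hours
print(f"{low:,.0f}-{high:,.0f} cluster-hours, "
      f"or {low / 730:.0f}-{high / 730:.0f} months of 24/7 operation")
```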
B200 vs H100 Total Cost of Ownership (1 Year)
| Scenario | H100 (Annual) | B200 (Annual) | Difference |
|---|---|---|---|
| Light (60 hrs/mo) | $1,433 | $4,306 | +$2,873 |
| Medium (240 hrs/mo) | $5,731 | $17,222 | +$11,491 |
| Heavy (730 hrs/mo) | $17,432 | $52,385 | +$34,953 |
Over one year of cloud rental, B200 costs 3x as much as H100 for the same hours. Choose B200 when throughput gains exceed the cost premium: completing a year-long training job in 3-4 months saves real cloud spend and frees the cluster for other projects.
Multi-GPU Cluster Costs
8-GPU B200 Cluster
CoreWeave pricing: $68.80/hour total, $8.60 per GPU in the cluster.
Monthly cost: $50,224 (continuous operation)
Use cases:
- Pre-training 200B+ parameter models from scratch
- Large-scale inference on massive models (405B+ parameters)
- Distributed fine-tuning with data parallelism across 8 GPUs
- Research requiring extreme VRAM (1,536GB aggregate)
The 8-GPU cluster provides 1,536GB total HBM3e memory. A 405B model's weights alone need roughly 810GB at FP16, 405GB at 8-bit, or ~203GB at 4-bit. Against the 8-bit footprint, 1,536GB leaves 3-4x headroom for KV caches and activations during inference; full training with optimizer states and gradients needs several times more memory and typically spans multiple such nodes.
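The weight-footprint figures follow from simple bytes-per-parameter arithmetic; a sketch using the standard rule of thumb (not a provider-specific calculator):

```python
# GB needed just to hold model weights at a given precision.
# 1B parameters at 8 bits (1 byte) per parameter = 1 GB.
def weight_footprint_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8

for bits in (16, 8, 4):
    print(f"405B @ {bits}-bit: {weight_footprint_gb(405, bits):.1f} GB")
```

KV cache, activations, and (for training) gradients and optimizer states all come on top of these numbers.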
Cost-Per-Task Analysis
Training a 405B Model (from scratch)
Scenario: Pre-training a 405B parameter model on 8-GPU B200 cluster.
Compute estimate:
- Model size: 405B parameters
- Training data: 1 trillion tokens
- Estimated time: ~600,000 seconds wall-clock (~167 hours) on the 8-GPU cluster, i.e. ~1,333 GPU-hours
- Cost: ~1,333 GPU-hours × $8.60 = ~$11,470
Equivalent on H100 (32x cluster):
- Throughput per GPU: 1,350 samples/sec vs B200's ~4,000+ samples/sec (estimated)
- Time: ~1,800,000 seconds wall-clock (~500 hours) across 32 GPUs, i.e. ~16,000 GPU-hours
- Cost: 16,000 GPU-hours × $1.99 = ~$31,840
Despite the 3x higher hourly rate, the B200 cluster is both cheaper and ~3x faster here, because it needs far fewer GPU-hours to finish. This is exactly the extreme-scale training class B200 targets. (These figures are scaled-down illustrations; a real 1T-token run at 405B scale consumes orders of magnitude more GPU-hours.)
Fine-Tuning a 70B Model
Scenario: LoRA fine-tuning on 300K examples, 512 tokens per example, batch size 64.
B200 (RunPod, $5.98/hr):
- Estimated time: 8-10 hours (helped by B200's higher memory bandwidth and tensor throughput)
- Cost: $48-$60
H100 (RunPod, $1.99/hr):
- Estimated time: 20-24 hours
- Cost: $40-$48
B200 is slightly more expensive in absolute dollars but trains 2x faster. The cost-per-task delta is minimal for fine-tuning; H100 remains competitive.
Inference: Serving a 200B Model
Scenario: Continuous inference on a 200B model, processing 100M tokens per month.
B200 single-GPU (RunPod, $5.98/hr):
- Estimated throughput: 1,200-1,500 tokens/sec
- Monthly capacity: ~1,400 tok/s × 86,400 s/day × 30 days ≈ 3.6B tokens (vs 100M needed, heavily oversized)
- Monthly cost: 730 × $5.98 = $4,365
- Cost per million tokens: $43.65
H100 8x cluster (CoreWeave, $49.24/hr):
- Throughput: 8 × 850 = 6,800 tok/s
- Monthly cost: 730 × $49.24 = $35,945
- Cost per million tokens: $359.45
For the modest 100M tokens/month requirement, both configurations overshoot on throughput, but the single B200 is still the cheaper option: the 8x H100 cluster is the smallest H100 configuration with enough VRAM for a 200B model, and it costs roughly 8x more per month. B200's advantage grows further at massive throughput (5B+ tokens/month), where its higher tokens/sec keeps cost per million tokens low.
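The cost-per-million-token math above generalizes to any always-on deployment; a sketch using the scenario's throughput and rates:

```python
# $ per million tokens when an instance is billed 24/7 regardless of load.
def cost_per_million_tokens(rate_per_hr: float, tokens_per_month: float,
                            hours_per_month: float = 730) -> float:
    monthly_cost = rate_per_hr * hours_per_month
    return monthly_cost / (tokens_per_month / 1e6)

# Serving the scenario's 100M tokens/month:
print(round(cost_per_million_tokens(5.98, 100e6), 2))   # 43.65 (1x B200)
print(round(cost_per_million_tokens(49.24, 100e6), 2))  # 359.45 (8x H100 cluster)
```

Because the instance bills around the clock, cost per million tokens falls linearly as you push more traffic through the same hardware.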
When B200 Makes Sense
B200 rental is justified when ALL of the following apply:
Model size exceeds 70B parameters. Smaller models don't need B200's memory or throughput. H100/H200 handle 70B comfortably.
Memory bandwidth is a bottleneck. B200's 8.0 TB/s (vs H100's 3.35 TB/s, a 2.4x improvement) accelerates training throughput when batch sizes are large (512+) or models are extremely large (200B+).
Time-to-completion has business value. Training a 200B model in 3 months (B200) instead of 9 months (H100 cluster) enables faster iteration and product releases.
Supply is available. As of March 2026, B200 is in limited supply on most cloud platforms. Availability for multi-month reservations is spotty.
Cost-per-task exceeds cost-per-hour. For one-off 10-hour fine-tuning jobs, H100 is cheaper. For continuous pre-training over months, B200's speed premium can offset hourly costs.
B200 Performance Gains vs H100
Preliminary benchmarks (not official NVIDIA data, as of March 2026):
| Workload | H100 | B200 | Multiple |
|---|---|---|---|
| FP8 Tensor TFLOPS | 3,958 | ~9,000 | ~2.3x |
| FP4 Tensor TFLOPS | N/A | ~18,000+ | N/A |
| TF32 Tensor TFLOPS | 989 | ~4,500 | ~4.5x |
| Memory Bandwidth | 3.35 TB/s | 8.0 TB/s | 2.4x |
| Time to train 1T tokens | 8.5 days | ~2-3 days (est.) | 3-4x faster |
B200's FP8 advantage (~2.3x the TFLOPS of H100, with FP4 adding further headroom) is game-changing for inference. Training workloads benefit from higher bandwidth and tensor performance; the end-to-end speedup lands between the 2.4x bandwidth ratio and the raw tensor gains, depending on how memory-bound the workload is.
Rental vs. Purchase Decision
Cloud Rental ($5.98/hr on RunPod):
- Advantages: no upfront cost, flexible scaling, no hardware maintenance
- Disadvantages: cost accumulates over long workloads
Purchase (estimated $60,000-$80,000 per B200 GPU):
- Advantages: after payback, the marginal cost is roughly power and cooling (on the order of $0.10-$0.15 per GPU-hour)
- Disadvantages: capital expense, operational overhead, obsolescence risk
Breakeven analysis:
- Single B200: pays for itself after ~10,000-13,000 GPU-hours (~14-18 months continuous)
- 8x cluster: pays for itself after ~7,000-9,300 cluster-hours (roughly 10-13 months of continuous cluster operation)
Buy B200 hardware if utilization >70% for 18+ months. Rent if utilization <70% or timeline <12 months.
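The rent-vs-buy rule of thumb above can be written down directly (thresholds are the article's, not universal guidance):

```python
# Buy only if utilization stays above ~70% for 18+ months; otherwise rent.
def should_buy(utilization: float, horizon_months: float,
               util_threshold: float = 0.70, min_months: float = 18) -> bool:
    return utilization > util_threshold and horizon_months >= min_months

print(should_buy(0.85, 24))  # True: purchase likely pays off
print(should_buy(0.50, 24))  # False: utilization too low, keep renting
print(should_buy(0.85, 9))   # False: horizon too short, keep renting
```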
B200 vs H100/H200 Throughput Benchmarks
LLM Inference (tokens per second, single GPU):
| Model | H100 PCIe | H200 | B200 (est.) |
|---|---|---|---|
| Llama 2 7B | 1,800 | 2,000 | 5,200 |
| Llama 2 70B | 850 | 950 | 2,400 |
| Grok 314B | N/A | N/A | 320 |
B200 is roughly 3x faster across these models (2,400 vs 850 tok/s on 70B; 5,200 vs 1,800 on 7B). The low-precision tensor gains translate most clearly to quantized (FP8) workloads, where B200's tensor cores are fully utilized.
Training Throughput (samples/sec, 8-GPU cluster):
| Task | 8x H100 | 8x H200 | 8x B200 (est.) |
|---|---|---|---|
| Pre-training (batch=128) | 3,600 | 4,200 | 12,000 |
| Fine-tuning LoRA (batch=64) | 5,000 | 6,000 | 18,000 |
B200 provides 3x speedup on training due to bandwidth and tensor performance gains. Wall-clock time to train 1T tokens: 8.5 days on H100 vs 2.5 days on B200.
Special Considerations for B200 Workloads
Quantization Matters. B200's biggest tensor gains appear at low precision: roughly 2.3x over H100 at FP8, and far more at FP4, which H100 does not support. Full-precision (FP32) workloads see a smaller speedup, typically 2-3x, limited by memory bandwidth.
For inference: quantize models to FP8 or INT8 to fully utilize B200. Use bfloat16 or FP16 at minimum.
For training: keep master weights in FP16 or bfloat16 and compute in lower precision where the framework supports it; FP8 gradients require careful loss scaling and normalization.
MoE Activation on B200. Some models (e.g., DeepSeek-V3) use Mixture of Experts, so only a fraction of parameters activate per token. The full weights must still be resident in VRAM, which is where B200's larger memory pays off.
Example: DeepSeek-V3 has 671B total parameters but activates only ~37B per token. At 8-bit precision the weights need ~671GB of VRAM, which fits on roughly 4 B200s (192GB each) versus roughly 9 H100s (80GB each), before KV cache. That difference meaningfully cuts deployment cost for MoE models.
Context Window Operations. Long-context inference (200K+ tokens) stresses memory bandwidth during KV cache operations. B200's 8.0 TB/s vs H100's 3.35 TB/s (a 2.4x advantage) translates to 2.0-2.2x throughput on long-context tasks.
Team serving a 70B model with 200K context windows: B200 delivers roughly 2x throughput relative to H100, justifying the 3x price premium.
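To see why 200K-token contexts are memory-hungry, here is a rough KV-cache size estimate. The shape parameters (80 layers, 8 grouped KV heads, head dimension 128) match a Llama-2-70B-style architecture and are assumptions for illustration:

```python
# Approximate KV-cache size for one sequence at fp16/bf16.
def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    # Factor of 2 covers both the K and V tensors per layer.
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9

print(round(kv_cache_gb(200_000), 1))  # 65.5 GB for one 200K-token sequence
```

At ~65GB per 200K-token sequence, the cache alone approaches an H100's entire 80GB, which is why B200's 192GB capacity and 8.0 TB/s bandwidth matter for this workload.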
Multi-Region B200 Deployments
Some cloud providers (CoreWeave, Together.AI) offer multi-region B200 access. Useful for resilience and latency reduction.
Cost trade-offs:
- Single region (lowest cost): $5.98/hr (RunPod)
- Multi-region replication: +30-50% overhead (data sync, failover)
- Multi-region load balancing: +50-100% overhead (orchestration)
For workloads tolerating single-region outages, save money and use one region. For SLA requirements (99.9% uptime), multi-region is necessary; budget accordingly.
B200 Energy Efficiency
B200 TDP (thermal design power): up to ~1,000W per GPU in HGX B200 systems (configurable lower in some deployments).
Cost per kilowatt-hour: ~$0.12 (US average).
Energy cost per training hour:
- Single B200: 1.0 kW × $0.12/kWh = $0.12/hr (negligible vs cloud rental cost)
- 8x B200 cluster: 8.0 kW × $0.12/kWh = $0.96/hr
Note that cloud rates already include power and cooling, so the $68.80/hr CoreWeave price is all-in; energy only appears as a separate line item on-premises.
For on-premises setups: 8 B200 GPUs running 24/7 at full load cost roughly $700/month in electricity, before cooling overhead (which often adds 30-50%). Factor this into purchase ROI calculations.
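The electricity math is easy to parameterize, which helps because published TDP figures vary by system configuration (the 1,000W default below is an assumption, not a quoted spec):

```python
# On-prem energy cost for a GPU cluster at full load.
def energy_cost_per_hr(watts_per_gpu: float, n_gpus: int,
                       usd_per_kwh: float = 0.12) -> float:
    return watts_per_gpu * n_gpus / 1000 * usd_per_kwh

hourly = energy_cost_per_hr(1000, 8)
print(round(hourly, 2))     # 0.96  ($/hr for 8 GPUs)
print(round(hourly * 730))  # ~701  ($/month running 24/7)
```

Multiply by your facility's PUE to account for cooling and distribution losses.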
B200 Availability by Workload Type
Available now (March 2026):
- Inference (single or batched)
- Fine-tuning on small instances (1-2 GPUs)
- Research and benchmarking
Coming soon (Q2 2026):
- Large-scale pre-training (16+ GPU clusters)
- Production inference at extreme scale (100+ concurrent users)
- Production managed services (fully hosted training)
Not available yet:
- Spot instances (B200 too new for spot pricing)
- Reserved instances with discounts (expect Q3 2026)
- Specialized services (data labeling, annotation with B200 inference)
FAQ
Is B200 worth buying now (March 2026)?
Too early for most teams. B200 is bleeding-edge, supply is constrained, and software support is still maturing. Rent for 3-6 months to evaluate performance before buying. By late 2026, supply should improve and pricing should drop 20-30%.
How much faster is B200 than H100?
Roughly 2.3x higher FP8 tensor throughput (more with FP4), 3-4x faster training wall-clock in preliminary estimates, and ~3x faster inference. The exact gain depends on workload and precision.
Should I use B200 or H200 for long-context inference?
H200 (141GB) is sufficient for most long-context work (up to 200K tokens). B200 (192GB) adds capacity and more bandwidth. Unless VRAM is exhausted, H200 offers better value ($3.59 vs $5.98/hr on RunPod).
Can I mix B200 and H100 in a cluster?
Not recommended for training. Different tensor performance would create bottlenecks. For inference, yes, you can run different models on different GPU types.
When will B200 pricing drop?
Historically, NVIDIA GPU prices drop 20-30% within 6-12 months of launch as supply stabilizes and competition increases. Expect B200 to reach $3.98-$4.50/hr by Q4 2026.
What about B200 availability on cloud platforms?
As of March 2026, only RunPod and Lambda publicly list single-GPU B200. CoreWeave lists 8-GPU clusters. Availability improves monthly. Check DeployBase's GPU dashboard for real-time availability across 50+ providers.
Is there a PCIe version of B200?
No. B200 ships only in the SXM (datacenter) form factor; there is no PCIe variant of the B200 itself, and all cloud B200 instances use SXM.
Related Resources
- NVIDIA GPU Pricing Comparison
- H100 Cloud Pricing
- H200 Cloud Pricing
- H100 vs A100 Comparison
- GPU Rental Calculator
Sources
- RunPod GPU Pricing
- Lambda Labs GPU Pricing
- CoreWeave Pricing Documentation
- NVIDIA B200 Tensor Core GPU Technical Brief
- DeployBase GPU Pricing API (tracked March 21, 2026)