Contents
- NVIDIA B200 Price: Overview
- B200 Pricing by Provider
- B200 vs H200 vs H100 Price
- Monthly Cost Projections
- B200 vs H100 Total Cost of Ownership (1 Year)
- Multi-GPU Cluster Costs
- Cost-Per-Task Analysis
- When B200 Makes Sense
- B200 Performance Gains vs H100
- Rental vs. Purchase Decision
- B200 vs H100/H200 Throughput Benchmarks
- Special Considerations for B200 Workloads
- Multi-Region B200 Deployments
- B200 Energy Efficiency
- B200 Availability by Workload Type
- FAQ
- Related Resources
- Sources
NVIDIA B200 Price: Overview
NVIDIA B200 cloud rental starts at $5.98 per GPU-hour on RunPod and $6.08/hour on Lambda (single-GPU). The 8-GPU CoreWeave cluster costs $68.80/hour total, or $8.60 per GPU. B200 launched in early 2026 with 192GB HBM3e memory and roughly 2-4x the tensor throughput of H100, depending on precision. It's the newest and fastest Blackwell-generation GPU in cloud rental as of March 2026.
B200 targets extreme-scale training and long-context inference workloads. The massive VRAM (192GB HBM3e) and high tensor throughput make it the choice for teams training 200B+ parameter models or serving ultra-long context windows. For most teams, B200 is not yet cost-optimal; H100 and H200 remain better value per dollar.
Compare B200 rates alongside other NVIDIA GPU prices on DeployBase. The premium reflects newest silicon (Blackwell architecture) and limited supply in cloud markets.
B200 Pricing by Provider
| Provider | Model | VRAM | Form Factor | $/GPU-hr | $/Month (730 hrs) |
|---|---|---|---|---|---|
| RunPod | NVIDIA B200 | 192GB | SXM | $5.98 | $4,365 |
| Lambda | NVIDIA B200 SXM | 192GB | SXM | $6.08 | $4,438 |
| CoreWeave | NVIDIA B200 (1x) | 192GB | SXM | - | - |
| CoreWeave | NVIDIA B200 (8x cluster) | 1,536GB | SXM | $68.80 | $50,224 |
Data from official provider pricing (March 21, 2026). Single-GPU B200 on CoreWeave is not publicly listed; cluster pricing begins at 8-GPU minimum.
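The per-month figures in the table are simply rate × 730 hours. A minimal sketch (rates copied from the table above; the dictionary keys are illustrative names, not provider API identifiers):

```python
# Monthly cost projection from hourly GPU rental rates.
# Rates are the March 21, 2026 figures from the table above.
RATES_PER_HR = {
    "runpod_b200": 5.98,         # single GPU
    "lambda_b200": 6.08,         # single GPU
    "coreweave_b200_8x": 68.80,  # whole 8-GPU cluster ($8.60/GPU)
}

def monthly_cost(rate_per_hr: float, hours: float = 730) -> float:
    """Cost of `hours` of rental; 730 hours approximates one month."""
    return round(rate_per_hr * hours, 2)

print(monthly_cost(RATES_PER_HR["runpod_b200"]))        # 4365.4
print(monthly_cost(RATES_PER_HR["coreweave_b200_8x"]))  # 50224.0
```

Swap in any provider's rate to reproduce the monthly columns in the tables that follow.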
B200 vs H200 vs H100 Price
| Metric | H100 | H200 | B200 |
|---|---|---|---|
| RunPod $/hr | $1.99 | $3.59 | $5.98 |
| Memory | 80GB | 141GB | 192GB |
| Bandwidth | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s |
| Peak TFLOPS (FP8) | 3,958 | 3,958 | 9,000+ |
| Price per GB VRAM | $0.025/hr | $0.025/hr | $0.031/hr |
B200 costs 3x more than H100 per hour but offers 2.4x the VRAM, so the cost per GB of memory is only modestly higher ($0.031 vs $0.025 per GB-hour). The performance gap is substantial (roughly 2-4x on low-precision tensor workloads), but supply constraints have kept prices high through Q1 2026.
Monthly Cost Projections
Light Usage (Research, Testing)
Scenario: 60 hours per month on RunPod B200
- Cost: 60 × $5.98 = $358.80/month
- Annual: $4,306
Realistic for teams prototyping Blackwell-based training or evaluating B200 performance before committing to larger deployments.
Medium Usage (Ongoing Production)
Scenario: 240 hours per month (8 hours per day, 5 days/week)
- Cost: 240 × $5.98 = $1,435.20/month
- Annual: $17,222
Covers continuous fine-tuning workloads, regular evaluation runs, or moderate inference serving on large models.
Heavy Usage (Continuous Operation)
Scenario: 24/7 on CoreWeave 8x B200 cluster (730 hours/month)
- Cost: 730 × $68.80 = $50,224/month
- Annual: $602,688
Justifiable only for teams training 200B+ parameter models continuously or serving extreme-scale inference. Break-even on B200 hardware purchase (~$480,000-$640,000 for 8 GPUs) occurs at roughly 7,000-9,300 cluster-hours (10-13 months of continuous operation for an 8-GPU cluster).
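The break-even arithmetic above can be sketched directly (the $480K-$640K hardware range is the article's estimate, not a quoted price):

```python
# Break-even: rental hours after which buying the hardware outright
# would have been cheaper than renting at the given hourly rate.
def breakeven_hours(hardware_cost: float, rental_rate_per_hr: float) -> float:
    return hardware_cost / rental_rate_per_hr

low = breakeven_hours(480_000, 68.80)   # ~6,977 cluster-hours
high = breakeven_hours(640_000, 68.80)  # ~9,302 cluster-hours
print(f"{low:,.0f}-{high:,.0f} cluster-hours, "
      f"or {low / 730:.0f}-{high / 730:.0f} months of 24/7 operation")
```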
B200 vs H100 Total Cost of Ownership (1 Year)
| Scenario | H100 (Annual) | B200 (Annual) | Difference |
|---|---|---|---|
| Light (60 hrs/mo) | $1,433 | $4,306 | +$2,873 |
| Medium (240 hrs/mo) | $5,731 | $17,222 | +$11,491 |
| Heavy (730 hrs/mo) | $17,432 | $52,385 | +$34,953 |
Over one year of cloud rental, B200 costs 3x as much as H100 for the same hours. Choose B200 when throughput gains exceed the cost premium: completing a year-long training job in 3-4 months saves real cloud spend and frees the cluster for other projects.
Multi-GPU Cluster Costs
8-GPU B200 Cluster
CoreWeave pricing: $68.80/hour total, $8.60 per GPU in the cluster.
Monthly cost: $50,224 (continuous operation)
Use cases:
- Pre-training 200B+ parameter models from scratch
- Large-scale inference on massive models (405B+ parameters)
- Distributed fine-tuning with data parallelism across 8 GPUs
- Research requiring extreme VRAM (1,536GB aggregate)
The 8-GPU cluster provides 1,536GB total HBM3e memory. A 405B model's weights alone need roughly 810GB at FP16, 405GB at 8-bit, or ~203GB at 4-bit. Against the 8-bit footprint, 1,536GB leaves 3-4x headroom for KV caches and activations during inference; full training with optimizer states and gradients needs several times more memory and typically spans multiple such nodes.
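The weight-footprint figures follow from simple bytes-per-parameter arithmetic; a sketch using the standard rule of thumb (not a provider-specific calculator):

```python
# GB needed just to hold model weights at a given precision.
# 1B parameters at 8 bits (1 byte) per parameter = 1 GB.
def weight_footprint_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8

for bits in (16, 8, 4):
    print(f"405B @ {bits}-bit: {weight_footprint_gb(405, bits):.1f} GB")
```

KV cache, activations, and (for training) gradients and optimizer states all come on top of these numbers.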
Cost-Per-Task Analysis
Training a 405B Model (from scratch)
Scenario: Pre-training a 405B parameter model on 8-GPU B200 cluster.
Compute estimate:
- Model size: 405B parameters
- Training data: 1 trillion tokens
- Estimated time: ~600,000 seconds wall-clock (~167 hours) on the 8-GPU cluster, i.e. ~1,333 GPU-hours
- Cost: ~1,333 GPU-hours × $8.60 = ~$11,470
Equivalent on H100 (32x cluster):
- Throughput per GPU: 1,350 samples/sec vs B200's ~4,000+ samples/sec (estimated)
- Time: ~1,800,000 seconds wall-clock (~500 hours) across 32 GPUs, i.e. ~16,000 GPU-hours
- Cost: 16,000 GPU-hours × $1.99 = ~$31,840
Despite the 3x higher hourly rate, the B200 cluster is both cheaper and ~3x faster here, because it needs far fewer GPU-hours to finish. This is exactly the extreme-scale training class B200 targets. (These figures are scaled-down illustrations; a real 1T-token run at 405B scale consumes orders of magnitude more GPU-hours.)
Fine-Tuning a 70B Model
Scenario: LoRA fine-tuning on 300K examples, 512 tokens per example, batch size 64.
B200 (RunPod, $5.98/hr):
- Estimated time: 8-10 hours (helped by B200's higher memory bandwidth and tensor throughput)
- Cost: $48-$60
H100 (RunPod, $1.99/hr):
- Estimated time: 20-24 hours
- Cost: $40-$48
B200 is slightly more expensive in absolute dollars but trains 2x faster. The cost-per-task delta is minimal for fine-tuning; H100 remains competitive.
Inference: Serving a 200B Model
Scenario: Continuous inference on a 200B model, processing 100M tokens per month.
B200 single-GPU (RunPod, $5.98/hr):
- Estimated throughput: 1,200-1,500 tokens/sec
- Monthly capacity: ~1,400 tok/s × 86,400 s/day × 30 days ≈ 3.6B tokens (vs 100M needed, heavily oversized)
- Monthly cost: 730 × $5.98 = $4,365
- Cost per million tokens: $43.65
H100 8x cluster (CoreWeave, $49.24/hr):
- Throughput: 8 × 850 = 6,800 tok/s
- Monthly cost: 730 × $49.24 = $35,945
- Cost per million tokens: $359.45
For the modest 100M tokens/month requirement, both configurations overshoot on throughput, but the single B200 is still the cheaper option: the 8x H100 cluster is the smallest H100 configuration with enough VRAM for a 200B model, and it costs roughly 8x more per month. B200's advantage grows further at massive throughput (5B+ tokens/month), where its higher tokens/sec keeps cost per million tokens low.
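The cost-per-million-token math above generalizes to any always-on deployment; a sketch using the scenario's throughput and rates:

```python
# $ per million tokens when an instance is billed 24/7 regardless of load.
def cost_per_million_tokens(rate_per_hr: float, tokens_per_month: float,
                            hours_per_month: float = 730) -> float:
    monthly_cost = rate_per_hr * hours_per_month
    return monthly_cost / (tokens_per_month / 1e6)

# Serving the scenario's 100M tokens/month:
print(round(cost_per_million_tokens(5.98, 100e6), 2))   # 43.65 (1x B200)
print(round(cost_per_million_tokens(49.24, 100e6), 2))  # 359.45 (8x H100 cluster)
```

Because the instance bills around the clock, cost per million tokens falls linearly as you push more traffic through the same hardware.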
When B200 Makes Sense
B200 rental is justified when ALL of the following apply:
Model size exceeds 70B parameters. Smaller models don't need B200's memory or throughput. H100/H200 handle 70B comfortably.
Memory bandwidth is a bottleneck. B200's 8.0 TB/s (vs H100's 3.35 TB/s, a 2.4x improvement) accelerates training throughput when batch sizes are large (512+) or models are extremely large (200B+).
Time-to-completion has business value. Training a 200B model in 3 months (B200) instead of 9 months (H100 cluster) enables faster iteration and product releases.
Supply is available. As of March 2026, B200 is in limited supply on most cloud platforms. Availability for multi-month reservations is spotty.
Cost-per-task exceeds cost-per-hour. For one-off 10-hour fine-tuning jobs, H100 is cheaper. For continuous pre-training over months, B200's speed premium can offset hourly costs.
B200 Performance Gains vs H100
Preliminary benchmarks (not official NVIDIA data, as of March 2026):
| Workload | H100 | B200 | Multiple |
|---|---|---|---|
| FP8 Tensor TFLOPS | 3,958 | ~9,000 | ~2.3x |
| FP4 Tensor TFLOPS | N/A | ~18,000+ | N/A |
| TF32 Tensor TFLOPS | 989 | ~4,500 | ~4.5x |
| Memory Bandwidth | 3.35 TB/s | 8.0 TB/s | 2.4x |
| Time to train 1T tokens | 8.5 days | ~2-3 days (est.) | 3-4x faster |
B200's FP8 advantage (~2.3x the TFLOPS of H100, with FP4 adding further headroom) is game-changing for inference. Training workloads benefit from higher bandwidth and tensor performance; the end-to-end speedup lands between the 2.4x bandwidth ratio and the raw tensor gains, depending on how memory-bound the workload is.
Rental vs. Purchase Decision
Cloud Rental ($5.98/hr on RunPod):
- Advantages: no upfront cost, flexible scaling, no hardware maintenance
- Disadvantages: cost accumulates over long workloads
Purchase (estimated $60,000-$80,000 per B200 GPU):
- Advantages: after payback, the marginal cost is roughly power and cooling (on the order of $0.10-$0.15 per GPU-hour)
- Disadvantages: capital expense, operational overhead, obsolescence risk
Breakeven analysis:
- Single B200: pays for itself after ~10,000-13,000 GPU-hours (~14-18 months continuous)
- 8x cluster: pays for itself after ~7,000-9,300 cluster-hours (roughly 10-13 months of continuous cluster operation)
Buy B200 hardware if utilization >70% for 18+ months. Rent if utilization <70% or timeline <12 months.
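The rent-vs-buy rule of thumb above can be written down directly (thresholds are the article's, not universal guidance):

```python
# Buy only if utilization stays above ~70% for 18+ months; otherwise rent.
def should_buy(utilization: float, horizon_months: float,
               util_threshold: float = 0.70, min_months: float = 18) -> bool:
    return utilization > util_threshold and horizon_months >= min_months

print(should_buy(0.85, 24))  # True: purchase likely pays off
print(should_buy(0.50, 24))  # False: utilization too low, keep renting
print(should_buy(0.85, 9))   # False: horizon too short, keep renting
```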
B200 vs H100/H200 Throughput Benchmarks
LLM Inference (tokens per second, single GPU):
| Model | H100 PCIe | H200 | B200 (est.) |
|---|---|---|---|
| Llama 2 7B | 1,800 | 2,000 | 5,200 |
| Llama 2 70B | 850 | 950 | 2,400 |
| Grok 314B | N/A | N/A | 320 |
B200 is roughly 3x faster across these models (2,400 vs 850 tok/s on 70B; 5,200 vs 1,800 on 7B). The low-precision tensor gains translate most clearly to quantized (FP8) workloads, where B200's tensor cores are fully utilized.
Training Throughput (samples/sec, 8-GPU cluster):
| Task | 8x H100 | 8x H200 | 8x B200 (est.) |
|---|---|---|---|
| Pre-training (batch=128) | 3,600 | 4,200 | 12,000 |
| Fine-tuning LoRA (batch=64) | 5,000 | 6,000 | 18,000 |
B200 provides 3x speedup on training due to bandwidth and tensor performance gains. Wall-clock time to train 1T tokens: 8.5 days on H100 vs 2.5 days on B200.
Special Considerations for B200 Workloads
Quantization Matters. B200's biggest tensor gains appear at low precision: roughly 2.3x over H100 at FP8, and far more at FP4, which H100 does not support. Full-precision (FP32) workloads see a smaller speedup, typically 2-3x, limited by memory bandwidth.
For inference: quantize models to FP8 or INT8 to fully utilize B200. Use bfloat16 or FP16 at minimum.
For training: keep master weights in FP16 or bfloat16 and compute in lower precision where the framework supports it; FP8 gradients require careful loss scaling and normalization.
MoE Activation on B200. Some models (e.g., DeepSeek-V3) use Mixture of Experts, so only a fraction of parameters activate per token. The full weights must still be resident in VRAM, which is where B200's larger memory pays off.
Example: DeepSeek-V3 has 671B total parameters but activates only ~37B per token. At 8-bit precision the weights need ~671GB of VRAM, which fits on roughly 4 B200s (192GB each) versus roughly 9 H100s (80GB each), before KV cache. That difference meaningfully cuts deployment cost for MoE models.
Context Window Operations. Long-context inference (200K+ tokens) stresses memory bandwidth during KV cache operations. B200's 8.0 TB/s vs H100's 3.35 TB/s (a 2.4x advantage) translates to 2.0-2.2x throughput on long-context tasks.
Team serving a 70B model with 200K context windows: B200 delivers roughly 2x throughput relative to H100, justifying the 3x price premium.
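To see why 200K-token contexts are memory-hungry, here is a rough KV-cache size estimate. The shape parameters (80 layers, 8 grouped KV heads, head dimension 128) match a Llama-2-70B-style architecture and are assumptions for illustration:

```python
# Approximate KV-cache size for one sequence at fp16/bf16.
def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    # Factor of 2 covers both the K and V tensors per layer.
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9

print(round(kv_cache_gb(200_000), 1))  # 65.5 GB for one 200K-token sequence
```

At ~65GB per 200K-token sequence, the cache alone approaches an H100's entire 80GB, which is why B200's 192GB capacity and 8.0 TB/s bandwidth matter for this workload.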
Multi-Region B200 Deployments
Some cloud providers (CoreWeave, Together.AI) offer multi-region B200 access. Useful for resilience and latency reduction.
Cost trade-offs:
- Single region (lowest cost): $5.98/hr (RunPod)
- Multi-region replication: +30-50% overhead (data sync, failover)
- Multi-region load balancing: +50-100% overhead (orchestration)
For workloads tolerating single-region outages, save money and use one region. For SLA requirements (99.9% uptime), multi-region is necessary; budget accordingly.
B200 Energy Efficiency
B200 TDP (thermal design power): up to ~1,000W per GPU in HGX B200 systems (configurable lower in some deployments).
Cost per kilowatt-hour: ~$0.12 (US average).
Energy cost per training hour:
- Single B200: 1.0 kW × $0.12/kWh = $0.12/hr (negligible vs cloud rental cost)
- 8x B200 cluster: 8.0 kW × $0.12/kWh = $0.96/hr
Note that cloud rates already include power and cooling, so the $68.80/hr CoreWeave price is all-in; energy only appears as a separate line item on-premises.
For on-premises setups: 8 B200 GPUs running 24/7 at full load cost roughly $700/month in electricity, before cooling overhead (which often adds 30-50%). Factor this into purchase ROI calculations.
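The electricity math is easy to parameterize, which helps because published TDP figures vary by system configuration (the 1,000W default below is an assumption, not a quoted spec):

```python
# On-prem energy cost for a GPU cluster at full load.
def energy_cost_per_hr(watts_per_gpu: float, n_gpus: int,
                       usd_per_kwh: float = 0.12) -> float:
    return watts_per_gpu * n_gpus / 1000 * usd_per_kwh

hourly = energy_cost_per_hr(1000, 8)
print(round(hourly, 2))     # 0.96  ($/hr for 8 GPUs)
print(round(hourly * 730))  # ~701  ($/month running 24/7)
```

Multiply by your facility's PUE to account for cooling and distribution losses.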
B200 Availability by Workload Type
Available now (March 2026):
- Inference (single or batched)
- Fine-tuning on small instances (1-2 GPUs)
- Research and benchmarking
Coming soon (Q2 2026):
- Large-scale pre-training (16+ GPU clusters)
- Production inference at extreme scale (100+ concurrent users)
- Production managed services (fully hosted training)
Not available yet:
- Spot instances (B200 too new for spot pricing)
- Reserved instances with discounts (expect Q3 2026)
- Specialized services (data labeling, annotation with B200 inference)
FAQ
Is B200 worth buying now (March 2026)?
Too early for most teams. B200 is bleeding-edge, supply is constrained, and software support is still maturing. Rent for 3-6 months to evaluate performance before buying. By late 2026, supply should improve and pricing should drop 20-30%.
How much faster is B200 than H100?
Roughly 2.3x higher FP8 tensor throughput (more with FP4), 3-4x faster training wall-clock in preliminary estimates, and ~3x faster inference. The exact gain depends on workload and precision.
Should I use B200 or H200 for long-context inference?
H200 (141GB) is sufficient for most long-context work (up to 200K tokens). B200 (192GB) adds capacity and more bandwidth. Unless VRAM is exhausted, H200 offers better value ($3.59 vs $5.98/hr on RunPod).
Can I mix B200 and H100 in a cluster?
Not recommended for training. Different tensor performance would create bottlenecks. For inference, yes, you can run different models on different GPU types.
When will B200 pricing drop?
Historically, NVIDIA GPU prices drop 20-30% within 6-12 months of launch as supply stabilizes and competition increases. Expect B200 to reach $3.98-$4.50/hr by Q4 2026.
What about B200 availability on cloud platforms?
As of March 2026, only RunPod and Lambda publicly list single-GPU B200. CoreWeave lists 8-GPU clusters. Availability improves monthly. Check DeployBase's GPU dashboard for real-time availability across 50+ providers.
Is there a PCIe version of B200?
No. B200 ships only in the SXM (datacenter) form factor; there is no PCIe variant of the B200 itself, and all cloud B200 instances use SXM.
Related Resources
- NVIDIA GPU Pricing Comparison
- H100 Cloud Pricing
- H200 Cloud Pricing
- H100 vs A100 Comparison
- GPU Rental Calculator
Sources
- RunPod GPU Pricing
- Lambda Labs GPU Pricing
- CoreWeave Pricing Documentation
- NVIDIA B200 Tensor Core GPU Technical Brief
- DeployBase GPU Pricing API (tracked March 21, 2026)