Contents
- B200 GPU on Google Cloud: Availability
- B200 Technical Specifications
- B200 Performance Benchmarks
- Google Cloud B200 Pricing
- B200 Availability Through Alternative Providers
- B200 Rental Cost Analysis
- How to Access B200s Through Alternative Providers
- When B200 is the Right Choice
- Alternatives to B200 on Google Cloud
- Integrating B200 Workloads with Google Cloud
- FAQ
- Related Resources
- Sources
B200 GPU on Google Cloud: Availability
Google Cloud offers B200 GPUs as of March 2026. The 8x B200 cluster (a4 mega instance family) is available at $64.44/hr. Supply remains limited — expect some regional constraints.
B200 is also available on RunPod ($5.98/hr), CoreWeave ($68.80/hr 8x cluster), and Lambda ($6.08/hr).
B200 specs: ~9 petaFLOPS (FP8), 192GB HBM3e. Targets trillion-parameter training.
B200 Technical Specifications
Performance (with sparsity): FP8 ~9,000 teraFLOPS (~9 PFLOPS per GPU); FP16/BF16 ~4,500 teraFLOPS (4.5 PFLOPS per GPU); TF32 ~2,250 teraFLOPS (2.25 PFLOPS per GPU).
192GB of HBM3e at 8.0TB/s (vs the H100's 3.35TB/s). Tensor Cores are optimized for sparsity and transformer workloads.
Thermal design power reaches 1,000W per GPU, so cooling requirements exceed those of the H100 due to the higher power density.
The GPU supports PCIe Gen5 at 128 GB/s and fifth-generation NVIDIA NVLink at 1.8 TB/s of total bidirectional bandwidth per GPU for efficient multi-GPU communication.
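A quick way to put the 192GB capacity in context is to estimate raw weight footprints at different precisions. The sketch below counts weights only; optimizer state, gradients, and activations add substantial overhead on top (model names and sizes here are just common reference points).

```python
# Rough check of whether a model's raw weights fit in B200's 192 GB HBM3e.
# Weights only -- optimizer state and activations add substantial overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1, "int4": 0.5}
HBM_GB = 192

def weight_footprint_gb(params_billions: float, dtype: str) -> float:
    """Raw weight memory in decimal GB for a given parameter count and dtype."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

for model, billions in [("70B model", 70), ("405B model", 405)]:
    for dtype in ("bf16", "fp8"):
        gb = weight_footprint_gb(billions, dtype)
        verdict = "fits" if gb <= HBM_GB else "needs multiple GPUs"
        print(f"{model} @ {dtype}: {gb:.0f} GB -> {verdict}")
```

This is why single-GPU numbers only make sense for models up to roughly 190B parameters at fp8, and why 405B-class models are always sharded across several GPUs.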
B200 Performance Benchmarks
Large language model training on B200 hardware demonstrates exceptional throughput. A 405-billion-parameter model reaches roughly 380,000 tokens/second of aggregate training throughput on a B200 cluster with batch size 32 per GPU; a model of that size exceeds a single GPU's 192GB and must be sharded across GPUs.
Inference on quantized models shows dramatic throughput improvements. Llama 3 405B at int8 quantization delivers 180 tokens/second on B200 hardware with batched requests, compared to 45 tokens/second on comparable H100 hardware. (Even at int8, 405B parameters exceed a single GPU's 192GB, so the model is sharded across GPUs.)
Fine-tuning substantially larger models becomes feasible. A 70-billion parameter model fine-tunes in 1.5 hours on B200 using QLoRA adapters with rank 64.
Multi-GPU scaling on B200 clusters shows near-linear improvements up to 16 GPUs when using NVIDIA NCCL collective communications and NVLINK interconnects.
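The "near-linear up to 16 GPUs" claim can be sketched as a simple throughput projection where a small efficiency penalty is applied per doubling of cluster size. The base throughput and the 95% per-doubling efficiency below are illustrative assumptions, not figures from this article.

```python
import math

def cluster_throughput(per_gpu_tokens_s: float, n_gpus: int,
                       efficiency: float = 0.95) -> float:
    """Projected aggregate tokens/s, discounting per doubling of cluster size
    to model growing NCCL/NVLink communication overhead (illustrative)."""
    doublings = math.log2(n_gpus) if n_gpus > 1 else 0
    return per_gpu_tokens_s * n_gpus * (efficiency ** doublings)

for n in (1, 2, 8, 16):
    print(f"{n:>2} GPUs: ~{cluster_throughput(10_000, n):,.0f} tokens/s")
```

At 95% efficiency per doubling, a 16-GPU cluster still delivers over 80% of ideal linear scaling, which is what "near-linear" amounts to in practice.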
Google Cloud B200 Pricing
Google Cloud's a4 mega instance family offers 8x B200 (1,536 GB total) at $64.44/hour as of March 2026. This is slightly cheaper than CoreWeave's 8x B200 at $68.80/hour but more expensive than Lambda's single B200 at $6.08/hour.
Google Cloud integrates B200 availability with Vertex AI, BigQuery, and GCS, making it attractive for teams already invested in the Google Cloud ecosystem.
B200 Availability Through Alternative Providers
CoreWeave was the first major cloud provider to offer B200 GPUs. As of March 2026, CoreWeave provides 8x B200 clusters at $68.80/hour, or $8.60 per GPU.
Lambda Labs announced B200 availability with SXM configuration at $6.08/hour, though supply remains limited to production customers with minimum commitment requirements.
RunPod offers B200 SXM at $5.98/hour as of March 2026, with public on-demand availability.
B200 Rental Cost Analysis
CoreWeave B200 pricing: $68.80/hour for 8-GPU cluster = $8.60/GPU/hour
- Daily: $206.40 per GPU
- Monthly (730 hours): $6,278 per GPU
- Annual (8,760 hours): $75,336 per GPU at on-demand rates, before commitment discounts
Lambda Labs B200 SXM: $6.08/hour (when available)
- Daily: $145.92
- Monthly: $4,438
- Annual (8,760 hours): $53,261 at on-demand rates, before negotiated production discounts
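The figures above follow directly from the quoted hourly rates; a small calculator makes the arithmetic reproducible (730 hours/month and 8,760 hours/year, as used in this section):

```python
# Reproduces the per-GPU cost arithmetic from the quoted hourly rates.
RATES = {  # $/GPU/hour, as quoted in this section
    "CoreWeave 8x B200 (per GPU)": 68.80 / 8,
    "Lambda B200 SXM": 6.08,
    "RunPod B200 SXM": 5.98,
}

def cost(rate_hr: float, hours: float) -> float:
    """Total on-demand cost for a given hourly rate, rounded to cents."""
    return round(rate_hr * hours, 2)

for name, rate in RATES.items():
    print(f"{name}: ${cost(rate, 24):,}/day  "
          f"${cost(rate, 730):,}/mo  ${cost(rate, 8760):,}/yr")
```

These are raw on-demand totals; committed-use or negotiated production discounts would reduce the monthly and annual figures.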
For cost comparison, B200 prices are approximately 1.5-1.8x H100 costs. The performance-per-dollar ratio, however, favors B200 for very large models due to superior memory bandwidth and compute density.
How to Access B200s Through Alternative Providers
CoreWeave Process:
- Register at CoreWeave.AI
- Complete identity verification for corporate accounts
- Request B200 cluster availability in desired region
- Configure cluster size, networking, and storage
- Deploy Kubernetes workloads or raw VM instances
- Access through standard Kubernetes control plane or SSH
Lambda Labs Approach:
- Contact Lambda Labs sales team (production inquiries)
- Discuss B200 allocation availability
- Negotiate contract terms and minimum commitment
- Provision instances when availability confirmed
- SSH access provided with standard credentials
RunPod B200:
- B200 SXM available on-demand at $5.98/hour
- Provision instantly at runpod.io/gpu-cloud
- Standard RunPod SSH and API access
When B200 is the Right Choice
B200 becomes cost-effective for training models exceeding 100 billion parameters. Smaller models see diminishing cost benefits due to higher hourly rates.
Multi-month training projects justify B200 allocation. Daily project costs ($206 per GPU on CoreWeave) accumulate quickly for short experiments.
Inference serving for very large models benefits significantly. B200's superior throughput reduces per-token inference costs compared to smaller GPUs running in ensemble.
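One way to compare serving economics is dollars per million tokens: hourly rate divided by tokens served per hour. The B200 throughput and rate below come from this article; the H100 hourly rate is an illustrative assumption, not a quote.

```python
def usd_per_million_tokens(rate_hr: float, tokens_per_s: float) -> float:
    """Serving cost per million tokens at full utilization."""
    return rate_hr / (tokens_per_s * 3600) * 1_000_000

b200 = usd_per_million_tokens(5.98, 180)  # RunPod B200 rate and 405B int8 throughput above
h100 = usd_per_million_tokens(3.50, 45)   # assumed H100 on-demand rate (not from this article)
print(f"B200: ${b200:.2f}/M tokens   H100: ${h100:.2f}/M tokens")
```

Even at a higher hourly rate, the 4x throughput advantage cited above makes the B200's per-token cost substantially lower in this scenario.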
Teams in regions served by CoreWeave's data centers gain low-latency access; its distributed footprint covers North America and Europe.
Alternatives to B200 on Google Cloud
Google Cloud also offers H100 8x clusters at ~$88.49/hr and H200 8x clusters at ~$84.81/hr. For teams that don't need B200's memory capacity, A100 GPUs are available at lower cost in Compute Engine.
L4 GPUs on Google Cloud cost about 60% less than A100s and suit inference and batch processing rather than large-scale training.
Google Cloud TPU v5e provides competitive performance for transformer workloads whose tensor shapes map well to TPU hardware, often at significantly lower cost than GPU instances for compatible workloads.
For maximum flexibility, RunPod offers the broadest GPU selection with hourly billing and instant provisioning, and can be paired with Google Cloud data services through standard data pipelines.
Integrating B200 Workloads with Google Cloud
Teams provisioning B200s on CoreWeave or Lambda can maintain Google Cloud as a data management layer:
Use Cloud Storage for dataset hosting. Standard REST APIs transfer training data to B200 instances at network-limited speeds typically reaching 800 Mbps to 2 Gbps.
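At the 800 Mbps to 2 Gbps range quoted above, dataset transfer time is easy to estimate up front (the 2 TB dataset size below is just an example):

```python
def transfer_hours(dataset_gb: float, mbps: float) -> float:
    """Transfer time in hours: decimal GB over a link of `mbps` megabits/s."""
    bits = dataset_gb * 8e9          # GB -> bits
    return bits / (mbps * 1e6) / 3600

for mbps in (800, 2000):
    print(f"2 TB at {mbps} Mbps: {transfer_hours(2000, mbps):.1f} h")
```

At these speeds a 2 TB dataset takes roughly 2 to 6 hours to stage, which argues for transferring once to local NVMe on the B200 host rather than streaming from Cloud Storage during training.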
BigQuery integration for experiment tracking and result logging. Training jobs stream final metrics to BigQuery for analysis and visualization.
Persistent model storage on Cloud Storage for disaster recovery. Checkpoint models regularly to Cloud Storage, enabling quick restart if instances terminate.
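A minimal checkpoint-upload sketch, assuming the `google-cloud-storage` client library (that library's API, not this article, defines the calls below); the bucket and path names are hypothetical:

```python
def upload_checkpoint(bucket_name: str, local_path: str, remote_path: str) -> None:
    """Upload one checkpoint file to Cloud Storage.

    Requires `pip install google-cloud-storage` and ambient credentials
    (e.g. a service account key) on the external GPU host.
    """
    from google.cloud import storage  # imported lazily inside the function
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(remote_path)
    blob.upload_from_filename(local_path)

# e.g. called from the training loop every N steps (names are hypothetical):
# upload_checkpoint("my-training-ckpts", "ckpt_step_1000.pt",
#                   "runs/exp1/ckpt_step_1000.pt")
```

Uploading every N steps bounds the work lost if a rented instance terminates to at most one checkpoint interval.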
Cloud IAM manages access credentials for external GPU instances. Service accounts with minimal permissions reduce security risk when external infrastructure accesses Google services.
FAQ
Q: Is B200 worth the cost premium over H100?
For training models larger than 100B parameters, B200's superior memory bandwidth and compute deliver measurable time savings justifying cost premium. For smaller models, H100 often provides better cost per training hour.
Q: Is B200 available on Google Cloud?
Yes. Google Cloud offers 8x B200 clusters at $64.44/hr as of March 2026 through the a4 mega instance family. Availability may be limited by region.
Q: Can I mix B200 and H100 in the same training cluster?
NVIDIA's collective communications library (NCCL) can technically run across mixed GPU generations, but the cluster is throttled to the slowest GPU and precision and kernel choices must be compatible across generations. Mixing generations typically adds complexity without practical benefit.
Q: What's the supply situation for B200 rental capacity?
As of March 2026, B200 supply remains constrained. CoreWeave and Lambda Labs have limited inventory. Multi-month advance notice may be required for large cluster provisioning.
Q: Does B200 support multi-instance GPU (MIG) partitioning?
B200 supports MIG mode, enabling partition into smaller GPU slices. This reduces cost for inference workloads if tenant isolation is acceptable.
Q: What's the break-even point between B200 and H100?
For 70B parameter models, B200 training completes in approximately 40% less time than H100. Monthly rental cost parity occurs at roughly 280 training hours per month.
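The per-job logic behind this answer can be sketched directly: if B200 finishes the same job in 40% less wall-clock time, the job is cheaper on B200 whenever its hourly premium is below 1/0.6 ≈ 1.67x. The hourly rates and job duration below are illustrative assumptions, not quotes.

```python
def job_cost(rate_hr: float, hours: float) -> float:
    """Total cost of one training job at a given hourly rate."""
    return rate_hr * hours

h100_hours = 100.0             # assumed H100 wall-clock for the job
b200_hours = h100_hours * 0.6  # "~40% less time" per this FAQ

print("H100 job cost:", job_cost(4.00, h100_hours))   # assumed $4.00/hr H100
print("B200 job cost:", job_cost(5.98, b200_hours))   # RunPod B200 rate above
```

Under these assumed rates the B200 premium is about 1.5x, below the 1.67x threshold, so the B200 job comes out cheaper despite the higher hourly rate.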
Related Resources
B200 Specs Guide - Complete technical specifications
H100 Specs Guide - Previous generation for comparison
GPU Pricing Guide - All providers
CoreWeave GPU Pricing - B200 provider
Lambda GPU Pricing - Alternative B200 access
Sources
- NVIDIA B200 Tensor GPU Whitepaper
- CoreWeave GPU Pricing and Availability Documentation
- Lambda Labs GPU Service Offering Documentation
- Google Cloud Compute Engine GPU Offerings
- NVIDIA CUDA Toolkit and NCCL Documentation