Contents
- GB200 GPU Overview
- AWS GB200 Pricing
- How to Rent GB200 on AWS
- Performance Benchmarks
- Competing Alternatives
- FAQ
- Related Resources
- Sources
GB200 GPU Overview
This guide covers GB200 pricing and rental options on AWS. The GB200 is NVIDIA's Grace Blackwell superchip: one Grace CPU (72 Arm Neoverse V2 cores) paired with two B200 GPUs, connected via NVLink-C2C. The GB200 NVL72 rack-scale system links 36 Grace CPUs and 72 B200 GPUs. Each B200 GPU carries 192GB of HBM3e memory at ~8.0 TB/s bandwidth, which makes the platform well suited to mega-scale inference and long-context workloads (1M+ tokens).
Specs:
- Memory: 192GB HBM3e (per B200 GPU)
- Memory Bandwidth: ~8.0 TB/s (per B200 GPU)
- CUDA Cores: 26,624
- FP8 peak: ~9 PFLOPS with sparsity (per B200 GPU)
- Grace CPU: 72 Neoverse V2 cores, 480GB LPDDR5x
- NVLink-C2C: 900 GB/s (Grace-to-Blackwell)
- Partitionable (multi-instance GPU mode)
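The memory figures above translate directly into which models fit on a single GPU. Here is a rough back-of-the-envelope sketch; the bytes-per-parameter values are standard for each precision, while the 10% overhead factor for KV cache and activations is an illustrative assumption, not an AWS or NVIDIA figure.

```python
# Rough memory-fit check against the 192GB-per-B200 figure above.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}
HBM_PER_B200_GB = 192

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]

def fits_on_one_b200(params_billions: float, precision: str,
                     overhead: float = 1.10) -> bool:
    """True if weights plus an assumed 10% runtime overhead fit in 192GB."""
    return weight_memory_gb(params_billions, precision) * overhead <= HBM_PER_B200_GB

print(weight_memory_gb(70, "fp8"))    # 70.0 GB of weights
print(fits_on_one_b200(70, "fp8"))    # True: fits with room for context
print(fits_on_one_b200(405, "fp8"))   # False: ~405GB spans multiple GPUs
```

The same arithmetic explains why 405B-class models are sharded across NVLink-connected GPUs rather than served from a single B200.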
For related GPUs, see B200 specs and H200 specs.
AWS GB200 Pricing
GB200 on AWS runs roughly $6-8/hour on-demand, depending on region. A 1-year reserved term takes 30-40% off; a 3-year term takes 50-60% off. Spot instances save 50-70% but can be interrupted.
Data transfer out costs $0.02/GB; inbound transfer is free. Some regions require a capacity reservation before you can launch.
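The tiers above can be compared with a quick estimate. The $7/hr rate and the discount fractions below are midpoints of the ranges quoted in this article, not official AWS prices; exact figures vary by region.

```python
# Monthly cost estimate for the pricing tiers quoted above.
ON_DEMAND_PER_HR = 7.00          # midpoint of the $6-8/hr range (assumption)
DISCOUNTS = {                    # fraction off on-demand, range midpoints
    "reserved_1yr": 0.35,        # 30-40% off
    "reserved_3yr": 0.55,        # 50-60% off
    "spot": 0.60,                # 50-70% savings
}

def monthly_cost(plan: str, hours: float = 730) -> float:
    """Estimated monthly USD for one instance running 730 hours."""
    rate = ON_DEMAND_PER_HR * (1 - DISCOUNTS.get(plan, 0.0))
    return round(rate * hours, 2)

print(monthly_cost("on_demand"))     # 5110.0
print(monthly_cost("spot"))          # 2044.0
print(monthly_cost("reserved_3yr"))  # 2299.5
```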
Compare with Google Cloud and Azure if cost matters.
How to Rent GB200 on AWS
- Log into AWS Console.
- Go to EC2 → Launch Instance.
- Search "gb200" (usually p5 family).
- Pick Ubuntu 22.04 or a deep learning AMI (CUDA/PyTorch pre-installed).
- Configure security groups and SSH.
- Launch. Takes 3-5 minutes.
AWS deep learning AMIs save setup time. Just SSH in and start training or serving.
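The console steps above can also be scripted. Below is a minimal sketch of the launch request in the shape boto3's `ec2.run_instances(**params)` expects; every value is a placeholder, not a real ID, so look up the actual GB200 instance type, AMI, key pair, and security group in your own account before launching.

```python
# EC2 RunInstances parameters mirroring the console steps above.
# All IDs are placeholders: substitute real values from your console.
launch_params = {
    "ImageId": "ami-xxxxxxxxxxxxxxxxx",       # a Deep Learning AMI (placeholder)
    "InstanceType": "<gb200-instance-type>",  # placeholder: search the console
    "MinCount": 1,
    "MaxCount": 1,
    "KeyName": "my-ssh-key",                  # hypothetical SSH key pair name
    "SecurityGroupIds": ["sg-xxxxxxxxxxxx"],  # group that allows SSH (port 22)
}

# With boto3 installed and credentials configured, launching would be:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.run_instances(**launch_params)   # incurs charges once running
print(sorted(launch_params))
```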
Common uses: large-scale inference serving, fine-tuning massive models, long-context document processing, multi-model orchestration.
Also check RunPod and Lambda pricing if you want alternatives.
Performance Benchmarks
Inference (single GB200):
- Llama 2 70B (FP8): 500-700 tokens/sec
- Llama 3.1 405B (FP8): 200-300 tokens/sec
Note on memory: at FP8 (one byte per parameter), a 405B model needs roughly 405GB for weights alone, which exceeds a single B200's 192GB; models that size are sharded across NVLink-connected GPUs. A single B200 fits 70B-class FP8 models with ample room for context windows.
Training: with optimization, 405B-class models train at 500-700 tokens/sec; mixed precision runs 30-40% faster than FP32.
Context: sequences up to 1M tokens are feasible; around 256K tokens works without special tuning.
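Combining the throughput figures above with the on-demand rate gives a rough serving cost per token. The $7/hr midpoint and the per-model throughput points are taken from the ranges in this article.

```python
# Cost-per-million-tokens estimate from the benchmark and pricing numbers above.
def cost_per_million_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """USD to generate one million tokens at a sustained rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return round(usd_per_hour / tokens_per_hour * 1_000_000, 2)

# Llama 2 70B at ~600 tok/s, $7/hr on-demand midpoint:
print(cost_per_million_tokens(7.0, 600))   # 3.24
# Llama 3.1 405B at ~250 tok/s:
print(cost_per_million_tokens(7.0, 250))   # 7.78
```

Spot pricing (50-70% off) scales these figures down proportionally.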
Competing Alternatives
| Provider | GPU | $/hr | Memory | Best For |
|---|---|---|---|---|
| AWS | GB200 | $6-8 | 192GB | Large-scale production |
| Google Cloud | H100 | $3-5 | 80GB | TPU bundle |
| Azure | H200 | $4-6 | 141GB | Windows shops |
| CoreWeave | GB200 | Variable | 192GB | Direct GPU rental |
| Lambda | B200 SXM | $6.08 | 192GB | Community support |
AWS wins on maturity and compliance. CoreWeave and Lambda are simpler if you just want GPUs.
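One way to normalize the table above is dollars per GB of GPU memory per hour, since memory capacity often gates which models you can serve. Prices use each row's midpoint; CoreWeave is omitted because its rate is listed as "Variable".

```python
# Price-per-GB-of-HBM comparison from the table above (midpoint rates).
providers = {
    "AWS GB200":   {"usd_hr": 7.00, "mem_gb": 192},   # $6-8 midpoint
    "GCP H100":    {"usd_hr": 4.00, "mem_gb": 80},    # $3-5 midpoint
    "Azure H200":  {"usd_hr": 5.00, "mem_gb": 141},   # $4-6 midpoint
    "Lambda B200": {"usd_hr": 6.08, "mem_gb": 192},
}

for name, p in providers.items():
    per_gb = p["usd_hr"] / p["mem_gb"]
    print(f"{name}: ${per_gb:.4f} per GB-hour")
```

By this metric the higher-memory GPUs are competitive despite higher sticker prices; it ignores compute throughput, so treat it as one axis, not a verdict.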
FAQ
Can I run GB200 instances in spot mode? Yes, spot GB200 instances cost 50-70% less but risk interruption.
Does AWS provide fractional GB200 access? The GB200 supports multi-instance GPU (MIG) mode, which partitions a GPU into smaller isolated instances.
What is the minimum monthly commitment? On-demand instances have no minimum. Reserved instances require 1 or 3-year terms.
How long does GB200 provisioning take? Most instances launch within 3-5 minutes.
What regions offer GB200? Availability varies. US regions (us-east, us-west) typically have capacity.
Related Resources
Sources
- NVIDIA GB200 Grace Blackwell Superchip Specifications
- AWS EC2 Instance Types Documentation (official)
- NVIDIA CUDA Toolkit & cuDNN Documentation