A100 on Google Cloud: Pricing, Specs & How to Rent

Deploybase · February 3, 2025 · GPU Pricing

A100 GPU Specifications

The NVIDIA A100 remains one of the most widely deployed data center GPUs. Released in 2020, it still delivers strong performance for ML training, HPC, and analytics, and Google Cloud offers it in multiple configurations.

Core specifications:

  • 40GB or 80GB HBM2e memory options
  • 6,912 CUDA cores (full GPU)
  • Up to 312 TFLOPS peak performance (FP16/BF16)
  • Multi-Instance GPU (MIG) capability dividing into up to 7 partitions
  • PCIe and SXM4 form factors
  • 1,555 GB/s memory bandwidth (40GB) / 2,039 GB/s (80GB SXM)
  • Support for both single and multi-node training

The A100 excels at the tensor operations that dominate deep learning. Multi-Instance GPU (MIG) partitioning splits the GPU into as many as seven isolated instances, letting teams run multiple smaller workloads on a single GPU.
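As a rough sketch of how MIG slicing works, the snippet below encodes the A100 40GB profiles published in NVIDIA's MIG documentation (profile names and sizes are nominal; verify against current NVIDIA docs):

```python
# Nominal MIG profiles for the A100 40GB, per NVIDIA's MIG user guide.
# Each profile consumes some of the GPU's seven compute slices.
A100_40GB_MIG_PROFILES = {
    "1g.5gb":  {"compute_slices": 1, "memory_gb": 5},
    "2g.10gb": {"compute_slices": 2, "memory_gb": 10},
    "3g.20gb": {"compute_slices": 3, "memory_gb": 20},
    "4g.20gb": {"compute_slices": 4, "memory_gb": 20},
    "7g.40gb": {"compute_slices": 7, "memory_gb": 40},
}

def max_instances(profile: str, total_slices: int = 7) -> int:
    """How many instances of one profile fit on a single A100."""
    return total_slices // A100_40GB_MIG_PROFILES[profile]["compute_slices"]

print(max_instances("1g.5gb"))   # 7 isolated 5GB GPU instances
print(max_instances("3g.20gb"))  # 2 instances
```

In practice a single GPU can also mix profiles, as long as the slice budget is respected.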

Google Cloud A100 Pricing

Google Cloud's pricing model combines on-demand rates with substantial commitment discounts. The platform provides transparency through its pricing calculator and detailed billing documentation.

Google Cloud offers A100 GPUs through the a2-highgpu instance family (e.g., a2-highgpu-1g for 1xA100 40GB, a2-highgpu-8g for 8xA100 40GB). The a2-megagpu-16g instance supports 16xA100 40GB for large-scale training. A100 80GB is available via the a2-ultragpu family.

Standard on-demand pricing:

  • A100 40GB (a2-highgpu): ~$3.67 per hour
  • A100 80GB (a2-ultragpu): ~$5.07 per hour
  • 1-year commitment discounts: 30% off standard pricing
  • 3-year commitment discounts: 50% off standard pricing
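Combining the rates and discounts above, effective hourly and approximate monthly costs work out as follows (a quick sketch; rates are the approximate figures quoted here, not live pricing):

```python
# Effective A100 rates on Google Cloud under the discounts quoted above.
ON_DEMAND = {"A100 40GB": 3.67, "A100 80GB": 5.07}  # $/GPU-hour, approximate
DISCOUNTS = {"on-demand": 0.00, "1-year commit": 0.30, "3-year commit": 0.50}

def effective_rate(gpu: str, plan: str) -> float:
    """Hourly rate after applying the commitment discount."""
    return ON_DEMAND[gpu] * (1 - DISCOUNTS[plan])

for gpu in ON_DEMAND:
    for plan in DISCOUNTS:
        rate = effective_rate(gpu, plan)
        # ~730 hours in an average month
        print(f"{gpu}, {plan}: ${rate:.2f}/hr (~${rate * 730:,.0f}/mo)")
```

A 3-year commitment, for example, brings the 40GB A100 down to roughly $1.84/hour.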

Preemptible (Spot) instances cost 70-75% less but can be reclaimed at any time. Regional pricing variations reflect data center costs and local demand. As of early 2025, Google Cloud's A100 pricing remains competitive for production workloads that prioritize stability and support.

Comparison with AWS GPU pricing shows Google Cloud often provides better rates for sustained, predictable workloads through commitment discounts.
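One way to see why sustained workloads favor commitments: a commitment bills every hour at the discounted rate, used or not, while on-demand bills only hours actually used at the full rate. The break-even utilization is then simply one minus the discount (a back-of-the-envelope sketch using the discount figures above):

```python
# When does a committed-use discount beat paying on-demand?
# Commitment cost over period T: (1 - discount) * rate * T (all hours billed)
# On-demand cost over period T:  utilization * rate * T (only busy hours billed)
# Committing wins when utilization > 1 - discount.
def breakeven_utilization(discount: float) -> float:
    """Fraction of hours a GPU must be busy before committing is cheaper."""
    return 1.0 - discount

print(breakeven_utilization(0.30))  # 1-year commit pays off above 70% utilization
print(breakeven_utilization(0.50))  # 3-year commit pays off above 50% utilization
```

If your A100s sit busy most of the day, the commitment math is hard to beat.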

How to Rent A100 on Google Cloud

Provisioning an A100 instance takes these steps:

  1. Go to Compute Engine > VM instances
  2. Create new instance
  3. Configure machine type (select GPU-accelerated template)
  4. Choose A100 GPU count (1, 2, 4, or 8)
  5. Select memory configuration (40GB or 80GB)
  6. Choose region and zone carefully (affects pricing and latency)
  7. Select boot disk image (Ubuntu, CentOS, or Google's optimized images)
  8. Configure networking and storage
  9. Review and deploy
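The console steps above also map onto a single gcloud CLI invocation. The helper below builds that command from the key choices (GPU count, memory configuration, zone); the instance name, zone, and image family are illustrative placeholders, so check your project's available Deep Learning VM image families before running the result:

```python
# Build a gcloud command equivalent to the console steps above.
# Image family below is illustrative; verify current Deep Learning VM families.
def gcloud_create_a100(name: str, zone: str, gpu_count: int = 1,
                       memory: str = "40GB") -> str:
    family = {"40GB": "a2-highgpu", "80GB": "a2-ultragpu"}[memory]
    machine_type = f"{family}-{gpu_count}g"  # e.g. a2-highgpu-1g, a2-ultragpu-8g
    return " ".join([
        "gcloud compute instances create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        "--image-family=common-cu121",              # placeholder DL VM family
        "--image-project=deeplearning-platform-release",
        "--maintenance-policy=TERMINATE",           # GPU VMs cannot live-migrate
    ])

print(gcloud_create_a100("a100-train", "us-central1-a", gpu_count=8))
```

On A2 machine types the A100s are bundled into the machine type itself, so no separate `--accelerator` flag is needed.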

Google Cloud provides integrated GPU support, with CUDA drivers preinstalled on its official Deep Learning VM images. Network bandwidth to accelerators is optimized, and A2 instances use SXM4 A100s connected over NVLink, enabling fast multi-GPU communication.

Users can also turn to Vertex AI (formerly AI Platform) for managed training, which abstracts away the infrastructure: the service provisions GPUs, handles distributed training, and cleans up resources automatically.

Comparing A100 Pricing Across Clouds

A100 availability spans multiple cloud providers, each with distinct pricing strategies.

Google Cloud (on-demand): $3.67/hour (40GB) to $5.07/hour (80GB)

  • Strongest commitment discounts
  • Integrated with Google's ML services
  • Excellent regional availability

AWS GPU pricing for A100:

  • Approximately $2.75/GPU-hour on-demand (p4de.24xlarge = 8xA100 80GB at ~$22/hr total)
  • Similar commitment discount structures
  • Broader instance type flexibility

Azure GPU pricing for A100:

  • Approximately $3.67/hour (Standard_NC24ads_A100_v4, single A100 80GB)
  • Strong production support
  • Smooth integration with Microsoft tools

Lambda GPU pricing for A100:

  • Fixed rates around $1.48/hour
  • No hidden charges
  • Dedicated GPU cloud provider

Vast.AI's secondary market offers variable A100 pricing that can undercut all of the options above, though availability fluctuates. Teams prioritizing reliability choose managed clouds, while cost-conscious teams explore spot markets.
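Because providers quote different instance shapes, the fairest comparison is per GPU-hour. The sketch below normalizes the approximate 80GB A100 figures quoted above (these are the article's rough quotes, not live prices):

```python
# Normalize the quoted on-demand prices to $/GPU-hour for the A100 80GB.
# Figures are the approximate rates quoted in this article, not live pricing.
QUOTES = {
    "Google Cloud (a2-ultragpu)":  {"total_per_hr": 5.07,  "gpus": 1},
    "AWS (p4de.24xlarge)":         {"total_per_hr": 22.00, "gpus": 8},
    "Azure (NC24ads_A100_v4)":     {"total_per_hr": 3.67,  "gpus": 1},
    "Lambda":                      {"total_per_hr": 1.48,  "gpus": 1},
}

for provider, q in sorted(QUOTES.items(),
                          key=lambda kv: kv[1]["total_per_hr"] / kv[1]["gpus"]):
    print(f"{provider}: ${q['total_per_hr'] / q['gpus']:.2f}/GPU-hour")
```

Sorting by the normalized rate makes the Lambda and AWS positions in the comparison immediately visible.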

A100 Use Cases & Performance

The A100 addresses diverse workload categories with strong performance characteristics.

Training performance metrics:

  • ResNet-50: 24,000 images/second (mixed precision)
  • BERT: 3,500 sequences/second
  • GPT-3-scale models: 175B-parameter training supported across multi-node A100 clusters
  • Multi-GPU scaling: near-linear throughput to 8 GPUs
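To turn the ResNet-50 throughput figure above into wall-clock time, divide the dataset size by the sustained rate (a rough estimate assuming ImageNet-1k's ~1.28M training images and the quoted throughput):

```python
# Epoch-time estimate from the ResNet-50 throughput quoted above,
# assuming ImageNet-1k's ~1.28M training images.
IMAGES_PER_EPOCH = 1_281_167  # ImageNet-1k training set size

def seconds_per_epoch(throughput_img_per_s: float) -> float:
    """Wall-clock seconds for one pass over the dataset."""
    return IMAGES_PER_EPOCH / throughput_img_per_s

print(f"{seconds_per_epoch(24_000):.0f} s/epoch")          # ~53 s/epoch
print(f"{90 * seconds_per_epoch(24_000) / 3600:.1f} h")    # standard 90-epoch run
```

At the quoted rate, a conventional 90-epoch ResNet-50 run finishes in well under two hours.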

Inference capabilities:

  • TensorRT optimization yields 10-50x speedup
  • Batch inference at sub-20ms latency
  • Supports INT8 quantization with minimal accuracy loss
  • Real-time serving at 1,000s requests per second per GPU
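The latency and throughput bullets above are linked: with batched serving, sustained requests per second is roughly batch size divided by batch latency. The batch sizes below are illustrative, using the sub-20ms figure quoted above:

```python
# Relate batched-inference latency to sustained throughput.
# Batch sizes are illustrative; 20 ms is the latency figure quoted above.
def requests_per_second(batch_size: int, latency_s: float) -> float:
    """Sustained throughput when each batch completes in latency_s seconds."""
    return batch_size / latency_s

for batch in (8, 32, 64):
    rps = requests_per_second(batch, 0.020)
    print(f"batch {batch:>2}: {rps:,.0f} req/s")
```

Even modest batch sizes at 20ms put a single GPU into the "1,000s of requests per second" range cited above.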

Data analytics workloads benefit from A100's memory bandwidth. RAPIDS libraries integrate with GPU compute for end-to-end data processing, often 20-50x faster than CPU equivalents.

Teams running production inference deploy A100s for consistent performance. Research teams and startups consider Lambda GPU pricing or RunPod pricing for cost efficiency.

FAQ

What's the best region for A100 on Google Cloud? us-central1 typically offers the best pricing. Consult the pricing calculator for specific region rates, as they fluctuate.

Can I use Google Cloud's A100 for training and inference? Yes, A100s handle both workloads. Tensor Cores provide excellent throughput for training, while low latency makes inference efficient.

Does Multi-Instance GPU partitioning help costs? Yes. MIG divides an A100 40GB into as many as seven smaller GPU instances, maximizing utilization when running multiple small models simultaneously.

How do commitment discounts work on Google Cloud? Commit to a 1-year or 3-year term for roughly 30-50% off standard rates. You are billed for the committed resources whether or not you use them, and commitments cannot be cancelled or refunded.

What's the difference between preemptible and standard A100 instances? Preemptible instances cost 70-75% less but can be reclaimed at any time with only about 30 seconds' notice, and run for at most 24 hours. Use them for fault-tolerant batch jobs only.
