NVIDIA A100 Price: Cloud GPU Rental Rates 2026

Deploybase · February 25, 2025 · GPU Pricing

NVIDIA A100 Price Overview

NVIDIA A100 cloud rental pricing ranges from $1.19 to $1.48 per GPU-hour as of March 2026, depending on provider and form factor. That's the tightest range among data center GPUs tracked on DeployBase's GPU pricing dashboard.

Why so consistent? The A100 entered the market in 2020. Supply has normalized. Competition among cloud providers has compressed margins. A team can rent an A100 on RunPod, Lambda, or mid-market providers at predictable rates. No wild outliers like hyperscaler pricing.

Buying outright runs $15,000 to $40,000 depending on variant. Breakeven for rental vs purchase lands around 12,000 to 15,000 hours of continuous use against the low end of that purchase range. At 24/7 operation, that's 16 to 21 months. Add datacenter overhead and practical breakeven stretches toward 24 months or more.


Cloud Provider Pricing

All prices below are as of March 21, 2026. Single-GPU on-demand rates.

Provider   GPU Model   Form Factor   VRAM   $/GPU-hr   Source
RunPod     A100        PCIe          80GB   $1.19      runpod.io
RunPod     A100        SXM           80GB   $1.39      runpod.io
Lambda     A100        PCIe          40GB   $1.48      lambda.ai
Lambda     A100        SXM           40GB   $1.48      lambda.ai

The range is narrow: $1.19 to $1.48, only a $0.29 spread across the best-priced providers. This reflects market maturity: A100s are no longer the latest generation, margins have stabilized, and everyone sources from the same NVIDIA inventory channels, so pricing pressures are uniform.

RunPod's 80GB variants are cheapest because the provider accepts lower margins for volume. Lambda charges the same ($1.48) for both PCIe and SXM 40GB variants, which is notable: usually SXM commands a premium due to higher bandwidth and NVLink capability. Lambda's pricing suggests they're optimizing for utilization over form factor differentiation.

All listed rates are on-demand. No long-term commitment required. Spot pricing typically runs 40-60% below these rates, though availability varies week-to-week.


Understanding A100 Use Cases by Industry

Machine Learning and AI Development

ML teams use A100s for model training, fine-tuning, and experimentation. The 80GB memory accommodates full fine-tuning of models up to roughly 13B parameters, or 70B-class models with quantization and parameter-efficient methods, at batch sizes of 8-16. For teams training proprietary models, A100 cloud rental is cheaper than buying and maintaining on-prem infrastructure.

Example: A startup training recommendation models for e-commerce. Each model is 10-20B parameters. Fine-tuning on 100K examples takes 12-15 hours. RunPod A100 at $1.19/hr costs $14-18 per fine-tune job. Over a year of weekly fine-tuning, the total spend is ~$900/year. Buying an A100 ($27K) and running 24/7 makes no sense at this scale.

Data Science and Analytics

Data teams use A100s for large-scale data processing (feature engineering, data transformation). Frameworks like RAPIDS use GPU acceleration for Pandas-like operations.

Example: Processing 100M customer records through embedding models, each record is 1KB of text. Total data: 100GB. A100 throughput: ~280 MB/s = ~360 seconds = 6 minutes to process all data. Cost: 0.1 hours × $1.19 = $0.12.

Compare to CPU (standard Xeon): ~40 MB/s throughput = 2,500 seconds = 42 minutes. Time savings justify the GPU cost.
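The throughput arithmetic above can be sketched as a small helper. Note the ~280 MB/s (A100) and ~40 MB/s (Xeon) figures are the article's rough estimates, not benchmarks:

```python
def processing_time_and_cost(data_gb, throughput_mb_s, price_per_hr):
    """Return (seconds, dollars) to stream data_gb through a device
    with the given sustained throughput, billed at price_per_hr."""
    seconds = data_gb * 1000 / throughput_mb_s  # GB -> MB
    hours = seconds / 3600
    return seconds, hours * price_per_hr

# A100 at the article's ~280 MB/s estimate, $1.19/hr (RunPod PCIe 80GB)
gpu_s, gpu_cost = processing_time_and_cost(100, 280, 1.19)
# Xeon CPU at ~40 MB/s (no GPU rental cost modeled)
cpu_s, _ = processing_time_and_cost(100, 40, 0.0)

print(f"GPU: {gpu_s/60:.0f} min, ${gpu_cost:.2f}")  # ~6 min, ~$0.12
print(f"CPU: {cpu_s/60:.0f} min")                   # ~42 min
```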

Academic Research

Universities rent A100s by the hour for research experiments. The on-demand flexibility is ideal for iterative research where teams don't know upfront if an experiment will take 2 hours or 20 hours.

Example: Testing 10 different hyperparameter configurations for a vision transformer. Each run takes 4 hours. Total: 40 hours. Cost: 40 hours × $1.19 = $47.60. The ability to pay only for the hours used (not buy for one experiment) is the entire value proposition here.


Form Factors: PCIe vs SXM

A100 PCIe (40GB and 80GB)

Standard PCIe form factor. Fits any server with a spare slot. Memory: either 40GB or 80GB HBM2e. Bandwidth: 1,555 GB/s (40GB variant) or 2,000 GB/s (80GB variant).

PCIe is the baseline. Lowest power draw (300W for 80GB). Simplest integration. Boutique providers like RunPod standardize on PCIe because it requires minimal infrastructure: plug into a server motherboard, route power, done.

Best for: inference, single-GPU batch jobs, development work. Anywhere multi-GPU NVLink isn't needed.

A100 SXM (40GB and 80GB)

SXM is NVIDIA's mezzanine module form factor. It requires an NVIDIA DGX or HGX baseboard. 80GB HBM2e, 2,000 GB/s bandwidth, with NVLink support for up to 600 GB/s peer-to-peer GPU-to-GPU connections.

Higher power ceiling (400W for 80GB). More complex chassis requirements. Thermal design assumes rack mounting with dedicated cooling.

Best for: distributed training where NVLink bandwidth matters. Multi-GPU parallel training. Scenarios where per-GPU aggregate bandwidth to system memory plus GPU-to-GPU interconnect are critical.


Single vs Multi-GPU Rates

Single-GPU (1x)

RunPod A100 PCIe 80GB: $1.19/hr
RunPod A100 SXM 80GB: $1.39/hr

Multi-GPU (2x, 4x, 8x)

Lambda multi-GPU pricing (from API data):

  • 2x A100 PCIe 80GB: $2.96/hr ($1.48 per GPU)
  • 4x A100 PCIe 160GB: $5.92/hr ($1.48 per GPU)
  • 8x A100 SXM 320GB: $11.84/hr ($1.48 per GPU)
  • 8x A100 SXM 640GB: $16.48/hr ($2.06 per GPU)

RunPod multi-GPU pricing:

  • 2x A100 SXM 160GB: $2.78/hr ($1.39 per GPU)
  • 4x A100 SXM 320GB: $5.56/hr ($1.39 per GPU)
  • 8x A100 SXM 640GB: $11.12/hr ($1.39 per GPU)

Multi-GPU pricing per-GPU is linear (or near-linear) across these tiers because the providers are passing through the cost of orchestration, NVLink interconnect, and chassis management directly. An 8-GPU cluster doesn't get a bulk discount; it costs 8x the per-GPU rate (plus minimal overhead).
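Linear per-GPU pricing makes cluster cost projection trivial. A minimal sketch using the rates quoted above (the provider keys are made up for illustration):

```python
# Hypothetical rate table built from the per-GPU prices quoted above.
PER_GPU_RATE = {"runpod_a100_sxm": 1.39, "lambda_a100_pcie": 1.48}

def cluster_hourly(provider, n_gpus):
    """Linear per-GPU pricing: an n-GPU pod costs n x the single-GPU rate."""
    return round(PER_GPU_RATE[provider] * n_gpus, 2)

print(cluster_hourly("runpod_a100_sxm", 8))   # 11.12
print(cluster_hourly("lambda_a100_pcie", 4))  # 5.92
```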


Instance Types

On-Demand

No commitment. Maximum flexibility. At $1.19/hr (RunPod PCIe):

Monthly (730 hours): $870
Annual: $10,440

For inference or batch jobs with unpredictable timing, on-demand is the default.

Spot and Preemptible

Spot pricing typically runs at a 40-60% discount off on-demand. RunPod spot A100 rates have historically been reported as low as $0.50-$0.60/hr. GCP preemptible A3 instances (H100-based, a different architecture but a similar tier) run at a 60-70% discount.

The tradeoff: 2-minute eviction notice on AWS spot. Workloads need checkpoint support. Research, data preprocessing, training with gradient accumulation (natural checkpoints) fit well. Production inference serving: risky unless there's failover.

Reserved and Committed

Committing to 1-3 year terms typically yields 25-40% discounts off on-demand.

CoreWeave committed use discounts (CUDs) typically run 30-40% depending on term. Google Cloud CUDs: up to 45% off on-demand.

The calculation: 1-year reserved at 30% discount = $0.83/hr on RunPod's base $1.19 on-demand. Annual cost: $7,260. Saves $3,180/year vs on-demand. But locked in through March 2027. If A100 availability drops and new-gen GPUs enter the market, that commitment becomes a liability.
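The savings math generalizes to any discount tier. A quick sketch against RunPod's $1.19 on-demand base, using 8,760 hours/year, so totals differ slightly from the rounded monthly figures:

```python
ON_DEMAND = 1.19          # RunPod A100 PCIe 80GB, $/GPU-hr
HOURS_PER_YEAR = 8760

def reserved_annual_savings(discount):
    """Annual savings of a reserved commitment vs 24/7 on-demand rental."""
    on_demand_annual = ON_DEMAND * HOURS_PER_YEAR
    reserved_annual = on_demand_annual * (1 - discount)
    return on_demand_annual - reserved_annual

print(f"${reserved_annual_savings(0.30):,.0f}/year at 30% off")  # ~$3,127
print(f"${reserved_annual_savings(0.45):,.0f}/year at 45% off")  # ~$4,691
```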


Purchase Costs

Individual GPUs (Street Price)

Variant           Price             Volume (10+)
A100 PCIe 40GB    $20,000-$25,000   $18,000-$22,000
A100 PCIe 80GB    $25,000-$30,000   $22,000-$27,000
A100 SXM 80GB     $35,000-$40,000   $30,000-$35,000

These are typical OEM/reseller prices as of Q1 2026. Exact pricing varies by geography, distributor, and quantity; large-scale customers get deeper discounts.

Complete Systems

DGX A100 (8x SXM GPUs, NVIDIA-designed chassis, NVLink, networking): $200,000-$250,000 list price. Discounts of 10-20% typical at volume.

Custom 8-GPU A100 SXM server: $180,000-$240,000 before infrastructure.

Power consumption at full load: 8x 400W = 3.2 kW. Electricity costs: roughly $3,400/year at $0.12/kWh running 24/7. Add $5,000-$40,000 for power and cooling infrastructure depending on existing facility readiness.


Buy vs Rent Analysis

The Math

At $1.19/hr (RunPod A100 PCIe):

Monthly: $870
Annual: $10,440
3-year projection: $31,320

An A100 PCIe purchased at $27,500 (midpoint street price) plus 3 years of power (350W continuous, $0.12/kWh): roughly $370/year = $1,100 total. Cooling and maintenance: $2,000-$8,000 amortized. Total cost of ownership over 3 years: roughly $31,000-$37,000.

Breakeven: 12,000-15,000 hours of continuous use against the low end of the purchase range (the $27,500 midpoint with overhead takes closer to 26,000-31,000 hours). At 24/7: 16-21 months. At 8 hours/day: roughly 50-62 months.
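The TCO comparison can be parameterized. A rough sketch where the power, cooling, and maintenance inputs are estimates, not quotes:

```python
HOURS_PER_YEAR = 8760

def three_year_tco_buy(price, watts=350, kwh_rate=0.12, cooling_maint=5000):
    """Rough 3-year cost of ownership: hardware + 24/7 power + a lump
    cooling/maintenance estimate. All defaults are assumptions."""
    power_cost = watts / 1000 * HOURS_PER_YEAR * 3 * kwh_rate
    return price + power_cost + cooling_maint

def three_year_rent(rate_per_hr):
    """3 years of 24/7 on-demand rental."""
    return rate_per_hr * HOURS_PER_YEAR * 3

buy = three_year_tco_buy(27500)   # midpoint street price
rent = three_year_rent(1.19)      # RunPod PCIe 80GB
print(f"buy ~${buy:,.0f}, rent ~${rent:,.0f}")  # buy ~$33,604, rent ~$31,273
```

At the $27,500 midpoint the two options land within a few thousand dollars over three years of continuous use, which is why the decision hinges on utilization and horizon rather than sticker price.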

When to Rent

  • Project duration under 18 months.
  • Utilization below 50% (part-time workloads).
  • No existing datacenter infrastructure.
  • Capital-constrained.

When to Buy

  • Continuous utilization above 60% over 2+ years.
  • Existing power and cooling infrastructure.
  • Multi-year training or inference commitment.

The inflection point: if a team plans to keep a GPU running more than 18 months, purchase becomes economical. If shorter, rent.


Cost Optimization

Provider selection is the biggest lever. RunPod at $1.19/hr is about 20% cheaper than Lambda at $1.48/hr. On a 24/7 deployment for a year, that $0.29/hr gap is roughly $2,500 saved. Bigger savings come from provider discounts, volume commitments, or switching form factors (PCIe vs SXM).

Spot vs on-demand. If a workload can tolerate 2-minute interruptions and automatically checkpoint, spot pricing at $0.50-$0.70/hr cuts costs 55-60%. Research teams running fault-tolerant training can cut infrastructure costs by half.

Right-sizing the model. Not every inference job needs an A100. A 7B parameter model can run on an A10 or L4 at $0.44-$0.86/hr. A100 is overkill for serving lightweight models. Push inference down to cheaper GPUs wherever possible.

Committed instances. Locking in 1-year commitments saves 25-40%. The risk: H100 and H200 are now cheaper per token than A100 for many workloads. A 1-year A100 commit signed in March 2026 runs through March 2027, by which time the cost per inference token may have dropped. 1-year terms are safer than 3-year.

Batch optimization. A100's strength is handling large batch sizes. 256-token prefill + large batch decode is where it shines. Batching workloads together reduces the per-token cost. Inference engines like vLLM and TensorRT-LLM extract maximum throughput from A100s.
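Cost per token follows directly from sustained throughput. A sketch where the 2,000 tok/s aggregate batched-decode rate is an illustrative assumption, not a benchmark:

```python
def cost_per_million_tokens(rate_per_hr, tokens_per_second):
    """Infrastructure cost per 1M generated tokens at a sustained
    aggregate throughput (assumed, not benchmarked)."""
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_hr / tokens_per_hour * 1_000_000

# A100 at $1.19/hr, assuming ~2,000 tok/s aggregate batched throughput
print(f"${cost_per_million_tokens(1.19, 2000):.3f} per 1M tokens")  # $0.165
```

Doubling effective batch throughput halves this figure, which is exactly the lever vLLM-style batching pulls.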


Pricing Outlook

A100 is 6 years old (released 2020). H100 is 3 years old. H200 and B200 are entering the market. New-generation pressure is pushing A100 pricing lower.

Expect A100 on-demand rates to stay flat or drift 5-10% lower through Q2-Q3 2026 as supply chains stabilize for next-gen GPUs. Spot rates will remain volatile but trend lower.

The real pressure comes from application-specific benchmarks. A100 is strong for general training. For inference, newer GPUs (L40S, H200 with larger memory) offer better cost-per-token. For LLM serving, a team should compare A100 vs alternatives on throughput and latency per dollar before committing.


Multi-GPU and Cluster Pricing Models

Scaling Costs for Training Clusters

Training large models requires multiple GPUs. A typical 7B model training cluster uses 8 A100 GPUs. At these providers, scaling costs are essentially linear: the single-GPU rate carries over to multi-GPU pods, with orchestration and networking included:

8x A100 SXM Cluster Pricing (as of March 2026):

Provider   Per-GPU Rate   8-GPU Monthly (730 hrs)   Notes
RunPod     $1.39/hr       $8,118                    Per-GPU rate applies (linear scaling)
Lambda     $1.48/hr       $8,643                    Per-GPU rate applies (linear scaling)

The per-GPU cost doesn't increase when clustering (unlike some providers that bundle CPU, RAM, networking into multi-GPU rates). This is favorable for teams.

Large Cluster Economics (64+ GPUs)

At 64-GPU scale, teams hit diminishing returns on cloud economics. The monthly rental cost (roughly $65,000 at RunPod's rate) strains most companies' infrastructure budgets, and buying on-prem starts to look viable:

  • Purchase: 64x A100 SXM @ ~$35K each = $2.24M + $200K for chassis/cooling = $2.44M
  • Power: 64x 400W = 25.6 kW = ~$30K/year
  • Maintenance: ~$50K/year
  • 3-year TCO: ~$2.68M

Cloud: 64x $1.39/hr x 730 hrs/month x 36 months = ~$2.33M
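The cluster comparison above can be reproduced with the same inputs (the unit price, infrastructure, power, and maintenance figures are estimates):

```python
HOURS_PER_YEAR = 8760

def cluster_rent_3yr(n, rate_per_hr, hours_per_month=730):
    """3 years of on-demand rental at a linear per-GPU rate."""
    return n * rate_per_hr * hours_per_month * 36

def cluster_buy_3yr(n, unit_price, infra, power_kw, kwh_rate, maint_per_yr):
    """3-year ownership: hardware + chassis/cooling + 24/7 power + maintenance."""
    hardware = n * unit_price + infra
    power = power_kw * HOURS_PER_YEAR * 3 * kwh_rate
    return hardware + power + maint_per_yr * 3

rent = cluster_rent_3yr(64, 1.39)
buy = cluster_buy_3yr(64, 35_000, 200_000, 25.6, 0.12, 50_000)
print(f"rent ~${rent/1e6:.2f}M, buy ~${buy/1e6:.2f}M")  # rent ~$2.34M, buy ~$2.67M
```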

Cluster size and horizon set the inflection point. Under 16 GPUs: rent. Above 32 GPUs, purchase starts to pay off, though on the 3-year numbers above cloud still comes out slightly ahead; ownership wins on longer horizons. 16-32 is the zone where both options are viable, and the decision depends on timeline and capital constraints.


Deployment Scenarios and Cost Estimates

Research and Development

Academic teams and small labs run inference on the cheap. RunPod A100 at $1.19/hr, with spot pricing at $0.50/hr, is the go-to. 20 hours of experimentation per month costs $10-20.

No infrastructure overhead. No capital expenditure. Fire up, run experiment, shut down. Spot's 2-minute eviction window is acceptable for research (checkpoint frequently).

Production Inference

A startup serving 70B-class models on A100 clusters:

  • 8x A100 SXM at RunPod: 8 × $1.39/hr = $11.12/hr ≈ $8,118/month
  • Covers ~1B tokens/month at batch size 32
  • Revenue per million tokens: $1-5 (API pricing varies by model)
  • 1B tokens = $1,000-$5,000 revenue; infrastructure cost is ~$8K
  • Margin is negative at this scale. Only viable if model quality justifies a steep revenue premium

Better approach: Use H100 or H200. Faster inference means higher throughput per month. Or optimize inference (vLLM batching, quantization) to improve cost-per-token.

Batch Processing and ETL

Processing large document collections with embeddings:

  • 100M documents × 512 tokens each = 51B tokens
  • RunPod A100: assuming ~1M tokens/s sustained embedding throughput, 51.2B tokens ≈ 51,000 GPU-seconds ≈ 14 hours of GPU time
  • Cost: 14 hours × $1.19/hr = $16.66
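The ETL estimate generalizes. A sketch assuming ~1M tokens/s sustained embedding throughput (the rate implied by the 51,000-GPU-second total; benchmark it per model before relying on it):

```python
def embedding_job_cost(n_docs, tokens_per_doc, tok_per_sec, rate_per_hr):
    """Time and cost to embed a corpus at an assumed sustained throughput."""
    total_tokens = n_docs * tokens_per_doc
    gpu_seconds = total_tokens / tok_per_sec
    hours = gpu_seconds / 3600
    return hours, hours * rate_per_hr

hours, cost = embedding_job_cost(100_000_000, 512, 1_000_000, 1.19)
print(f"{hours:.1f} GPU-hours, ${cost:.2f}")  # 14.2 GPU-hours, $16.92
```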

Batch processing is A100's ideal use case. High throughput, no latency requirement, checkpoints are cheap, spot interruptions are acceptable.


Regional Pricing and Provider Availability

Geographic Considerations

A100 availability varies by region. US-based providers (RunPod, Lambda) have consistent pricing. EU and APAC providers may have different rates due to local power costs and demand.

RunPod and Lambda are US-based, and their pricing is consistent across the regions they serve. CoreWeave, noted above for committed-use discounts, is expanding globally.

Provider Lock-in

Renting from RunPod for 12 months means no flexibility to migrate if a cheaper provider emerges. Migration costs are real: model transfers, API endpoint changes, potential downtime.

Spot instances mitigate this: teams are not locked in by long-term contracts. If RunPod becomes more expensive, switch to Lambda. Cost to switch: minimal (just re-upload model).


Alternative Approaches to A100 Rental

Shared GPU Services

Some providers offer "shared GPU" tiers where a single A100 is split among multiple users. Fractional GPU access costs $0.15-$0.40/hr (cheaper than full rental) but with contention: if another user spikes, the performance tanks.

Suitable for development and testing, not production.

Spot and Preemptible Pricing

AWS spot P5 instances (H100-based) reach $2.10-$2.50/hr, a 50-70% discount off on-demand. A100 spot availability is less consistent, but when available, prices are similarly discounted.

The catch: 2-minute interruption. Applicable to training with checkpoints, data preprocessing, research. Not suitable for customer-facing inference.

Reserved Instances

Locking in for 1 year at 30% discount: A100 PCIe at $0.83/hr. Annual commitment: $7,260 for continuous 1-year use.

If workload is steady and teams are confident in the model/provider, 1-year terms are the best value. 3-year terms are risky: hardware evolution is rapid.


FAQ

What is the cheapest A100 rental?

RunPod A100 PCIe 80GB at $1.19/GPU-hour as of March 2026, the lowest tracked on DeployBase.

Should teams buy or rent A100?

Rent if utilization is under 50% or the project is under 18 months. Buy if continuous utilization exceeds 60% with a 2+ year horizon.

What is the difference between A100 PCIe and SXM?

PCIe: standard form factor, 1,555-2,000 GB/s bandwidth (40GB/80GB), 250-300W TDP, $1.19-$1.48/hr. SXM: requires special chassis, NVLink support, 400W TDP, 2,000 GB/s bandwidth, $1.39-$1.48/hr. Use PCIe for single-GPU work. Use SXM for distributed multi-GPU training needing NVLink interconnect.

Is A100 still worth renting in 2026?

For inference on well-tuned batched workloads, yes. Throughput is predictable and cost per token is reasonable. For training, consider H100 or H200 if the model is large. A100 is solid but no longer the latest generation.

How much does fine-tuning cost on A100?

LoRA fine-tuning a 7B parameter model takes 12-20 hours on a single A100. At $1.19/hr, that's $14-$24. Full fine-tuning on an 8-GPU cluster: $95-$150 per run. Parameter-efficient methods cut costs significantly.

What can I do to reduce A100 rental costs?

Use spot pricing for fault-tolerant workloads (40-60% savings). Right-size: use cheaper GPUs for lightweight models. Batch inference requests to maximize throughput. Lock 1-year commitments if workload is stable (25-30% savings). Compare providers: RunPod is consistently cheaper than hyperscalers.


