NVIDIA A100 Cloud Pricing: Where to Rent & How Much It Costs

Deploybase · February 13, 2025 · GPU Pricing

A100 Price: Overview

The A100 price for cloud rental ranges from $1.19 to $1.48 per GPU-hour as of March 2026. This is the production entry point for machine learning teams: training small to mid-size models, high-throughput inference, and fine-tuning. The A100 was released in August 2020 (almost 5.5 years ago) but remains the best value in the $1-2/hr range. The newer H100 is 3x faster per GPU but 67% more expensive; the older V100 is cheaper but slower.

For teams with time-flexible workloads, A100 is economical. For teams prioritizing speed, H100 justifies the cost premium.

Cheapest single-GPU option: RunPod A100 PCIe at $1.19/hr ($869/month for continuous use). Most reliable: Lambda at $1.48/hr with dedicated infrastructure and SLA. Enterprise-scale multi-GPU: CoreWeave 8x A100 clusters at $21.60/hr ($15,768/month).


Current A100 Pricing

Provider    Form Factor   VRAM    $/hr     $/month   $/year
RunPod      PCIe          80GB    $1.19    $869      $10,420
RunPod      SXM           80GB    $1.39    $1,014    $12,156
Lambda      PCIe/SXM      40GB    $1.48    $1,080    $12,961
Lambda      Multi (2x)    80GB    $2.96    $2,161    $25,930
CoreWeave   8x Cluster    640GB   $21.60   $15,768   $189,216

Data as of March 21, 2026. Monthly = 730 hours. Annual = 8,760 hours.

RunPod offers the lowest per-GPU hourly rate. Lambda charges the same for PCIe and SXM (unusual; reflects their specific hardware setup). CoreWeave pricing is per cluster (not per GPU), making it expensive for single-GPU work but economical for distributed training.
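The monthly and annual columns are just the hourly rate times 730 or 8,760 hours. A quick sketch; rounding here may differ by a few dollars from the table:

```python
HOURS_PER_MONTH = 730    # the convention used throughout this article
HOURS_PER_YEAR = 8_760   # 24 x 365

def projected_cost(hourly_rate: float, hours: int) -> int:
    """Cost at continuous utilization, rounded to the nearest dollar."""
    return round(hourly_rate * hours)

# RunPod A100 PCIe at $1.19/hr
monthly = projected_cost(1.19, HOURS_PER_MONTH)  # 869
annual = projected_cost(1.19, HOURS_PER_YEAR)
print(monthly, annual)
```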


Provider Breakdown

RunPod: Budget-Focused

Single A100 (PCIe, 80GB): $1.19/hr
Single A100 (SXM, 80GB): $1.39/hr
Duo A100 (2x SXM, 160GB): $2.78/hr ($1.39/GPU)

RunPod is the cheapest single-GPU option. Pricing updates frequently (usually weekly) based on supply and demand. PCIe is cheaper due to commodity availability. SXM is 17% more expensive (proprietary connector, lower inventory).

Pros:

  • Lowest per-hour rate
  • No minimum commitment
  • Instant availability in most regions
  • Simple pricing without hidden fees
  • Spot instances available at 40-50% discount

Cons:

  • Spot pricing can fluctuate unexpectedly
  • SXM pricing spikes during high demand
  • Less stable SLA than Lambda
  • Fewer datacenters than larger providers
  • Limited customer support

Best for: Students, researchers, one-off experiments, cost-conscious teams willing to tolerate interruptions on spot instances.

Realistic pricing: Plan for $1.30-1.50/hr on-demand, $0.70-0.90/hr on spot (with interruption risk).

Lambda: Stability & Reliability

Single A100 (40GB, PCIe/SXM): $1.48/hr
Dual A100 (2x, 80GB): $2.96/hr
8x A100 SXM: $16.48/hr

Lambda charges the same rate for both PCIe and SXM single-GPU (unusual). They consolidate hardware across form factors in their datacenter, resulting in unified pricing.

Pros:

  • Stable pricing (no spot volatility)
  • Dedicated infrastructure (low contention)
  • Strong SLA for uptime (99.9% availability)
  • NVLink-connected multi-GPU clusters
  • Reliable customer support
  • Geographic distribution across US and Europe

Cons:

  • 24% more expensive per GPU than RunPod
  • Occasional minimum usage requirements on production deals
  • Less frequent instant availability during peak seasons

Best for: Production training, teams needing reliability, continuous workloads lasting >1 week, compliance-sensitive teams.

Realistic pricing: Expect $1.48/hr on-demand, no spot discount, but exceptional uptime.

CoreWeave: Production Scale

8x A100 SXM Cluster: $21.60/hr = $2.70/GPU-hr
NVLink: 4.8 TB/s aggregate bandwidth across the cluster (600 GB/s per GPU × 8)

CoreWeave specializes in large multi-GPU clusters with NVLink backbone. Not suitable for single-GPU workloads (developers must rent entire clusters).

Pros:

  • Full NVLink bandwidth (critical for distributed training)
  • No GPU pool contention (dedicated cluster)
  • Production SLA with dedicated support
  • Instant availability for large clusters
  • Optimized for distributed training
  • Pay-by-the-hour or monthly commitment discounts

Cons:

  • Only available in multi-GPU configurations (minimum 2x typically)
  • $2.70/GPU-hr is 2.3x RunPod single-GPU
  • Minimum usage periods sometimes required
  • Overkill for single-GPU or small batch jobs

Best for: Pre-training large models (70B+), multi-GPU distributed training, teams with continuous workloads (24/7 utilization).


Form Factor Comparison

PCIe (PCI Express)

Industry-standard form factor. The A100 connects via PCIe Gen4 (x16 lanes, 32 GB/s bidirectional). Compatible with heterogeneous hardware (mixed GPU types in the same system, standard CPU-GPU communication).

Bandwidth: 32 GB/s (bidirectional) for GPU-to-GPU communication in same system.

Memory: 80GB HBM2e (high-bandwidth memory).

Price: $1.19/hr (RunPod), $1.48/hr (Lambda).

Performance impact: PCIe bandwidth (32 GB/s) is the bottleneck for multi-GPU training; gradient synchronization across PCIe is slow. Fine-tuning and single-GPU inference are not affected.

When to choose: Single-GPU workloads (inference, fine-tuning, experiments), or when cost is priority and multi-GPU communication isn't bottleneck.

SXM

Proprietary form factor with NVLink interconnect. A100 SXM features 600 GB/s NVLink per GPU (vs 32 GB/s for PCIe): 18.75x faster GPU-to-GPU communication.

Bandwidth: 600 GB/s per GPU via NVLink (for intra-cluster communication).

Memory: 80GB HBM2e (same as PCIe).

Price: $1.39/hr (RunPod), $1.48/hr (Lambda).

Performance impact: NVLink dramatically reduces gradient synchronization time. For 8x A100 training, NVLink is essential.

When to choose: Only if running 2+ A100 GPUs in same cluster. Single A100 SXM is wasteful (NVLink benefit is lost).

Cost-Benefit Analysis

Single A100: always choose PCIe. SXM costs 17% more, no benefit for single GPU.

2x A100: both PCIe and SXM work. PCIe: 32 GB/s communication. SXM: 600 GB/s. For training, SXM reduces time-to-convergence by 10-20%. Worth it if training time has value (>1 week of training).

8x A100: SXM is mandatory. PCIe would bottleneck severely (gradient sync is slow). Choose SXM clusters only (Lambda, CoreWeave).
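The PCIe-vs-SXM gap can be sized with a back-of-envelope gradient-sync estimate. This is a rough sketch, assuming fp16 gradients and the ~2x payload movement of a ring all-reduce, not a benchmark:

```python
def allreduce_seconds(param_count: int, bytes_per_param: int,
                      link_gb_per_s: float) -> float:
    """Rough time for one gradient all-reduce: a ring all-reduce moves
    about 2x the gradient payload over each link."""
    grad_bytes = param_count * bytes_per_param
    return 2 * grad_bytes / (link_gb_per_s * 1e9)

# 13B parameters, fp16 gradients (2 bytes each)
pcie = allreduce_seconds(13_000_000_000, 2, 32)    # 1.625 s per sync
nvlink = allreduce_seconds(13_000_000_000, 2, 600) # ~0.087 s per sync
```

At one sync per training step, the PCIe path spends over a second per step on communication alone, which is why SXM is mandatory at 8x scale.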


On-Demand vs Spot

On-Demand (Guaranteed Allocation)

The GPU is reserved for your team. No interruption risk. Higher price.

Pricing: Base rate (e.g., $1.19/hr for RunPod A100 PCIe).

Suitable for: Production training, time-sensitive workloads, compliance-sensitive applications where interruption is unacceptable.

Spot (Interruptible, Lower Cost)

The GPU is available only while unused by higher-priority customers and can be evicted with minutes' notice. 40-50% cheaper.

Pricing: $0.60-$0.85/hr for RunPod A100 (roughly 50-70% of on-demand).

Risk: Interruption mid-training. Mitigation: checkpoint frequently. Resume from latest checkpoint.

Suitable for: Fault-tolerant workloads (batch processing, research experiments with checkpointing), training where wall-clock time is flexible.

Economic trade-off: the ~40% spot discount must cover the compute lost to evictions and restarts. With frequent checkpointing and roughly 1-2 interruptions per day, spot is near break-even; fewer interruptions make spot pure savings, while more than 1-2 per day makes it cost-prohibitive.
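One way to reason about this trade-off is cost per useful hour. A sketch, assuming each eviction wastes about an hour of paid compute (restart plus progress lost since the last checkpoint):

```python
def effective_spot_rate(spot_rate: float,
                        interruptions_per_day: float,
                        lost_hours_per_interruption: float = 1.0) -> float:
    """Effective $/useful-hour on spot: paid hours divided into
    only the hours that produced kept work."""
    paid = 24.0
    useful = paid - interruptions_per_day * lost_hours_per_interruption
    return spot_rate * paid / useful

# $0.70/hr spot, 1.5 evictions/day, ~1h lost each
print(effective_spot_rate(0.70, 1.5))  # ~0.747, still well under $1.19 on-demand
```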


Monthly & Annual Cost Projections

Single A100 PCIe (RunPod)

Hourly: $1.19
Daily (24h): $28.56
Monthly (730h): $869
Annual (8,760h): $10,420

At continuous 24/7 utilization, annual cost is $10,420. Used 8 hours/day (half utilization): $5,210/year. Used 4 hours/day: $2,605/year.

Single A100 SXM (RunPod)

Hourly: $1.39
Monthly: $1,014
Annual: $12,156

17% premium over PCIe due to NVLink connector (wasted on single GPU).

Lambda A100

Hourly: $1.48
Monthly: $1,080
Annual: $12,961

Premium reflects reliability and dedicated infrastructure. No spot discount, stable pricing.

8x A100 SXM (Lambda)

Hourly: $16.48 (for all 8)
Per-GPU equivalent: $2.06/hr
Monthly: $12,030 (all 8)
Annual: $144,365 (all 8)

Cost scales with hours, but the per-GPU rate ($2.06) is above Lambda's single-GPU price ($1.48); the premium buys the NVLink-connected 8x SXM fabric.

8x A100 Cluster (CoreWeave)

Hourly: $21.60 (for all 8)
Per-GPU equivalent: $2.70/hr
Monthly: $15,768 (all 8)
Annual: $189,216 (all 8)

Higher per-GPU rate reflects NVLink integration and dedicated production support. Only option if developers need guaranteed NVLink.


Buy vs Rent Economics

Break-Even Analysis

A100 GPU cost (used market): $6,000-$9,000
A100 GPU cost (new): $10,000-$12,000
RunPod A100 PCIe rental: $1.19/hr

Hours to break even (used hardware): $7,500 / $1.19 = 6,302 hours = 262 days at 24/7 = ~9 months.

Hours to break even (new hardware): $11,000 / $1.19 = 9,244 hours = 386 days at 24/7 = ~13 months.
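The break-even arithmetic in code (hardware prices are the midpoints quoted above):

```python
def break_even_hours(hardware_cost: float, rental_rate: float) -> int:
    """GPU-hours of rental that equal the hardware purchase price."""
    return int(hardware_cost / rental_rate)

used = break_even_hours(7_500, 1.19)   # 6302 hours (~262 days at 24/7)
new = break_even_hours(11_000, 1.19)   # 9243 hours (~385 days at 24/7)
```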

Rent If:

  • Utilization < 50%: Buying a GPU a team uses 10 hrs/month wastes money on idle hardware. Renting is cheaper.
  • Project duration < 6 months: No time to amortize hardware capital cost.
  • No on-premises infrastructure: No power, cooling, or networking setup available.
  • Experimentation: One-off models, research, proof-of-concept. Risk of hardware depreciation is high.
  • Variable demand: Spiky utilization makes on-prem wasteful.

Buy If:

  • Utilization > 70% for 18+ months: Hardware breaks even after ~12 months; beyond that, every additional hour costs only electricity and hosting.
  • Continuous workload (24/7): Production inference fleet, ongoing training pipeline. Cloud costs are continuous; hardware cost is one-time.
  • On-premises power/cooling available: Data center is already running, amortized across other workloads.
  • Team size > 5: Internal hardware governance, capital budget available, maintenance staff on hand.
  • Cost per GPU-hour is predictable: On-premises cost is fixed. Cloud cost fluctuates with demand.
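A minimal rent-vs-buy check, comparing total rental spend against purchase price (ignores electricity and hosting, which add modestly to the self-hosted side; rates and prices are this article's figures):

```python
def cheaper_option(hours_per_month: float, months: int,
                   rental_rate: float = 1.19,
                   hardware_cost: float = 8_000) -> str:
    """Return 'buy' when projected rental spend exceeds the hardware price."""
    rental_total = hours_per_month * months * rental_rate
    return "buy" if rental_total > hardware_cost else "rent"

print(cheaper_option(730, 12))  # "buy"  (24/7 for a year: ~$10.4K of rental)
print(cheaper_option(50, 6))    # "rent" (light use: ~$357 of rental)
```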

Real-World Example: Training a Model

Scenario: Train a 13B parameter model. Estimated time: 100 hours on single A100.

Cloud (RunPod A100 PCIe): 100 hrs × $1.19/hr = $119

Self-hosted A100 (purchased for $8,000):

  • Amortized cost (60,000-hour lifetime, roughly 7 years at 24/7): $8,000 / 60,000 hours = $0.133/hr
  • Electricity (300W, $0.10/kWh): 100 hrs × 0.3 kW × $0.10 = $3
  • Cooling/hosting (amortized): $2
  • Total: $13.33 + $3 + $2 = $18.33 for 100 hrs (but $8,000 upfront)

Conclusion: Cloud is cheaper for one-off project ($119 vs $8,000 upfront). But if training 100 such models/year, self-hosted wins: $8,000 one-time vs $11,900/year cloud.
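The self-hosted line items above, as a sketch (the hosting figure is the article's flat estimate, scaled per 100 hours):

```python
def self_hosted_cost(hours: float,
                     hardware_cost: float = 8_000,
                     lifetime_hours: float = 60_000,  # ~7 years at 24/7
                     watts: float = 300,
                     price_per_kwh: float = 0.10,
                     hosting_per_100h: float = 2.0) -> float:
    """Amortized hardware + electricity + hosting for a run of `hours`."""
    amortized = hardware_cost / lifetime_hours * hours
    electricity = watts / 1000 * hours * price_per_kwh
    hosting = hosting_per_100h * hours / 100
    return amortized + electricity + hosting

print(round(self_hosted_cost(100), 2))  # 18.33
```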


Cost Per Workload

Fine-Tuning a 7B Model (100K Examples, LoRA)

Time: 18-20 hours on A100 PCIe
Cost: 20 hrs × $1.19/hr = $23.80

Repeat fine-tuning 20 times/year: $476/year on cloud.

Equivalent self-hosted cost (amortizing $8,000 over a 60,000-hour lifetime): $8,000 / 60,000 hrs = ~$0.13/hr. 20 hrs × $0.13 = $2.60 (amortized hardware cost only, not electricity).

Conclusion: Cloud is cheaper for occasional fine-tuning. But if the team fine-tunes frequently, self-hosted is more economical.

Inference: Process 1M Documents (512 Tokens Each)

Throughput: A100 ~280 tokens/second (single GPU, batch=32)
Total tokens: 1M docs × 512 = 512M tokens
Time: 512M / 280 ≈ 1,828,000 seconds ≈ 508 hours
Cost: 508 hrs × $1.19/hr = $604

Monthly recurring cost if this job repeats daily (which would require ~21 GPUs running in parallel): $604 × 30 = $18,120/month (too expensive for continuous use).

With H100 ($1.99/hr): ~3x throughput (≈850 tok/s) reduces time to ~168 hours = $334. The hourly rate is higher, but the throughput gain more than compensates, so total cost drops.

With self-hosted A100 + electricity: 508 hrs × $0.13/hr (hardware) + 508 × 0.3 × $0.10 (electricity) = $66 + $15 = $81. But $8,000 upfront.
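The document-processing math generalizes to a small helper (throughput figures are the article's estimates, not benchmarks):

```python
def inference_cost(docs: int, tokens_per_doc: int,
                   tokens_per_sec: float, rate_per_hour: float):
    """Hours and dollars to push a batch workload through one GPU."""
    hours = docs * tokens_per_doc / tokens_per_sec / 3600
    return hours, hours * rate_per_hour

hours, cost = inference_cost(1_000_000, 512, 280, 1.19)            # A100: ~508 h, ~$604
h100_hours, h100_cost = inference_cost(1_000_000, 512, 850, 1.99)  # H100: ~167 h, ~$333
```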

Training a 13B Model from Scratch (1T Tokens, Batch=128)

Setup: 8x A100 SXM cluster (necessary for multi-GPU training)
Throughput: 8 × 450 samples/sec = 3,600 samples/second (assumes ~1,000-token samples)
Time to train 1T tokens: ~278,000 seconds ≈ 77 hours

Lambda 8x A100: 77 hrs × $16.48/hr = $1,269

Remarkably cheap on paper for training a 13B model; under these assumptions, data and researcher time dominate the cost, not cloud GPU. Real-world pre-training throughput is usually far lower, so treat this as a best-case floor.
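The arithmetic above can be reproduced directly; the throughput and sample-length figures are the article's optimistic assumptions, not measured values:

```python
def training_cost(total_tokens: float, tokens_per_sample: float,
                  samples_per_sec: float, cluster_rate: float):
    """Wall-clock hours and total dollars, given a fixed samples/sec rate."""
    hours = total_tokens / (tokens_per_sample * samples_per_sec) / 3600
    return hours, hours * cluster_rate

# 1T tokens, ~1,000-token samples, 3,600 samples/s, Lambda 8x at $16.48/hr
hours, cost = training_cost(1e12, 1_000, 3_600, 16.48)  # ~77 h, ~$1,270
```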


Provider Selection Guide

Choose RunPod If:

  • Budget is primary concern. Lowest per-hour rate.
  • Spot instances acceptable. Can tolerate interruptions for 40-50% discount.
  • Short-duration workloads. Experiments, research, one-off jobs.
  • Single-GPU tasks. Fine-tuning, inference batching, quick prototypes.

Expected cost: $1.19/hr (on-demand PCIe), $0.70/hr (spot).

Choose Lambda If:

  • Reliability is critical. Production training, compliance-sensitive work.
  • Multi-GPU training. 2-8x A100 clusters with guaranteed NVLink.
  • Strong SLA required. 99.9% uptime is necessary.
  • Willing to pay premium for stability. 24% more expensive than RunPod.

Expected cost: $1.48/hr (on-demand), no spot.

Choose CoreWeave If:

  • Large-scale training. 8+ GPUs, distributed training.
  • Production support required. Dedicated account management.
  • Long-term commitment. 3-month+ monthly discounts available.
  • NVLink is mandatory. No compromise on interconnect bandwidth.

Expected cost: $2.70/GPU-hr for 8x clusters, lower on monthly commitment.


Multi-GPU Pricing

2x A100

Lambda: $2.96/hr total ($1.48/GPU)
RunPod: $2.78/hr total ($1.39/GPU for SXM)

Same per-GPU cost as single. NVLink enables 600 GB/s communication. Training 13B models is viable.

4x A100

Lambda: $5.92/hr total ($1.48/GPU, 4x PCIe configs)
RunPod: $5.56/hr total ($1.39/GPU for SXM)

Linear pricing continues. For training 30-50B models.

8x A100

Lambda: $16.48/hr total ($2.06/GPU)
CoreWeave: $21.60/hr total ($2.70/GPU)

CoreWeave's premium reflects dedicated infrastructure and production support. Lambda offers equivalent performance cheaper but less uptime guarantee.


Hidden Costs

Electricity

A100 PCIe 40GB draws 250W; PCIe 80GB draws 300W. SXM draws 400W. 24/7 operation: 300W × 24 hrs = 7.2 kWh/day = 2,628 kWh/year.

At $0.10/kWh: $263/year (negligible compared to cloud costs).

At $0.20/kWh (expensive region): $526/year (still small).

Self-hosted electricity is cheap. Data center cooling adds to it (budget ~30% overhead for HVAC).
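The electricity math as a helper, with the ~30% HVAC overhead as an optional term:

```python
def annual_electricity_cost(watts: float, price_per_kwh: float,
                            cooling_overhead: float = 0.0) -> float:
    """Annual electricity for 24/7 operation; overhead models HVAC load."""
    kwh_per_year = watts / 1000 * 24 * 365   # e.g. 300 W -> 2,628 kWh
    return kwh_per_year * price_per_kwh * (1 + cooling_overhead)

print(round(annual_electricity_cost(300, 0.10)))        # 263
print(round(annual_electricity_cost(300, 0.10, 0.30)))  # 342
```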

Networking & Egress

Data transfer out of cloud (model checkpoints, results) incurs egress charges.

RunPod: $0.03/GB outbound (generous, bundled)
Lambda: $0.04/GB outbound
CoreWeave: $0.02/GB outbound

Training a 13B model generates ~50GB of checkpoints. Egress cost: $1.50 (RunPod) to $1 (CoreWeave). Negligible.

Inference workloads that egress large model weights add up faster: at $0.04/GB, a 100GB+ model costs $4+ per transfer, so repeated transfers may reach $4-5/month.

Storage

Training generates checkpoints. Save locally (to GPU provider) or egress to external storage.

RunPod: 1GB free, $0.10/GB beyond
Lambda: 1GB free, $0.15/GB beyond
CoreWeave: 10GB free, $0.05/GB beyond

Most teams keep checkpoints local.

For training a single model: 50GB of checkpoints costs roughly $2 (CoreWeave) to $7.50 (Lambda) in storage. Negligible.

API Costs

If training pipeline is orchestrated via API (e.g., scheduling training on RunPod): small API fees may apply. Usually negligible unless thousands of API calls.


FAQ

Why is CoreWeave so much more expensive per GPU?

NVLink integration. 8x A100 with NVLink requires careful packaging, low-contention infrastructure, and dedicated support. Manufacturing and operational cost is higher. You're paying for reliability and guaranteed NVLink bandwidth, not just the GPUs.

Should I use Lambda or RunPod?

RunPod for one-off experiments (cost-sensitive). Lambda for production (reliability-sensitive). For most teams, RunPod for development, Lambda for production.

Is A100 still worth renting in 2026?

Yes, if time-to-completion has slack. The A100 is 5.5 years old but still several times faster than consumer GPUs (like the RTX 3090) for ML workloads, with far more memory. For inference, fine-tuning, and mid-size training, the A100 is the sweet spot for price-to-performance.

Can I negotiate A100 pricing?

RunPod/Lambda: no. Pricing is posted and uniform.

CoreWeave: yes. 3-month+ contracts often include 10-20% discount.

Vast.AI (decentralized): prices vary per provider; sometimes lower than RunPod.

What about used A100 GPUs?

Used A100s: $6,000-$9,000 (cheaper than new). Refurbished from reputable sellers is safer (includes warranty). But risk of degraded memory, thermal issues. Suitable only if on-prem infrastructure is solid.

Is A100 NVL different? Should I rent that instead?

A100 NVL: variant with 94GB memory (vs 80GB standard). Throughput ~5% higher due to wider memory bus. Pricing is similar ($1.19-1.48/hr). Few providers offer NVL (Lambda sometimes does). If you need 94GB specifically, NVL is worth requesting. Otherwise, standard A100 is sufficient.

What about Vast.AI for A100 rental?

Vast.AI: decentralized GPU marketplace. Pricing sometimes 10-20% lower than RunPod. But reliability is lower (peer-to-peer, not datacenter). Suitable for fault-tolerant workloads (batch processing, checkpointed training). Not for production or continuous inference.

How do I choose between A100 and H100?

A100: slower ($1.19/hr), cheaper. For teams with time flexibility (training can take 10+ days).

H100: faster ($1.99/hr), more expensive. 3x throughput, pays for itself if time-to-completion has business value (faster iteration, faster market launch).

Typical rule: if model training is on critical path to product launch, H100 ROI is clear. Otherwise, A100 is economical.


