Spot GPU Pricing: Discounts, Reliability Trade-Offs, and Savings Guide

Deploybase · June 2, 2025 · GPU Pricing

Spot GPU Pricing: Overview

Spot GPU Pricing is the focus of this guide. Spot: 30-70% discount. Catch: a 30-second to 5-minute eviction notice, depending on provider. Fault-tolerant workloads win. Continuous availability? Avoid spot.

Math: RunPod H100 PCIe on-demand $1.99/hr → spot $0.72/hr (64%). RunPod H100 SXM on-demand $2.69/hr → spot $0.81/hr (70%). AWS H100 $6.88 → $3.44 (50%). A100s stable (2-5% interrupt rate). H100s volatile (8-12%).


Spot vs On-Demand Pricing

Hourly Rate Comparison (as of June 2025)

GPU        | Provider | On-Demand | Spot  | Discount | Notes
-----------|----------|-----------|-------|----------|---------------------------------------------
A100 PCIe  | RunPod   | $1.19     | $0.42 | 65%      | Stable spot market
A100 SXM   | RunPod   | $1.39     | $0.59 | 58%      | Popular, lower discount
H100 PCIe  | RunPod   | $1.99     | $0.72 | 64%      | Entry-level H100, good value
H100 SXM   | RunPod   | $2.69     | $0.81 | 70%      | High demand, premiums
H200       | RunPod   | $3.59     | $1.65 | 54%      | Newer, less stable
L40S       | RunPod   | $0.79     | $0.34 | 57%      | Inference GPU, stable
RTX 4090   | RunPod   | $0.34     | $0.22 | 35%      | Consumer card, volatile
H100 PCIe  | AWS      | $6.88     | $3.44 | 50%      | Premium pricing
A100 PCIe  | AWS      | $3.06     | $1.53 | 50%      | Standard AWS markup
GH200      | Lambda   | $1.99     | $0.72 | 64%      | Limited availability, no formal spot program

RunPod has deeper spot discounts than AWS. For the same H100 PCIe, RunPod spot ($0.72) is ~4.8x cheaper than AWS spot ($3.44). This is the hidden value in boutique cloud providers.


Understanding Spot Pricing Mechanics

Why Spot Is Discounted

Providers overprovision for spikes. Excess capacity goes to the spot market cheap. Demand surges? Spot capacity shrinks, prices spike. During end-of-semester crunches, H100 spot prices have doubled.

Spot is dynamically priced. Prices fluctuate hourly, sometimes minute-by-minute. A $1 spot GPU might cost $2 the next hour.

Interruption Guarantees

Spot instances can be reclaimed on short notice:

  • AWS EC2 Spot: 2-minute warning
  • GCP Preemptible: 30-second warning
  • RunPod Spot: 5-minute warning (longest notice window)
  • Azure Spot: 30-second warning

Longer warning windows (RunPod) allow graceful shutdown of workloads. Shorter windows (GCP) force immediate termination.
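On AWS, the 2-minute warning surfaces through the instance metadata service at /latest/meta-data/spot/instance-action as a small JSON document. A minimal sketch of a shutdown hook's first step, turning that notice into a remaining-time budget; the endpoint path is AWS's documented one, but the polling loop and checkpoint call are omitted, and the helper name is illustrative:

```python
import json
from datetime import datetime, timezone

# Parse an AWS spot interruption notice and report how many seconds
# remain before termination, so a shutdown hook can decide whether a
# final checkpoint still fits in the window.
def seconds_until_termination(instance_action_json, now=None):
    action = json.loads(instance_action_json)
    deadline = datetime.fromisoformat(action["time"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (deadline - now).total_seconds()

notice = '{"action": "terminate", "time": "2025-06-02T12:02:00Z"}'
t0 = datetime(2025, 6, 2, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_termination(notice, now=t0))  # 120.0 — the 2-minute AWS window
```

With RunPod's longer window the same budget check applies; only the threshold for "is a final checkpoint worth attempting" changes.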


Provider Comparison

RunPod Spot Market

Best for: Batch inference, training with checkpoints, fault-tolerant workloads.

Pricing: typically 50-65% off on-demand, up to 70% on high-demand SKUs (deeper discounts than AWS, more stable than GCP).

Interruption rate: A100 (2-5% hourly), H100 (8-12% hourly), H200 (12-15% hourly).

Notice period: 5 minutes (industry best, allows graceful shutdown).

Availability: Stable for older GPUs (RTX 3090, A100), volatile for newer (H100, H200). During high-demand periods (research conferences, end of quarter), spot prices spike and availability plummets.

Infrastructure Support:

  • API for spot price history (7-day rolling)
  • Price alerts via webhook
  • Auto-switching between spot and on-demand (no code changes)
  • Pre-built checkpoint integration with S3

Cost-Benefit Example: Fine-tuning a 7B Model

On-Demand Scenario:

  • Hardware: 1x RunPod A100 PCIe
  • Training time: 8 hours
  • Cost: 8 × $1.19 = $9.52
  • Guarantee: No interruptions, predictable cost

Spot Scenario (Conservative Estimate):

  • Base cost: 8 hours × $0.42 = $3.36
  • Expected interruptions: 0.05 × 8 = 0.4 expected; assume 2 as a conservative worst case
  • Resume overhead: 2 interruptions × 4 hours overhead = 8 hours
  • Total compute: 8 + 8 = 16 hours
  • Total cost: 16 × $0.42 = $6.72

Comparison:

  • On-demand: $9.52 (safe)
  • Spot: $6.72 (29% cheaper even under conservative interruption assumptions)

Scaling Impact: At 10 fine-tuning jobs/month:

  • On-demand: $95.20/month
  • Spot: $67.20/month
  • Monthly savings: $28/month

At 100 fine-tuning jobs/month:

  • On-demand: $952/month
  • Spot: $672/month
  • Monthly savings: $280/month

At 1,000 jobs/month (large team):

  • On-demand: $9,520/month
  • Spot: $6,720/month
  • Monthly savings: $2,800/month or $33,600/year
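The arithmetic above reduces to a one-line expected-cost model. A sketch using the example's own conservative assumptions (2 interruptions, 4 hours of overhead each):

```python
# Spot cost = (base hours + interruptions x resume overhead) x spot rate.
# Interruption count and per-interruption overhead mirror the example.
def spot_cost(hours, spot_rate, interruptions, resume_hours):
    return (hours + interruptions * resume_hours) * spot_rate

on_demand = 8 * 1.19               # $9.52 for the guaranteed run
spot = spot_cost(8, 0.42, 2, 4.0)  # (8 + 8) x $0.42 = $6.72
print(round(on_demand - spot, 2))  # 2.8 — per-job savings
```

Multiply the per-job delta by monthly job count to reproduce the scaling table.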

AWS EC2 Spot

Best for: Large-scale training, parallel jobs, fault-tolerant infrastructure with custom failure handling.

Pricing: roughly 50% off on-demand (market-priced, so the realized discount drifts with regional demand).

Interruption rate: Variable by instance type, region, and time of day. p4d instances (A100): 5-10% hourly. p5 instances (H100): 10-15% hourly. Rates are published in AWS console and can be monitored in real-time.

Notice period: 2 minutes (shorter than RunPod's 5 minutes, though longer than GCP's and Azure's 30 seconds; still forces a hurried shutdown).

Availability: Highly available in some regions (us-east-1, eu-west-1), scarce in others (us-west-2, ap-southeast-1). Multi-region failover is necessary for production.

Spot Price Volatility: AWS spot prices fluctuate based on regional demand. During research conferences (NeurIPS, ICML), H100 spot prices can spike 2-3x. Planning ahead (book during off-season) is critical.

Cost-Benefit Example: Training a 70B Model

On-Demand Cluster:

  • Hardware: 8x AWS p5 (H100) instances
  • Cost per instance: $6.88/hour
  • Cluster cost: 8 × $6.88 = $55.04/hour
  • Training duration: 10 days (240 hours)
  • Total cost: 240 × $55.04 = $13,209.60

Spot Cluster (with interruption handling):

  • Spot price: 50% discount = $3.44/hour per instance
  • Base cluster cost: 8 × 3.44 = $27.52/hour
  • Expected interruptions: 10% hourly × 240 hours = 24 interruptions
  • Resume overhead: 24 interruptions × 2 hours (checkpoint loading + data reloading) = 48 hours
  • Total compute: 240 + 48 = 288 hours
  • Total cost: 288 × $27.52 = $7,925.76

Comparison:

  • On-demand: $13,209.60 (guaranteed completion)
  • Spot: $7,925.76 (40% savings, but 48 hours of extra overhead)

Hidden Costs (AWS Spot):

  • Engineering time to implement spot restart logic: 40 hours = $4,000
  • Monitoring and alerting infrastructure: $500/month
  • Extra data transfer costs during frequent interruptions: ~$100
  • Total operational overhead: $4,600

True Cost-Benefit:

  • Spot with overhead: $7,926 + $4,600 = $12,526 (5% cheaper than on-demand)
  • But adds engineering complexity and operational risk
  • Break-even is at 2-3 training runs (amortize engineering cost)

AWS spot is only cost-effective at thousands of GPU-hours per month WITH dedicated engineering resources.
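The break-even claim can be checked directly against the example's numbers; a quick amortization sketch (all figures come from the 70B example above, nothing here is AWS-specific):

```python
# Per-run spot savings versus one-time engineering cost plus monthly
# operational overhead, from the 70B training example.
on_demand_run = 240 * 55.04   # $13,209.60 per training run
spot_run      = 288 * 27.52   # $7,925.76 incl. restart hours
one_time_eng  = 4_000 + 100   # restart logic + extra data transfer
monthly_ops   = 500           # monitoring and alerting, per month

def net_savings(runs, months=1):
    per_run = on_demand_run - spot_run   # $5,283.84 saved per run
    return runs * per_run - one_time_eng - monthly_ops * months

print(round(net_savings(1), 2))     # first run: roughly break-even
print(round(net_savings(3, 3), 2))  # savings dominate by run 2-3
```

The first run barely clears the overhead; by the second or third run the amortized engineering cost is negligible, which is the break-even the section describes.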

GCP Preemptible Instances

Best for: Batch jobs, data processing, embarrassingly parallel tasks, jobs with no state.

Pricing: 70% discount (deepest flat discount in the industry, 5-20 points deeper than RunPod's typical 50-65% range).

Interruption rate: 10-20% hourly (highest in industry, 2x higher than AWS spot).

Notice period: 30 seconds (immediate termination, forces hard stops).

Availability: Spotty by region. Some regions have no preemptible inventory. Availability varies wildly by time of day.

Strengths:

  • Deepest discounts (70% vs 50-65% elsewhere)
  • Committed discounts available (30% off on-demand for 1-year commitment)
  • Integration with Google Cloud AI Platform (automatic experiment retries)

Weaknesses:

  • 30-second warning is too aggressive for checkpoint-based workloads
  • Availability in many regions is poor
  • Interruption rate is volatile (10-20% is a wide band to plan around)

Cost-Benefit Example: Data Processing Pipeline

Task: Process 1TB of unstructured data (extract features, tokenize, normalize).

On-Demand Setup:

  • Hardware: 4x GCP A100 80GB (on-demand)
  • Cost per instance-hour: $5.07 (GCP pricing)
  • Cluster cost: 4 × $5.07 = $20.28/hour
  • Processing time: 50 hours (data is split into 4 parallel jobs)
  • Total cost: 50 × $20.28 = $1,014

Preemptible Setup:

  • Spot price: 70% discount = $1.52/hour per instance
  • Base cost: 4 × 1.52 = $6.08/hour
  • Expected interruptions: 15% hourly × 50 hours = 7.5 interruptions (assume 8)
  • Resume overhead: 8 interruptions × 1 hour (restart job, reload data) = 8 hours
  • Total compute: 50 + 8 = 58 hours
  • Total cost: 58 × $6.08 = $352.64

Comparison:

  • On-demand: $1,014 (guaranteed)
  • Preemptible: $352.64 (65% savings)
  • Net savings: $661.36

Suitability: GCP preemptible wins decisively for stateless batch jobs. The 30-second warning is only a problem if the job requires graceful shutdown. For "kill and restart" workloads, GCP is unbeatable.
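The "kill and restart" pattern reduces to a resubmission loop over idempotent shards. A sketch where `run_shard` is a hypothetical stand-in for launching a preemptible worker (substitute your real launcher):

```python
import random

# Each shard is idempotent, so a preemption is handled by simply
# resubmitting the shard until it completes.
def process_all(shards, run_shard, max_attempts=20):
    results = {}
    for shard in shards:
        for _ in range(max_attempts):
            ok, output = run_shard(shard)
            if ok:                       # shard finished before preemption
                results[shard] = output
                break
        else:
            raise RuntimeError(f"shard {shard} never completed")
    return results

# Simulated launcher: ~15% chance of preemption per attempt.
rng = random.Random(0)
launcher = lambda s: (rng.random() > 0.15, f"features-{s}")
done = process_all(range(4), launcher)
```

No graceful-shutdown logic is needed, which is exactly why the 30-second warning doesn't matter for this class of job.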

Vast.AI (Spot-Only Provider)

Best for: Short batch jobs, one-off training, price-sensitive experimentation.

Pricing: Varies by seller. Typical: 40-60% below on-demand.

Interruption rate: Seller-dependent. 1-15% hourly.

Notice period: 5-10 minutes.

Availability: Highly dynamic, pricing changes hourly.

Trade-off: More GPUs available (sourced from individuals + providers), but less reliable than production spot markets. GPUs can be withdrawn without warning if the owner recalls them.


Reliability and Interruption Rates

GPU Model Reliability Tiers

Tier 1 (Stable, <5% hourly interruption):

  • RTX 3090
  • A100 PCIe and SXM
  • L40 and L40S

These are popular but older. Providers have deep inventory, spot capacity is plentiful.

Tier 2 (Moderate, 5-10% hourly):

  • H100 PCIe
  • H100 SXM
  • RTX 4090

These are current-generation. Demand is high, spot capacity is tight. Interruption spikes during high-demand periods (research quarters, month-end).

Tier 3 (Volatile, >10% hourly):

  • H200
  • B200
  • GH200

These are bleeding-edge. Capacity is scarce, and spot inventory sells out frequently. New models often have less than a month of supply available on spot markets.

Estimating Real Interruption Cost

Interruption cost is not just spot price × extra hours. It's cost + lost progress.

Example: Training a model with 1-hour checkpoints.

On-demand 10-hour job: 10 hours, 10 checkpoints, $11.90 (at $1.19/hr A100).

Spot with 5% hourly interruption: Expected interruptions = 10 × 0.05 = 0.5 (roughly a coin flip on a single interruption).

Worst case: Interruption at hour 9. Resume from the hour-8 checkpoint, losing one hour of progress plus roughly two hours of restart overhead: 3 extra hours.

Cost: 13 hours × $0.42 = $5.46.

Savings: $6.44 (54% cheaper), traded for schedule risk (up to 3 hours of resume time).

Expected value favors spot. But variance is high: an unlucky run can land several interruptions back-to-back and erode most of the savings.
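The variance argument is easy to make concrete with a small Monte Carlo sketch. Assumptions: hourly checkpoints, a 5% per-hour interruption probability, and a flat 2-hour resume overhead per interruption (an illustrative figure, not a measured one):

```python
import random

# Billed spot hours for a 10-hour job with hourly checkpoints.
def simulate_job(hours=10, p=0.05, resume_overhead=2.0, rng=None):
    rng = rng or random.Random()
    billed, done = 0.0, 0
    while done < hours:
        billed += 1.0                  # pay for the hour attempted
        if rng.random() < p:           # interrupted: this hour's progress lost
            billed += resume_overhead  # restart, reload data, warm up
        else:
            done += 1                  # checkpoint reached, progress kept
    return billed

rng = random.Random(42)
costs = [simulate_job(rng=rng) * 0.42 for _ in range(10_000)]
mean_cost = sum(costs) / len(costs)
print(round(mean_cost, 2), round(max(costs), 2))
```

The mean lands well below the $11.90 on-demand price, but the tail of the distribution shows the occasional expensive run the paragraph warns about.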


Workload Matching

Ideal for Spot (High Savings)

Batch inference: Process a dataset overnight. Interruption is acceptable; re-run failed batches in the morning.

Savings: 50-70% of 10,000 H100 SXM hours/month at $2.69/hr ≈ $13-19k/month on RunPod.

Training with checkpoints: Pre-training a model with hourly saves to S3. Resume from latest checkpoint on interruption.

Savings: 50-65% of 2,000 A100 hours/month at $1.19/hr ≈ $1.2-1.5k/month on RunPod.

Data processing: Extract features, transform data, ETL pipelines. Re-run failed jobs (idempotent).

Savings: 60-70% of 5,000 L40S hours/month at $0.79/hr ≈ $2.4-2.8k/month on RunPod.

Parallel experiments: Run 1,000 hyperparameter searches in parallel. Individual jobs fail, that's fine; enough succeed for statistical significance.

Savings: ~55% of 3,000 A100 hours/month at $1.19/hr ≈ $2k/month on RunPod.

Unsuitable for Spot (Use On-Demand)

Real-time serving: Chat API serving users. Interruptions cause downtime, lose customers.

Interactive workloads: Jupyter notebooks, development, debugging. Manual re-runs are costly.

Time-sensitive inference: API SLA requires <100ms latency, 99.9% uptime. Spot can't guarantee.

Long-running jobs without checkpoints: Pre-training without saves. Interruption at hour 100 means restart from hour 0. Unacceptable.


Cost Savings Analysis

Scenario 1: Small Research Lab (500 GPU-hours/month)

Budget for 500 hours of A100 fine-tuning.

On-demand: 500 × $1.19 = $595/month

Spot (55% discount, 1-2 interruptions expected):

  • Base: 500 × $0.42 = $210
  • Interruptions: 2 interruptions × 10 hours/resume = 20 extra hours = $8.40
  • Total: $218.40

Savings: $376.60 (63% total reduction)

Effort: Set up checkpointing, automated resumption (1-2 hours one-time engineering).

Scenario 2: Startup Production Inference (100k GPU-hours/month)

Budget for continuous H100 inference serving.

On-demand: 100,000 × $2.69 = $269,000/month

Spot with redundancy (using a conservative realized spot rate of $1.29/hr, above the $0.81 headline rate, to account for demand spikes):

  • Primary: 90,000 hours spot H100 at $1.29/hr = $116,100
  • Fallback (on-demand coverage during interruptions): 10,000 hours × $2.69 = $26,900
  • Total: $143,000

Savings: $126,000 (47% reduction)

Trade-off: Requires multi-region failover, real-time load balancing, 2-3 months engineering.

Scenario 3: Production Training (2,000 GPU-hours/month)

Budget for continuous model training.

On-demand: 2,000 × $2.69 (H100 SXM) = $5,380/month

Spot (52% discount, 3-5 interruptions expected):

  • Base: 2,000 × $1.29 = $2,580
  • Extra compute (10% buffer): 200 hours = $258
  • Total: $2,838

Savings: $2,542 (47% reduction)

Effort: Checkpoint every 30 minutes, automated resumption.


Best Practices

Practice 1: Hybrid Spot + On-Demand

Don't go all-in on spot. Mix 70% spot + 30% on-demand.

  • Spot GPUs for batch and training (fault-tolerant)
  • On-demand for serving and interactive workloads

Average discount: ~39% on the blended bill (0.70 × 55% + 0.30 × 0%) vs 55% all-spot, but far more reliable.

Practice 2: Checkpoint Every N Minutes

Training without checkpoints on spot is a waste. Checkpoints should be frequent enough that no single interruption loses more than 1% of progress.

Rule: Checkpoint interval = 1% of expected job duration.

10-hour job: checkpoint every 6 minutes. 100-hour job: checkpoint every 60 minutes.
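A minimal checkpoint/resume skeleton for this rule. Assumptions: a local pickle file stands in for durable storage such as S3, and the training step is a placeholder; the file name and state layout are illustrative:

```python
import os, pickle, tempfile

# Fresh working directory so each run starts clean.
workdir = tempfile.mkdtemp()
CKPT = os.path.join(workdir, "train_state.pkl")

def load_state():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)        # resume after an eviction
    return {"step": 0, "loss_sum": 0.0}  # first run: start from scratch

def save_state(state):
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)                # atomic swap: never a torn file

def train(total_steps=100, ckpt_every=10):
    state = load_state()
    for step in range(state["step"], total_steps):
        state["loss_sum"] += 1.0 / (step + 1)   # stand-in for a real step
        state["step"] = step + 1
        if state["step"] % ckpt_every == 0:
            save_state(state)                   # the 1%-of-duration rule
    save_state(state)
    return state

final = train()
```

On a real spot instance, the eviction warning simply kills this loop; the next launch picks up from the latest checkpoint with at most one interval of lost work.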

Practice 3: Multi-Region Spot Fallback

High-demand regions (us-east-1, us-west-1) have high interruption rates. Use lower-demand regions as fallback.

Deploy priority:

  1. us-west-2 spot (cheaper, lower interruption pressure than us-east-1)
  2. eu-west-1 spot (fallback)
  3. us-east-1 on-demand (final fallback)

This maximizes savings while minimizing interruption impact.
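The fallback ladder is just a priority walk over (region, kind) pairs. A sketch where `provision` is a hypothetical stand-in for a real launch call:

```python
# Walk the deploy priority list and take the first pool with capacity.
PRIORITY = [
    ("us-west-2", "spot"),       # cheapest first choice
    ("eu-west-1", "spot"),       # fallback spot capacity
    ("us-east-1", "on-demand"),  # final fallback: guaranteed capacity
]

def launch_with_fallback(provision):
    for region, kind in PRIORITY:
        instance = provision(region, kind)
        if instance is not None:         # capacity available here
            return region, kind, instance
    raise RuntimeError("no capacity anywhere in the priority list")

# Simulated provisioner: pretend both spot pools are exhausted.
fake = lambda region, kind: "i-123" if kind == "on-demand" else None
print(launch_with_fallback(fake))  # ('us-east-1', 'on-demand', 'i-123')
```

The same loop doubles as the interruption handler: on eviction, re-enter at the top of the list rather than retrying the evicted pool.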

Practice 4: Price Monitoring

Spot prices fluctuate hourly. Automated bidding strategies can help:

  • Monitor spot price history over past 7 days
  • Bid 10% below the 7-day average
  • Auto-trigger on-demand fallback if spot goes above average + 50%

RunPod's API supports price alerts; use them.
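The three rules above fit in a few lines; the thresholds are the ones just stated, and are tunable assumptions rather than recommended constants:

```python
from statistics import mean

# Bid 10% below the 7-day average; fall back to on-demand when the
# current spot price exceeds the average by 50%.
def spot_decision(price_history_7d, current_price):
    avg = mean(price_history_7d)
    if current_price > avg * 1.5:
        return ("on-demand", None)       # spike: stop chasing spot
    return ("spot", round(avg * 0.9, 4))  # bid 10% below the average

history = [0.72, 0.75, 0.70, 0.74, 0.71, 0.73, 0.76]
print(spot_decision(history, 0.74))
print(spot_decision(history, 1.20))  # ('on-demand', None)
```

Feed it whatever price feed your provider exposes; the decision logic is provider-agnostic.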

Practice 5: Consolidate Smaller Jobs

Running 100 one-hour jobs is riskier than running 10 ten-hour jobs on spot. Each job has independent interruption probability. Consolidation reduces variance.

At a 5% hourly rate, 100 × 1-hour jobs and 10 × 10-hour jobs both expect ~5 interruptions across the same 100 GPU-hours, but the consolidated jobs amortize startup cost and, with checkpoints, lose less work per interruption.

Batch smaller jobs when possible.


FAQ

Is spot worth the engineering overhead?

Yes, if monthly GPU spend >$5,000. The break-even is 2-3 months of engineering time (which costs $20-40k in salaries). At $10k/month savings, ROI is 2-4 months.

Below $5k/month, on-demand is simpler.

Can I use spot for inference at scale?

Only with multi-region failover and load balancing. A single spot H100 serving 100 requests/sec will cause cascading failures if interrupted. With 3 spot H100s + 1 on-demand fallback, you can absorb 1-2 interruptions without user impact.

Cost: 3 × $1.29 + 1 × $2.69 = $6.56/hr (vs 4x on-demand at $10.76/hr). Still 39% cheaper.

What's the worst-case interruption scenario?

RunPod H100: 12% hourly interruption × 730 hours/month = 87 expected interruptions. But they're not evenly distributed. You might have 2 weeks with no interruptions, then 3 in one hour. Variance is high.

Plan for 3-5x the expected rate during peak-demand periods.

Should I combine spot instances into a single cluster?

No. A single 8-GPU cluster is all-or-nothing. If one GPU is interrupted, the entire cluster stops (synchronization barrier). Better to run 8 independent single-GPU spot jobs + combine results.

For distributed training, use spot with fault-tolerance built in (Horovod, DeepSpeed).

How do I handle multi-hour jobs on volatile spot markets?

Increase checkpoint frequency to 10-15 minutes. Expected loss from interruption: 15 minutes of compute (negligible). The trade-off is storage: more checkpoints = more S3 API calls ($0.0004/checkpoint, negligible).

At 1,000 checkpoints/month per job, storage costs ~$0.40.

What's the spot price in my region?

RunPod publishes live pricing on runpod.io/pricing. AWS spot pricing is available on ec2instances.info (third-party tracker). GCP preemptible rates are fixed per-region.

Prices change hourly. Check before starting long-running jobs.


