RunPod GPU Pricing: 2026 Comprehensive Pricing Guide

Deploybase · March 10, 2026 · GPU Pricing

RunPod GPU Pricing: Overview

RunPod GPU pricing as of March 2026 ranges from $0.22/hr (RTX 3090 spot) to $5.98/hr (B200 spot). RunPod's strength is its spot pricing tier (preemptible instances) that undercuts fixed-price competitors like Lambda and CoreWeave. The fundamental tradeoff: spot instances can be interrupted without warning (typically <5 minute notice). Teams tolerating interruptions save 30-60%. Teams requiring continuous uptime pay premium rates for on-demand or reserved instances.

RunPod operates the largest distributed GPU network (as of March 2026): hardware from individual miners, small data centers, and cloud providers. This diversity keeps prices low but introduces variance in performance and reliability.


RunPod Pricing By GPU Model

| GPU | VRAM | Spot/hr | On-Demand/hr | Monthly (Spot, 730 hrs) | Annual (Spot) |
|---|---|---|---|---|---|
| RTX 3090 | 24GB | $0.22 | $0.39 | $160 | $1,920 |
| RTX 4090 | 24GB | $0.34 | $0.60 | $248 | $2,976 |
| L4 | 24GB | $0.44 | $0.78 | $321 | $3,856 |
| L40 | 48GB | $0.69 | $1.22 | $503 | $6,036 |
| RTX 5090 | 32GB | $0.69 | $1.22 | $503 | $6,036 |
| L40S | 48GB | $0.79 | $1.40 | $577 | $6,924 |
| RTX PRO 6000 | 96GB | $0.89 | $1.57 | $649 | $7,788 |
| A100 PCIe | 80GB | $1.19 | $2.10 | $869 | $10,425 |
| A100 SXM | 80GB | $1.39 | $2.45 | $1,014 | $12,162 |
| H100 PCIe | 80GB | $1.99 | $3.51 | $1,453 | $17,436 |
| H100 SXM | 80GB | $2.69 | $4.76 | $1,964 | $23,568 |
| H200 | 141GB | $3.59 | $6.35 | $2,621 | $31,452 |
| B200 | 192GB | $5.98 | $10.57 | $4,366 | $52,392 |

Data as of March 2026. On-demand rates are 40-75% premium over spot. Spot pricing fluctuates ±15% depending on supply and demand (shown prices are typical March 2026 rates).
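The monthly and annual figures in the table follow directly from the hourly rate; a quick sketch (the 730-hour month and 12x annualization are the table's own conventions, and results can differ from the table by a dollar or two due to rounding):

```python
def monthly_cost(hourly: float, hours: float = 730) -> float:
    """Approximate monthly cost at a given hourly rate (730 hrs/month)."""
    return hourly * hours

def annual_cost(hourly: float) -> float:
    """Annual cost, computed as 12x the 730-hour monthly figure."""
    return monthly_cost(hourly) * 12

# A few spot rates from the table above (USD/hr, March 2026)
spot = {"RTX 3090": 0.22, "A100 PCIe": 1.19, "H100 PCIe": 1.99, "B200": 5.98}

for gpu, rate in spot.items():
    print(f"{gpu}: ${monthly_cost(rate):,.0f}/mo, ${annual_cost(rate):,.0f}/yr")
```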


Consumer GPUs ($0.22-$0.79/hr Spot)

RTX 3090 and RTX 4090

RTX 3090 ($0.22/hr spot, 24GB GDDR6X) and RTX 4090 ($0.34/hr spot, 24GB GDDR6X) are consumer-grade high-end GPUs. VRAM capacity sufficient for single-model inference (7B-13B parameter models). Training is possible but slow (2.5-3x slower than A100 due to GDDR6X bandwidth bottleneck vs HBM2e).

Ideal for: hobby projects, prototyping, inference-only workloads, students running LLM experiments, budget-conscious teams. Monthly cost at spot rates: $160-$248 (RTX 3090 is extremely cheap). Annual: $1,920-$2,976. Breakeven against a home RTX 4090 purchase ($1,800 retail, at $248/month spot): ~7 months of continuous use.

Limitations: GDDR6X bandwidth (1,008 GB/s RTX 4090) is roughly 2x slower than HBM2e (1,935 GB/s A100). Training large models is impractical. Thermal management is tight on RTX 4090 under sustained load (250-320W typical, 450W peak). Spot interruption risk is high during demand spikes (news events, model releases).

Best for: researchers, hobbyists, prototyping. Avoid for production workloads.

L4 and L40 Series

L4 ($0.44/hr spot, 24GB GDDR6) is NVIDIA's newer inference-focused GPU (2023). Energy-efficient for LLM serving. L40 ($0.69/hr spot, 48GB GDDR6) and L40S ($0.79/hr spot, 48GB GDDR6) add more memory and improved throughput.

GDDR6 memory (vs consumer GDDR6X) prioritizes power efficiency. Bandwidth: 300 GB/s (L4) to 864 GB/s (L40, L40S). Below consumer flagships like the RTX 4090 (1,008 GB/s) and well below data center HBM standards.

Best for: batch inference, image processing pipelines, video transcoding, lightweight fine-tuning, cost-sensitive inference deployment. Monthly cost at spot: $321-577. Scale to 2-4x instances for high-throughput workloads. Annual: $3,856-$6,924.

RTX 5090 and RTX PRO 6000

RTX 5090 ($0.69/hr spot, 32GB GDDR7) is NVIDIA's latest consumer flagship (2025). More memory and faster bandwidth than RTX 4090. RTX PRO 6000 ($0.89/hr spot, 96GB GDDR7) is NVIDIA's Blackwell-generation professional visualization GPU (2025) with massive VRAM; the 96GB capacity is compelling for inference at scale.

RTX 5090 targets creators and AI enthusiasts. RTX PRO 6000 is rarely used for AI; when it is, it's for serving 70B models at single-batch inference with aggressive quantization. Monthly cost: $503-$649 spot.


Data Center Standard GPUs ($1.19-$1.99/hr Spot)

A100 PCIe and SXM ($1.19-$1.39/hr Spot)

A100 PCIe ($1.19/hr spot) and A100 SXM ($1.39/hr spot) are the practical baseline for production AI workloads. HBM2e memory: 80GB. Bandwidth: 1,935 GB/s. Proven architecture (2020). No surprises. Mature ecosystem.

PCIe variant fits heterogeneous server builds (mixed CPU/GPU setups). Single-GPU inference and fine-tuning. SXM variant supports NVLink (600 GB/s per GPU) for multi-GPU training with lower communication overhead. Cost difference: $0.20/hr spot (17% premium for SXM).

A100 is 2-3x slower than H100 but costs 40% less per hour (spot pricing). For inference, fine-tuning, and training models under 30B parameters, A100 spot is the cost-effective default.

Monthly spot cost: $869-$1,014. Annual: $10,425-$12,162. Breakeven logic: A100 ($1.19/hr) vs H100 ($1.99/hr) is a $0.80/hr difference, but an H100 running ~2.8x faster finishes each task in roughly 1/2.8 the time. Cost-per-task therefore often favors H100 despite the higher hourly rate, depending on how well the workload exploits its throughput.

Training vs Inference on A100

Training 13B model:

  • A100: 15 hours, $17.85
  • H100: 5 hours, $9.95
  • H100 wins (~44% cheaper)

Serving inference continuously:

  • A100 throughput: 280 tok/s (~1.0M tok/hr) at $1.19/hr = ~$1.18 per 1M tokens
  • H100 throughput: 850 tok/s (~3.1M tok/hr) at $1.99/hr = ~$0.65 per 1M tokens
  • H100 wins (~45% cheaper per token)
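Both comparisons reduce to one ratio check; a minimal sketch using the illustrative hours, rates, and throughputs quoted in this section:

```python
def pct_savings(cheaper: float, baseline: float) -> float:
    """Percent saved by the cheaper option relative to the baseline."""
    return (1 - cheaper / baseline) * 100

# Training a 13B model (hours and spot rates from the text)
a100_train = 15 * 1.19   # $17.85
h100_train = 5 * 1.99    # $9.95

# Cost per token is (hourly rate) / (tokens generated per hour)
a100_per_tok = 1.19 / (280 * 3600)
h100_per_tok = 1.99 / (850 * 3600)

print(f"training: H100 {pct_savings(h100_train, a100_train):.0f}% cheaper")
print(f"inference: H100 {pct_savings(h100_per_tok, a100_per_tok):.0f}% cheaper per token")
```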

High-Performance Training GPUs ($1.99-$5.98/hr Spot)

H100 PCIe and SXM ($1.99-$2.69/hr Spot)

H100 PCIe ($1.99/hr spot) and H100 SXM ($2.69/hr spot) are the modern training and inference standard. 80GB HBM3 memory. Bandwidth: up to 3,350 GB/s on SXM (1.7x A100). Tensor cores tuned for FP8 (via the Transformer Engine, accelerating transformer workloads) and TF32.

PCIe variant ($1.99/hr) is 26% cheaper than SXM. Single-GPU inference and fine-tuning. SXM variant ($2.69/hr) adds NVLink (900 GB/s per GPU) for distributed training. Multi-GPU clusters (8x H100 SXM) scale efficiently without interconnect bottlenecks.

H100 is 3x faster than A100 on most workloads. Cost-per-task often favors H100 despite higher hourly rate. Training or inference workloads benefit from H100's speed advantage.

Fine-tuning a 7B model (100K examples):

  • A100: 20 hours × $1.19 = $23.80
  • H100: 7 hours × $1.99 = $13.93
  • H100 saves 41% in absolute cost

Monthly spot cost: $1,453-1,964. Annual: $17,436-$23,568.

H200 ($3.59/hr Spot)

H200 launched late 2025. 141GB HBM3e memory (75% more than H100). Bandwidth: 4.8 TB/s (43% faster). $3.59/hr spot (1.8x H100 PCIe cost).

Purpose: models requiring >80GB VRAM and dense batch inference. A 70B model in FP16 needs ~140GB of weights and fits on a single H200; Llama 405B even quantized to 4-bit needs ~200GB and still requires a multi-GPU split. Most teams don't need H200 yet. Cost-per-throughput still favors H100 for models fitting in 80GB.

Monthly spot: $2,621. Annual: $31,452.
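A rough way to decide whether a model needs H200-class memory is to estimate weight memory as parameters × bytes per parameter. This is a back-of-the-envelope rule only: KV cache and activations add meaningful overhead on top of the weights.

```python
def weight_memory_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB for a model with `params_b` billion
    parameters stored at `bits` bits per parameter (weights only; KV cache
    and activations are extra)."""
    return params_b * 1e9 * bits / 8 / 1e9

print(weight_memory_gb(70, 16))   # 70B in FP16 -> 140.0 GB (fits one H200)
print(weight_memory_gb(70, 4))    # 70B in 4-bit -> 35.0 GB
print(weight_memory_gb(405, 4))   # 405B in 4-bit -> 202.5 GB (multi-GPU)
```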

B200 ($5.98/hr Spot)

B200 is NVIDIA's newest data center GPU (late 2025). 192GB memory. 20.4 PFLOPS of sparsity-adjusted low-precision compute. $5.98/hr spot, 3x H100 PCIe cost.

Intended for training frontier 200B+ parameter models and dense inference at massive scales. Few teams have deployed B200 workloads at scale yet. Pricing reflects scarcity and high demand. Early adopter premium.

Monthly spot: $4,366. Annual: $52,392.


Multi-GPU Configurations

RunPod supports multi-GPU instances with NVLink interconnect for SXM variants:

| Configuration | Count | Total VRAM | Spot/hr | Per-GPU Spot/hr |
|---|---|---|---|---|
| A100 SXM 2x | 2x | 160GB | $2.78 | $1.39 |
| A100 SXM 4x | 4x | 320GB | $5.56 | $1.39 |
| A100 SXM 8x | 8x | 640GB | $11.12 | $1.39 |
| H100 SXM 2x | 2x | 160GB | $5.38 | $2.69 |
| H100 SXM 4x | 4x | 320GB | $10.76 | $2.69 |
| H100 SXM 8x | 8x | 640GB | $21.52 | $2.69 |
| B200 8x | 8x | 1,536GB | $47.84 | $5.98 |

NVLink interconnect overhead adds little to pricing (< $0.05/GPU/hr). Multi-GPU pricing scales linearly: no bulk discounts beyond the per-unit rate. Request N GPUs and pay N × base rate.


Spot vs On-Demand Pricing

Spot instances (Preemptible):

  • Lower pricing (40-60% discount vs on-demand)
  • Subject to interruption
  • Typical notice: 5 minutes
  • Automatic restart: RunPod restarts job, not user action
  • Best for: training, batch jobs, fault-tolerant workloads

On-Demand instances (Guaranteed):

  • Higher pricing (40-75% premium)
  • No interruptions
  • Ideal for: production inference, continuous serving, low-latency requirements

Example: A100 PCIe pricing

  • Spot: $1.19/hr → $869/month → $10,425/year
  • On-Demand: $2.10/hr → $1,533/month → $18,396/year
  • Difference: $664/month or $7,971/year for guaranteed uptime

For training: use spot. Checkpointing every 30 minutes handles interruptions. Savings compound over weeks of training. For inference: use on-demand. Interruptions break user sessions, unacceptable for production.
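The checkpoint-every-30-minutes pattern can be sketched with the standard library. The training loop and state dict here are stand-ins for your framework's equivalents (e.g., a PyTorch `state_dict`); the atomic rename guards against an interruption landing mid-write.

```python
import os
import pickle
import time

CKPT = "checkpoint.pkl"
CKPT_INTERVAL = 30 * 60  # seconds between checkpoint saves

def save_checkpoint(state: dict, path: str = CKPT) -> None:
    # Write to a temp file, then rename, so a spot interruption
    # mid-write never corrupts the previous checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str = CKPT) -> dict:
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0}

def train(total_steps: int) -> dict:
    state = load_checkpoint()
    last_save = time.monotonic()
    while state["step"] < total_steps:
        state["step"] += 1  # placeholder for a real training step
        if time.monotonic() - last_save >= CKPT_INTERVAL:
            save_checkpoint(state)
            last_save = time.monotonic()
    save_checkpoint(state)  # final save so a restart is a no-op
    return state
```

If a spot instance is reclaimed, restarting the same script resumes from the last saved step instead of hour zero.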


Cost Per Task Analysis

Single Fine-Tuning (7B Model, LoRA, 100K Examples)

Training time: 20 hours on A100, 6 hours on H100.

A100 PCIe (Spot):

  • 20 hrs × $1.19 = $23.80

H100 PCIe (Spot):

  • 6 hrs × $1.99 = $11.94

Savings: H100 is 50% cheaper per task due to 3x speed advantage exceeding hourly cost premium.

On-Demand comparison:

  • A100: 20 hrs × $2.10 = $42
  • H100: 6 hrs × $3.51 = $21.06
  • H100 still 50% cheaper

Continuous Inference (24/7, 5M Tokens/Day)

Serving a 70B model. Throughput requirement: 5M tokens/day.

A100 setup: 3x A100 PCIe (spot)

  • Monthly: 3 × $869 = $2,607
  • Utilization: 70% (practical, off-peak downtime)
  • Throughput: 3 × 280 tok/s = 840 tok/s
  • Time to serve 5M tokens: 5,952 seconds = 1.65 hours
  • Actual run time per day: 2.35 hours (accounting for batch processing windows)

H100 setup: 1x H100 PCIe (spot)

  • Monthly: $1,453
  • Throughput: 850 tok/s
  • Time to serve 5M tokens: 5,882 seconds = 1.63 hours
  • Actual run time per day: 2.33 hours

H100 is 44% cheaper monthly while handling throughput with fewer GPUs.
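The sizing above reduces to a peak-throughput calculation; a sketch using the illustrative 840 tok/s target and per-GPU throughputs from this section (monthly figures land within a dollar or two of the text due to rounding):

```python
import math

def gpus_for_peak(peak_tok_per_s: float, tok_per_s_per_gpu: float) -> int:
    """GPUs needed to sustain a target peak decode throughput."""
    return math.ceil(peak_tok_per_s / tok_per_s_per_gpu)

def cluster_monthly_cost(n_gpus: int, hourly: float) -> float:
    """Monthly spot cost for n_gpus at a given hourly rate (730 hrs/month)."""
    return n_gpus * hourly * 730

peak = 840  # tok/s target, matching the A100 setup above
a100_count = gpus_for_peak(peak, 280)   # -> 3 GPUs
h100_count = gpus_for_peak(peak, 850)   # -> 1 GPU

print(f"A100: {a100_count}x, ${cluster_monthly_cost(a100_count, 1.19):,.0f}/mo")
print(f"H100: {h100_count}x, ${cluster_monthly_cost(h100_count, 1.99):,.0f}/mo")
```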

Development and Experimentation

Testing a new fine-tuning approach. 50 experimental runs, 2 hours each = 100 GPU-hours.

Using A100 PCIe (Spot):

  • 100 hrs × $1.19 = $119
  • Slow iteration cycle (2 hours per experiment)

Using H100 PCIe (Spot):

  • ~33 hrs (3x faster) × $1.99 = ~$66
  • Fast iteration cycle (40 minutes per experiment)

H100 is ~45% cheaper AND enables faster experimentation. Development velocity matters for team productivity.


Pod Provisioning and Setup

RunPod pods (GPU instances) are provisioned within 2-5 minutes. Select GPU type, region, and Docker image. RunPod manages the hardware selection (balancing demand, availability).

POD provisioning workflow:

  1. Select GPU type from dashboard
  2. Choose on-demand or spot tier
  3. Select Docker image (standard offerings or custom)
  4. Click "Rent"
  5. Wait 2-5 minutes for provisioning
  6. SSH access + JupyterLab available
  7. SSH into instance (e.g., ssh runpod@<pod-ip>)

Setup time is faster than Lambda (5-10 min on Lambda vs 2-5 min on RunPod due to larger hardware pool). Spot instances have longer wait times during demand spikes (can be 30+ minutes if insufficient spot capacity).


Distributed Network of Providers

RunPod's strength is its distributed network. GPUs come from multiple sources: individual miners, small hosting companies, large data centers. This diversity keeps prices low but introduces performance variance.

Hardware variability: RTX 4090 from miner A (home office, 300W power budget) behaves differently than RTX 4090 from data center provider B (redundant power, cooling). Network latency to miner A (residential ISP) may be higher than data center B (production backbone).

For single-GPU inference/training, variance is acceptable. For distributed multi-GPU clusters, hardware consistency matters (NVLink synchronization assumes homogeneous GPUs). RunPod's network is best suited for single-GPU workloads. Multi-GPU training uses cloud provider's own data centers (homogeneous hardware).


Serverless Inference (RunPod Serverless)

Beyond per-pod rental, RunPod offers serverless inference endpoints. Upload model, configure endpoint, pay per-request. No hourly charges, only per-token/per-request cost.

RunPod Serverless pricing: $0.01-$0.30 per million tokens depending on model size and GPU tier. Lower overhead than renting pod 24/7 for low-traffic inference (<1M tokens/day).

For chatbots, APIs with variable load, serverless is cost-optimized. For continuous batch processing, per-pod rental is cheaper.
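The crossover between serverless and a dedicated pod is where the pod's 24-hour cost equals the per-token bill. A sketch with assumed figures (A100 PCIe spot at $1.19/hr, and $0.20 per 1M tokens picked from the quoted serverless range):

```python
def breakeven_tokens_per_day(pod_hourly: float,
                             serverless_per_million: float) -> float:
    """Daily token volume above which a 24/7 pod becomes cheaper than
    serverless billed per million tokens."""
    pod_daily_cost = pod_hourly * 24
    return pod_daily_cost / serverless_per_million * 1_000_000

# A100 PCIe spot vs $0.20 per 1M serverless tokens (assumed rates)
print(breakeven_tokens_per_day(1.19, 0.20))  # ~143M tokens/day
```

At these assumed rates the pod only wins at very high volumes; in practice the decision also weighs serverless cold-start latency against the pod's always-on cost.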


FAQ

What is the difference between Spot and On-Demand?

Spot is preemptible (can be interrupted without notice). On-Demand is guaranteed (no interruptions). Spot is 40-60% cheaper. On-Demand is 40-75% more expensive. Use Spot for training (checkpoints handle interruptions). Use On-Demand for serving (user-facing services).

How often do instances get interrupted?

Depends on supply and demand. During high-demand periods (new model releases, AI conference season, news events), spot instances interrupt more frequently (hours between restarts or multiple interruptions per day). During low-demand periods, interruptions are rare (days or weeks between interruptions). RunPod publishes interrupt rates per GPU model on its dashboard.

Can I reserve spot capacity for a discount?

RunPod offers reserved instances (24/7 guaranteed, discounted). Contact sales for pricing. Standard spot rates are shown here. Reserved capacity typically costs 20-30% less than on-demand.

Is RunPod cheaper than Lambda?

Yes. Spot tiers are 30-50% cheaper than Lambda's fixed pricing (A100: RunPod $1.19 spot vs Lambda $1.48 on-demand). On-Demand tiers are comparable or slightly cheaper than Lambda. Choose RunPod for cost optimization. Choose Lambda for simplicity (no preemption, instant provisioning).

How do I set up a multi-GPU cluster?

RunPod provides pod templates for distributed training frameworks (PyTorch DDP, DeepSpeed, Hugging Face Transformers). Select the cluster configuration (2x, 4x, 8x H100 SXM) and launch. NVLink setup is automatic.

Can I use RunPod for production inference?

Yes, but use On-Demand instances (guaranteed uptime). Spot instances are unsuitable for user-facing services (interruptions will break user sessions). On-Demand pricing is competitive with Lambda ($2.10 vs Lambda $1.48 for A100). Risk: RunPod's distributed hardware network may introduce latency variance compared to Lambda's centralized data centers.

What about sustained-use discounts?

No automatic sustained-use discounts (unlike AWS or GCP). Discounts require volume commitments or reserved capacity contracts (contact sales). Standard spot/on-demand pricing applies otherwise.

Can I access Docker containers?

Yes. RunPod supports custom Docker images. Pull from Docker Hub or upload your own Dockerfile. Standard containerized workflow. Pre-built images for PyTorch, TensorFlow available.

How do I troubleshoot a stuck training job?

RunPod provides SSH + JupyterLab access. Monitor GPU utilization with nvidia-smi. Logs accessible in pod dashboard. Standard debugging workflow. Kill pod and restart from checkpoint if hung.

Does RunPod offer a free tier?

RunPod offers $25 free credits to new accounts (as of March 2026). Covers ~70 hours on RTX 4090 spot, or ~12 hours on H100 PCIe spot. Good for testing and prototyping. No monthly recurring free tier after initial credits.

What is RunPod's uptime SLA?

RunPod does not publish SLA for spot instances (by definition, preemptible). On-Demand instances have best-effort uptime (no formal guarantee). Typically 99.5%+ uptime on On-Demand, but not contractually guaranteed.

Can I use RunPod for multi-region training?

RunPod offers multiple regions (US East, US West, EU). No native multi-region cluster support (would introduce multi-region latency). Single-region is required for NVLink training. Stitch regions together with custom orchestration (out-of-scope for RunPod platform).
