Contents
- Lambda Cloud GPU Pricing Overview
- Pricing Comparison Table
- Single-GPU Pricing Breakdown
- Multi-GPU Cluster Pricing
- Reserved Instances and Discounts
- Storage and Networking Costs
- Cost Estimation by Workload
- Comparison with Competitors
- Cost Optimization Strategies
- FAQ
- Related Resources
- Sources
Lambda Cloud GPU Pricing Overview
Lambda Cloud GPU pricing ranges from $0.58 to $6.08 per GPU-hour on-demand as of March 2026. The spread depends on GPU model, cluster size, and VRAM. The Quadro RTX 6000 sits at the low end; the B200 SXM tops the range. Most teams running inference land between $1.48 and $2.86 per hour.
Lambda positions itself in the mid-market tier. Not the cheapest boutique provider. Not an expensive hyperscaler. Consistent uptime, documented SLAs, and production-grade API access matter more than squeezing out absolute lowest hourly rates. Pricing reflects that positioning.
The full GPU inventory is tracked on the DeployBase GPU comparison.
Pricing Comparison Table
| GPU Model | VRAM | Price/GPU-hr | Monthly (730 hrs) | Use Case |
|---|---|---|---|---|
| Quadro RTX 6000 | 24GB | $0.58 | $423.40 | Legacy professional graphics |
| NVIDIA A10 | 24GB | $0.86 | $627.80 | Small inference, LoRA fine-tuning |
| RTX A6000 | 48GB | $0.92 | $671.60 | Rendering, visualization, fine-tuning |
| A100 PCIe | 40GB | $1.48 | $1,080.40 | Mid-tier training, inference |
| A100 SXM | 40GB | $1.48 | $1,080.40 | Distributed training (multi-GPU) |
| GH200 | 96GB | $1.99 | $1,452.70 | Large model inference, long context |
| H100 PCIe | 80GB | $2.86 | $2,087.80 | High-throughput inference, single GPU |
| H100 SXM | 80GB | $3.78 | $2,759.40 | Large model training, multi-GPU clusters |
| B200 SXM | 192GB | $6.08 | $4,438.40 | Frontier model training, maximum memory |
All prices from Lambda's official pricing page, observed March 21, 2026.
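The monthly column is just hourly rate × 730 hours. A minimal sketch reproducing it from the table's rates:

```python
# Sketch: reproduce the "Monthly (730 hrs)" column from on-demand hourly rates.
# Rates are from the table above (Lambda pricing observed March 2026).
HOURS_PER_MONTH = 730

RATES = {  # $/GPU-hour
    "Quadro RTX 6000": 0.58,
    "A10": 0.86,
    "RTX A6000": 0.92,
    "A100": 1.48,
    "GH200": 1.99,
    "H100 PCIe": 2.86,
    "H100 SXM": 3.78,
    "B200 SXM": 6.08,
}

def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """On-demand monthly cost at continuous use."""
    return round(hourly_rate * hours, 2)

for gpu, rate in RATES.items():
    print(f"{gpu}: ${monthly_cost(rate):,.2f}/month")
```

Swap in your own expected utilization (e.g. `hours=8 * 30`) to estimate part-time usage.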
Single-GPU Pricing Breakdown
Budget Tier ($0.58-$0.92/hr)
The Quadro RTX 6000 starts at $0.58/hr. Legacy GPU with 24GB GDDR6 memory. Designed for professional visualization and CAD workloads, not deep learning. Rarely the right choice for AI compute, but it's the cheapest entry point on Lambda. Infrastructure is outdated, which explains the low cost.
More practical: the A10 at $0.86/hr. 24GB GDDR6, respectable for small fine-tuning jobs and inference at single-digit batch sizes. Training from scratch at this tier is slow (140 peak TFLOPS vs 660 for H100). Inference with smaller models (7B parameters or less) works fine. Throughput is ~20-30 tokens/sec for a 7B model.
RTX A6000 at $0.92/hr adds 48GB for $0.06 more per hour. Same memory technology as A10 (GDDR6 not HBM), but double the VRAM. Opens space for slightly larger models or higher batch sizes without model parallelism. Training on this tier is still slow for anything above 7B parameters.
Mid-Tier ($1.48-$1.99/hr)
The A100 PCIe and A100 SXM both sit at $1.48/hr on Lambda. 40GB HBM2 memory (1.6 TB/s bandwidth). Same hourly cost, different form factors.
PCIe variant: multi-GPU communication runs over PCIe Gen4 (~64 GB/s per GPU), which becomes the bottleneck in distributed training. Single-GPU inference is unaffected. Good for single-GPU deployments and small clusters where that limit is tolerable.
SXM variant: supports NVLink 3.0 (600 GB/s per GPU) for high-speed GPU-to-GPU communication, and the form factor is designed for rack deployment. If building a multi-GPU cluster, SXM is the right choice. For a single A100, Lambda prices the two identically, so PCIe loses nothing on cost.
The GH200 at $1.99/hr jumps to 96GB of HBM3 memory. That 2.4x increase in VRAM for 34% more cost per hour is a practical trade-off. Models in the 13B-34B range fit at FP16 without quantization; 70B fits with 8-bit. Good for inference serving on large models and long contexts; less suited to distributed training, where H100 SXM clusters with NVLink are the better fit.
High Performance ($2.86-$3.78/hr)
H100 PCIe at $2.86/hr. 80GB HBM2e, 2.0 TB/s bandwidth, 350W TDP. Most common H100 variant on cloud providers because it fits standard server builds. Inference throughput is excellent (~100-150 tokens/sec for a 70B model). Multi-GPU training hits memory bandwidth limits when scaling beyond 2-4 GPUs. Single-GPU training is viable for models up to 70B parameters with some quantization.
H100 SXM at $3.78/hr is the distributed training workhorse. Same 80GB VRAM, but 3.35 TB/s memory bandwidth and NVLink 4.0 interconnect (900 GB/s per GPU). Eight-GPU clusters reach 7.2 TB/s of aggregate NVLink bandwidth. Essential for training 70B+ models or large model parallelism. The SXM premium ($0.92/hr more than PCIe) reflects the NVLink interconnect advantage for multi-GPU deployments.
Real-world scenario: training a 70B parameter LLM on 8x H100 SXM takes 7-10 days depending on data size. Training the same model on 8x H100 PCIe would bottleneck on memory bandwidth and take 25-40% longer.
Frontier ($6.08/hr)
B200 SXM at $6.08/hr. 192GB HBM3e memory. NVIDIA's newest generation (late 2024). A single GPU holds roughly 90B parameters at FP16, or ~140B at 8-bit, without model parallelism. Working at this scale previously required multi-GPU parallelism and its associated complexity. Lambda offers it on the highest-spec SXM chassis. Limited availability. High cost. But if a team needs single-GPU work on very large models, it's the option.
Monthly cost at continuous use (730 hours): $4,438.40. Only justifiable for specialized workloads with specific hardware requirements. Typical customer is a research org or large company training custom models.
Multi-GPU Cluster Pricing
Lambda publishes multi-GPU pricing for A100 and H100; B200 cluster pricing is not yet listed. Each configuration includes the required networking and NVLink infrastructure.
A100 Clusters
| Configuration | VRAM Total | Price/hr | Price/GPU |
|---|---|---|---|
| 1x A100 PCIe | 40GB | $1.48 | $1.48 |
| 2x A100 PCIe | 80GB | $2.96 | $1.48 |
| 4x A100 PCIe | 160GB | $5.92 | $1.48 |
| 1x A100 SXM | 40GB | $1.48 | $1.48 |
| 8x A100 SXM | 320GB | $11.84 | $1.48 |
| 8x A100 SXM (80GB) | 640GB | $16.48 | $2.06 |
The 8x A100 SXM cluster at 320GB costs $11.84/hr ($1.48 per GPU). That's exactly the per-GPU rate of a single 1x. Most providers charge 10-15% more for clustered GPUs due to networking, chassis, and NVLink overhead. Lambda's pricing suggests the bulk rate absorbs overhead, making the 8-GPU cluster cost-neutral on a per-GPU basis compared to spinning up eight 1x instances separately.
The 8x A100 SXM at 640GB uses the 80GB A100 variant (8 × 80GB) and costs $16.48/hr ($2.06 per GPU). More expensive per GPU because of the doubled VRAM.
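The per-GPU rates above follow directly from dividing the cluster price by GPU count; a quick sketch:

```python
# Sketch: per-GPU rate implied by Lambda's listed cluster prices.
def per_gpu_rate(cluster_price_hr: float, num_gpus: int) -> float:
    return round(cluster_price_hr / num_gpus, 2)

print(per_gpu_rate(11.84, 8))  # 8x A100 SXM 320GB: 1.48, same as a single GPU
print(per_gpu_rate(16.48, 8))  # 8x A100 SXM 640GB: 2.06, premium for 80GB VRAM
```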
H100 Clusters
| Configuration | VRAM Total | Price/hr | Price/GPU |
|---|---|---|---|
| 1x H100 SXM | 80GB | $3.78 | $3.78 |
| 8x H100 SXM | 640GB | $27.52 | $3.44 |
Eight H100 SXMs cost $27.52/hr total, or $3.44 per GPU. Lambda prices H100 SXM clusters at a slight discount vs the single-GPU rate of $3.78/hr.
As with the A100 clusters, verify actual billing before deploying; networking overhead could still show up in practice.
B200 Clusters
Lambda lists B200 SXM single GPU at $6.08/hr. No multi-GPU cluster pricing published yet as of March 2026. New product, limited availability. Expect cluster pricing to arrive after inventory stabilizes.
Reserved Instances and Discounts
Lambda's public pricing page does not clearly state reserved or commitment-based pricing as of March 2026.
Most cloud providers discount 20-50% on 1-year or 3-year commitments. Lambda likely offers something similar, but the exact terms are not documented in publicly available materials. Teams considering long-term workloads (training scheduled for next 6 months) should contact Lambda's sales team directly for reserved pricing options.
Estimated Monthly Costs with Hypothetical 30% Discount
On-demand monthly costs (730 hours continuous):
- A100 SXM single: $1.48/hr = $1,080/month on-demand → ~$756/month reserved
- H100 SXM single: $3.78/hr = $2,759/month on-demand → ~$1,931/month reserved
- H100 SXM 8x cluster: $27.52/hr = $20,090/month on-demand → ~$14,063/month reserved
- B200 SXM: $6.08/hr = $4,438/month on-demand → ~$3,107/month reserved
These are estimates assuming a 30% discount. Contact Lambda for actual reserved pricing.
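The reserved estimates above reduce to a one-liner; the 30% discount is a hypothetical assumption, not a Lambda quote:

```python
# Sketch: hypothetical reserved pricing at an assumed 30% discount.
# Lambda has not published reserved rates; this is an estimate only.
def reserved_monthly(hourly: float, discount: float = 0.30,
                     hours: int = 730) -> float:
    return round(hourly * hours * (1 - discount), 2)

print(reserved_monthly(1.48))   # A100 SXM: ~$756/month
print(reserved_monthly(27.52))  # 8x H100 SXM cluster: ~$14,063/month
```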
Storage and Networking Costs
Storage
Lambda includes persistent storage in the hourly rate (typically 5-10GB base). Additional storage is metered:
- SSD: approximately $0.10-$0.15 per GB-month
- Archive: approximately $0.02 per GB-month
Example: 500GB SSD for dataset caching costs ~$50-75/month on top of GPU hourly rates.
Networking
Egress (data leaving Lambda): approximately standard AWS egress rates, $0.02-$0.10 per GB depending on destination. Inbound is typically free.
Example: downloading 100GB of training data = ~$2-10 in egress costs. Uploading 50GB of results = ~$1-5 in egress costs. Negligible for most workloads, material for large distributed training jobs.
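A small sketch of these add-on costs, using the approximate per-GB rate ranges above (assumptions, not confirmed Lambda list prices):

```python
# Sketch: add-on storage and egress estimates. Per-GB rates are the
# approximate ranges quoted in the text, not confirmed Lambda list prices.
def addon_cost(gb: float, rate_per_gb: float) -> float:
    return round(gb * rate_per_gb, 2)

# 500GB SSD dataset cache at $0.10-$0.15/GB-month
print(addon_cost(500, 0.10), "-", addon_cost(500, 0.15))  # 50.0 - 75.0
# 100GB egress at $0.02-$0.10/GB
print(addon_cost(100, 0.02), "-", addon_cost(100, 0.10))  # 2.0 - 10.0
```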
Cost Estimation by Workload
Fine-Tuning a 7B Parameter Model (LoRA, 1x A100, 20 hours)
A100 PCIe at $1.48/hr × 20 hours = $29.60
Full fine-tuning (not LoRA) would require more compute and memory, pushing to H100 territory and roughly $57 to $76 for the same job.
Storage for 10GB dataset: ~$1/month, negligible for one-time project.
Total: ~$30-35
Inference Serving (1M tokens/day, ~5.6 hours GPU time/day)
A 7B model at 50 tokens/sec throughput. 1M tokens = 20,000 seconds = 5.56 hours/day.
Use GH200 at $1.99/hr for large context (96GB) or H100 PCIe at $2.86/hr for high throughput.
GH200: $1.99/hr × 5.56 hrs/day × 30 days ≈ $332/month
H100 PCIe: $2.86/hr × 5.56 hrs/day × 30 days ≈ $477/month
Both are cost-effective for production inference. GH200 wins on per-token cost if context lengths exceed 100K tokens per request.
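The serving math generalizes: GPU-hours per day = tokens per day ÷ throughput ÷ 3,600 sec/hr. A sketch, assuming the 50 tokens/sec figure above:

```python
# Sketch: monthly serving cost from daily token volume and throughput.
# 50 tokens/sec is the assumed 7B-class throughput from the text.
def serving_cost(tokens_per_day: float, tokens_per_sec: float,
                 rate_hr: float, days: int = 30) -> float:
    gpu_hours_per_day = tokens_per_day / tokens_per_sec / 3600
    return round(gpu_hours_per_day * rate_hr * days, 2)

print(serving_cost(1_000_000, 50, 1.99))  # GH200: ~$332/month
print(serving_cost(1_000_000, 50, 2.86))  # H100 PCIe: ~$477/month
```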
Training a 13B Parameter Model from Scratch (8x H100 SXM, 7 days)
Lambda 8x H100 SXM cluster: $27.52/hr × 168 hours = $4,623.36
Typical training time for a 13B model on 8x H100 with a modest token budget (tens of billions of tokens) is 5-9 days depending on data size and optimization. 7 days fits within the ballpark.
Storage for dataset: 100GB at $0.10-0.15/GB-month = $10-15/month (pro-rated for 1 week: ~$2-3)
Total: ~$4,626
Large Model Inference Serving (70B parameters, batch processing, 4 hours/day)
70B parameter model in 8-bit quantization requires ~70GB VRAM. H100 SXM has 80GB, fits exactly.
H100 SXM: $3.78/hr × 4 hrs/day × 30 days = $453.60/month
Throughput: ~40 tokens/sec per GPU on H100 for a 70B model at batch size 1-2. At batch 1, 1M tokens/day needs ~7 hours of GPU time; with modest batching it fits the 4-hour daily window on a single H100.
Training a 70B Parameter Model from Scratch (8x H100 SXM, 9 days)
Lambda 8x H100 SXM cluster: $27.52/hr × 216 hours (9 days) = $5,944.32
Storage for training data (500GB): ~$50-75/month, pro-rated ~$2-3/week
Total: ~$5,947
This is a realistic compute bill for a 9-day run with a limited token budget; fully training a 70B model to convergence consumes far more GPU-hours. A research lab or company doing custom model development expects at least this scale of expense.
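Both training estimates reduce to cluster rate × wall-clock hours; a sketch:

```python
# Sketch: training run cost = cluster hourly rate x wall-clock hours.
def training_cost(cluster_rate_hr: float, days: float) -> float:
    return round(cluster_rate_hr * days * 24, 2)

print(training_cost(27.52, 7))  # 13B run, 8x H100 SXM, 7 days: $4,623.36
print(training_cost(27.52, 9))  # 70B run, 8x H100 SXM, 9 days: $5,944.32
```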
Comparison with Competitors
Lambda vs RunPod vs Vast.AI (best single-GPU rates as of March 2026):
| Provider | A100 | H100 PCIe | H100 SXM | Notes |
|---|---|---|---|---|
| Lambda | $1.48 | $2.86 | $3.78 | Consistent pricing, strong support |
| RunPod | $1.19 | $1.99 | $2.69 | Cheaper across most tiers |
| Vast.AI | [Variable] | [Variable] | [Variable] | Marketplace model, spot rates volatile |
RunPod undercuts Lambda across most tiers. For example:
- A100: Lambda $1.48 vs RunPod $1.19 = $0.29/hr difference = $212/month savings
- H100 PCIe: Lambda $2.86 vs RunPod $1.99 = $0.87/hr difference = $635/month savings
- H100 SXM: Lambda $3.78 vs RunPod $2.69 = $1.09/hr difference = $796/month savings
At 8x H100 SXM scale:
- Lambda 8x: $27.52/hr
- RunPod 8x: $21.52/hr (8 × $2.69)
RunPod is meaningfully cheaper at the 8x cluster tier for H100 SXM. Lambda's advantages are managed infrastructure, guaranteed uptime, and NVLink cluster orchestration.
Lambda's edge: Consistency, support responsiveness, and UI/UX for job management. Worth paying the premium for production workloads where uptime SLAs and API reliability matter more than absolute lowest hourly rate.
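The monthly savings figures above are just the hourly difference × 730 hours; a sketch using the table's rates:

```python
# Sketch: monthly savings from an hourly rate difference, at 730 hrs/month.
def monthly_savings(rate_a: float, rate_b: float, hours: int = 730) -> float:
    return round(abs(rate_a - rate_b) * hours, 2)

print(monthly_savings(1.48, 1.19))  # A100, Lambda vs RunPod: ~$212/month
print(monthly_savings(3.78, 2.69))  # H100 SXM, Lambda vs RunPod: ~$796/month
```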
Cost Optimization Strategies
1. Choose the Right Form Factor
Single vs multi-GPU decision impacts total cost:
Single-GPU workload (7B model inference):
- Use H100 PCIe at $2.86/hr
- H100 SXM ($3.78/hr) runs fine as a single GPU, but its NVLink advantage goes unused
Running a single-GPU workload on SXM costs $0.92/hr more for no benefit. For single-GPU workloads, H100 PCIe is the more cost-effective choice on Lambda. SXM's value is in multi-GPU NVLink bandwidth.
Multi-GPU workload (training large model):
- Use H100 SXM for NVLink
- H100 SXM 8x costs $27.52/hr = $3.44 per GPU (slight discount vs single $3.78/hr)
- Running 8x PCIe separately is $2.86 × 8 = $22.88/hr
- SXM cluster costs 20% more than 8x PCIe but provides substantially better bandwidth for distributed training
2. Right-Size the VRAM
Don't pay for more memory than needed:
- 7B model: A100 (40GB) is overkill, use A10 (24GB) at $0.86/hr
- 13B model: A100 (40GB) is sufficient, no need for GH200 (96GB)
- 70B model: H100 (80GB) fits with 8-bit quantization, no need for B200 (192GB)
- 140B model: B200 (192GB) fits with 8-bit quantization; FP16 (~280GB) requires multiple GPUs
Oversizing by 1 tier costs $1-3/hr. Over a month, that's $730-2,190 unnecessarily.
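A rough VRAM estimator makes the tier choices concrete: parameter count × bytes per parameter, with an assumed ~20% overhead for activations and KV cache (the overhead factor is an assumption, not a Lambda or NVIDIA figure):

```python
# Rough VRAM rule of thumb: parameter count x bytes per parameter, plus
# an assumed ~20% overhead for activations and KV cache (an estimate).
def model_vram_gb(params_billion: float, bits: int = 16,
                  overhead: float = 1.2) -> float:
    return round(params_billion * bits / 8 * overhead, 1)

print(model_vram_gb(7))       # ~16.8 GB: fits A10 (24GB)
print(model_vram_gb(13))      # ~31.2 GB: fits A100 (40GB)
print(model_vram_gb(70, 8))   # ~84.0 GB: tight on 80GB with this overhead
print(model_vram_gb(140, 8))  # ~168.0 GB: fits B200 (192GB)
```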
3. Batching and Throughput Optimization
Batch multiple inference requests together. Single request latency matters less than aggregate throughput for cost optimization.
- Batch size 1: ~50 tokens/sec per A100
- Batch size 8: ~350 tokens/sec per A100 (7x improvement)
Cost per million tokens at batch 8: $1.48/hr ÷ (350 tokens/sec × 3,600 sec/hr) × 10⁶ ≈ $1.17/M tokens
Same workload at batch size 1: $1.48/hr ÷ (50 × 3,600) × 10⁶ ≈ $8.22/M tokens
Batching reduces per-token cost by 7x. Infrastructure complexity is the trade-off.
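Cost per million tokens is the hourly rate divided by tokens generated per hour; a sketch with the A100 throughput figures above:

```python
# Cost per million tokens = hourly rate / tokens generated per hour, x 1e6.
# Throughput figures are the A100 estimates from the text.
def cost_per_m_tokens(rate_hr: float, tokens_per_sec: float) -> float:
    return round(rate_hr / (tokens_per_sec * 3600) * 1_000_000, 2)

print(cost_per_m_tokens(1.48, 50))   # batch 1: ~$8.22/M tokens
print(cost_per_m_tokens(1.48, 350))  # batch 8: ~$1.17/M tokens (7x cheaper)
```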
4. Off-Peak Reserved Instances
If available (pending verification), commit to off-peak hours. Many cloud providers have lower rates during night hours. Lambda may offer similar discounts.
Example: if Lambda offered a 50% reserved discount, an 8x H100 SXM cluster at $27.52/hr would drop to $13.76/hr. Over a month of continuous training (730 hours), savings would exceed $10,000.
5. Quantization and Inference Optimization
8-bit quantization halves memory relative to FP16: a 70B model drops from ~140GB to ~70GB, fitting on a single 80GB H100 instead of two. Trade-off: ~10-20% slower throughput but a much lower cost per inference.
Cost per million tokens:
- FP16 70B on 2x H100 PCIe ($5.72/hr) at 40 tokens/sec ≈ $39.72/M
- 8-bit 70B on 1x H100 PCIe ($2.86/hr) at 35 tokens/sec ≈ $22.70/M
- Savings: ~43% cheaper
FAQ
What is the cheapest GPU on Lambda Cloud? Quadro RTX 6000 at $0.58/hr. It's legacy hardware not suitable for AI. A10 at $0.86/hr is the cheapest practical option for modern AI workloads.
Which Lambda GPU is best for inference? GH200 at $1.99/hr for long-context models (96GB memory supports 70B+ parameters). H100 PCIe at $2.86/hr for throughput and general-purpose inference. Both are single-GPU, so total cost is per-GPU rate × hours. Inference rarely needs multi-GPU unless models exceed 120GB unquantized.
How much does it cost to run a small LLM inference service on Lambda? Roughly $120 to $175/month for 1-2 million tokens/day on a single GPU, depending on model size (a 7B model runs on the cheaper A10) and how aggressively requests are batched. Assumes roughly 4-6 hours of GPU time per day rather than a 24/7 instance.
Can Lambda GPUs be reserved to get discounts? Publicly available pricing does not list reserved rates. Contact Lambda sales for commitment-based pricing. Industry standard is 30-50% discount for 1-year or 3-year terms. Estimate $756/month for A100 with 30% discount vs $1,080 on-demand.
Why is H100 SXM more expensive than PCIe on Lambda? H100 SXM at $3.78/hr is priced above H100 PCIe at $2.86/hr on Lambda. The premium reflects the NVLink 4.0 interconnect (900 GB/s per GPU) and specialized SXM chassis required for high-bandwidth multi-GPU training. For single-GPU workloads, H100 PCIe is the better value. For multi-GPU distributed training, H100 SXM's bandwidth advantage justifies the premium.
What's the difference between A100 and H100 pricing? H100 is 2x the compute density of A100, costs roughly 2-2.5x more per hour. For training speed, H100 wins decisively (faster model convergence). For pure cost-per-FLOP on inference, older A100 can be competitive if model latency requirements are relaxed. Depends on specific workload.
How many GPUs do I need for my workload?
- 7B parameters: 1x GPU (A10 or A100 depending on batching)
- 13B-34B parameters: 1x A100 or 1x GH200
- 70B parameters: 1x H100 with 8-bit quantization, or 1x B200 at full precision
- 140B+ parameters: 1x B200 with 8-bit quantization, or 2x+ H100 with model parallelism
Scaling to 8 GPUs for training speeds up data parallel training by ~7-8x (accounting for overhead), not exactly 8x.
Is it cheaper to buy GPUs than rent on Lambda? Only if running continuously for 12+ months. H100 PCIe hardware costs $15,000-$20,000. Lambda at $2.86/hr over 24 months continuous = ~$50,000. But most teams don't run 24/7. At typical use (8 hours/day), breakeven stretches to roughly 2-3 years once power, hosting, and maintenance are counted. Lambda wins on flexibility and zero maintenance.
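A breakeven sketch, assuming a hypothetical $17,500 hardware price (midpoint of the quoted range) and ignoring power, hosting, and maintenance, which push real-world breakeven later:

```python
# Breakeven sketch: months of renting until cumulative rental cost equals a
# hypothetical hardware price. Ignores power, hosting, and maintenance.
def breakeven_months(hw_cost: float, rate_hr: float,
                     hours_per_day: float) -> float:
    monthly_rent = rate_hr * hours_per_day * 30
    return round(hw_cost / monthly_rent, 1)

print(breakeven_months(17500, 2.86, 24))  # continuous use: ~8.5 months
print(breakeven_months(17500, 2.86, 8))   # 8 hrs/day: ~25.5 months
```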
Sources
- Lambda Cloud Pricing
- NVIDIA H100 Datasheet
- NVIDIA A100 Datasheet
- NVIDIA GH200 Product Brief
- NVIDIA B200 Tensor Core GPU
- DeployBase GPU Pricing Tracker (Lambda rates observed March 21, 2026)