Contents
- Lambda Cloud GPU Pricing Overview
- Pricing Comparison Table
- Single-GPU Pricing Breakdown
- Multi-GPU Cluster Pricing
- Reserved Instances and Discounts
- Storage and Networking Costs
- Cost Estimation by Workload
- Comparison with Competitors
- Cost Optimization Strategies
- FAQ
- Related Resources
- Sources
Lambda Cloud GPU Pricing Overview
Lambda Cloud GPU pricing ranges from $0.58 to $6.08 per GPU-hour on-demand as of March 2026. The spread depends on GPU model, cluster size, and VRAM. The Quadro RTX 6000 sits at the low end; the B200 SXM tops the range. Most teams running inference land between $1.48 and $2.86 per hour.
Lambda positions itself in the mid-market tier. Not the cheapest boutique provider. Not an expensive hyperscaler. Consistent uptime, documented SLAs, and production-grade API access matter more than squeezing out absolute lowest hourly rates. Pricing reflects that positioning.
The full GPU inventory is tracked on the DeployBase GPU comparison.
Pricing Comparison Table
| GPU Model | VRAM | Price/GPU-hr | Monthly (730 hrs) | Use Case |
|---|---|---|---|---|
| Quadro RTX 6000 | 24GB | $0.58 | $423.40 | Legacy professional graphics |
| NVIDIA A10 | 24GB | $0.86 | $627.80 | Small inference, LoRA fine-tuning |
| RTX A6000 | 48GB | $0.92 | $671.60 | Rendering, visualization, fine-tuning |
| A100 PCIe | 40GB | $1.48 | $1,080.40 | Mid-tier training, inference |
| A100 SXM | 40GB | $1.48 | $1,080.40 | Distributed training (multi-GPU) |
| GH200 | 96GB | $1.99 | $1,452.70 | Large model inference, long context |
| H100 PCIe | 80GB | $2.86 | $2,087.80 | High-throughput inference, single GPU |
| H100 SXM | 80GB | $3.78 | $2,759.40 | Large model training, multi-GPU clusters |
| B200 SXM | 192GB | $6.08 | $4,438.40 | Frontier model training, maximum memory |
All prices from Lambda's official pricing page, observed March 21, 2026.
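The monthly column is just hourly rate × 730 hours. A minimal sketch reproducing it from the table's rates:

```python
# Sketch: reproduce the "Monthly (730 hrs)" column from on-demand hourly rates.
# Rates are from the table above (Lambda pricing observed March 2026).
HOURS_PER_MONTH = 730

RATES = {  # $/GPU-hour
    "Quadro RTX 6000": 0.58,
    "A10": 0.86,
    "RTX A6000": 0.92,
    "A100": 1.48,
    "GH200": 1.99,
    "H100 PCIe": 2.86,
    "H100 SXM": 3.78,
    "B200 SXM": 6.08,
}

def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """On-demand monthly cost at continuous use."""
    return round(hourly_rate * hours, 2)

for gpu, rate in RATES.items():
    print(f"{gpu}: ${monthly_cost(rate):,.2f}/month")
```

Swap in your own expected utilization (e.g. `hours=8 * 30`) to estimate part-time usage.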
Single-GPU Pricing Breakdown
Budget Tier ($0.58-$0.92/hr)
The Quadro RTX 6000 starts at $0.58/hr. Legacy GPU with 24GB GDDR6 memory. Designed for professional visualization and CAD workloads, not deep learning. Rarely the right choice for AI compute, but it's the cheapest entry point on Lambda. Infrastructure is outdated, which explains the low cost.
More practical: the A10 at $0.86/hr. 24GB GDDR6, respectable for small fine-tuning jobs and inference at single-digit batch sizes. Training from scratch at this tier is slow (140 peak TFLOPS vs 660 for H100). Inference with smaller models (7B parameters or less) works fine. Throughput is ~20-30 tokens/sec for a 7B model.
RTX A6000 at $0.92/hr adds 48GB for $0.06 more per hour. Same memory technology as A10 (GDDR6 not HBM), but double the VRAM. Opens space for slightly larger models or higher batch sizes without model parallelism. Training on this tier is still slow for anything above 7B parameters.
Mid-Tier ($1.48-$1.99/hr)
The A100 PCIe and A100 SXM both sit at $1.48/hr on Lambda. 40GB HBM2 memory (1.6 TB/s bandwidth). Same hourly cost, different form factors.
PCIe variant: multi-GPU communication runs over PCIe Gen4 (~64 GB/s per GPU), which becomes the bottleneck in distributed training. Single-GPU inference is unaffected. Good for single-GPU deployments and small clusters where that limit is tolerable.
SXM variant: supports NVLink 3.0 (600 GB/s per GPU) for high-speed GPU-to-GPU communication, and the form factor is designed for rack deployment. If building a multi-GPU cluster, SXM is the right choice. For a single A100, Lambda prices the two identically, so PCIe loses nothing on cost.
The GH200 at $1.99/hr jumps to 96GB of HBM3 memory. That 2.4x increase in VRAM for 34% more cost per hour is a practical trade-off. Models in the 13B-34B range fit at FP16 without quantization; 70B fits with 8-bit. Good for inference serving on large models and long contexts; less suited to distributed training, where H100 SXM clusters with NVLink are the better fit.
High Performance ($2.86-$3.78/hr)
H100 PCIe at $2.86/hr. 80GB HBM2e, 2.0 TB/s bandwidth, 350W TDP. Most common H100 variant on cloud providers because it fits standard server builds. Inference throughput is excellent (~100-150 tokens/sec for a 70B model). Multi-GPU training hits memory bandwidth limits when scaling beyond 2-4 GPUs. Single-GPU training is viable for models up to 70B parameters with some quantization.
H100 SXM at $3.78/hr is the distributed training workhorse. Same 80GB VRAM, but 3.35 TB/s memory bandwidth and NVLink 4.0 interconnect (900 GB/s per GPU). Eight-GPU clusters reach 7.2 TB/s of aggregate NVLink bandwidth. Essential for training 70B+ models or large model parallelism. The SXM premium ($0.92/hr more than PCIe) reflects the NVLink interconnect advantage for multi-GPU deployments.
Real-world scenario: training a 70B parameter LLM on 8x H100 SXM takes 7-10 days depending on data size. Training the same model on 8x H100 PCIe would bottleneck on memory bandwidth and take 25-40% longer.
Frontier ($6.08/hr)
B200 SXM at $6.08/hr. 192GB HBM3e memory. NVIDIA's newest generation (late 2024). A single GPU holds roughly 90B parameters at FP16, or ~140B at 8-bit, without model parallelism. Working at this scale previously required multi-GPU parallelism and its associated complexity. Lambda offers it on the highest-spec SXM chassis. Limited availability. High cost. But if a team needs single-GPU work on very large models, it's the option.
Monthly cost at continuous use (730 hours): $4,438.40. Only justifiable for specialized workloads with specific hardware requirements. Typical customer is a research org or large company training custom models.
Multi-GPU Cluster Pricing
Lambda publishes multi-GPU pricing for A100 and H100; B200 cluster pricing is not yet listed. Each configuration includes the required networking and NVLink infrastructure.
A100 Clusters
| Configuration | VRAM Total | Price/hr | Price/GPU |
|---|---|---|---|
| 1x A100 PCIe | 40GB | $1.48 | $1.48 |
| 2x A100 PCIe | 80GB | $2.96 | $1.48 |
| 4x A100 PCIe | 160GB | $5.92 | $1.48 |
| 1x A100 SXM | 40GB | $1.48 | $1.48 |
| 8x A100 SXM | 320GB | $11.84 | $1.48 |
| 8x A100 SXM (80GB) | 640GB | $16.48 | $2.06 |
The 8x A100 SXM cluster at 320GB costs $11.84/hr ($1.48 per GPU). That's exactly the per-GPU rate of a single 1x. Most providers charge 10-15% more for clustered GPUs due to networking, chassis, and NVLink overhead. Lambda's pricing suggests the bulk rate absorbs overhead, making the 8-GPU cluster cost-neutral on a per-GPU basis compared to spinning up eight 1x instances separately.
The 8x A100 SXM at 640GB uses the 80GB A100 variant (8 × 80GB) and costs $16.48/hr ($2.06 per GPU). More expensive per GPU because of the doubled VRAM.
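The per-GPU rates above follow directly from dividing the cluster price by GPU count; a quick sketch:

```python
# Sketch: per-GPU rate implied by Lambda's listed cluster prices.
def per_gpu_rate(cluster_price_hr: float, num_gpus: int) -> float:
    return round(cluster_price_hr / num_gpus, 2)

print(per_gpu_rate(11.84, 8))  # 8x A100 SXM 320GB: 1.48, same as a single GPU
print(per_gpu_rate(16.48, 8))  # 8x A100 SXM 640GB: 2.06, premium for 80GB VRAM
```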
H100 Clusters
| Configuration | VRAM Total | Price/hr | Price/GPU |
|---|---|---|---|
| 1x H100 SXM | 80GB | $3.78 | $3.78 |
| 8x H100 SXM | 640GB | $27.52 | $3.44 |
Eight H100 SXMs cost $27.52/hr total, or $3.44 per GPU. Lambda prices H100 SXM clusters at a slight discount vs the single-GPU rate of $3.78/hr.
As with the A100 clusters, verify actual billing before deploying; networking overhead could still show up in practice.
B200 Clusters
Lambda lists B200 SXM single GPU at $6.08/hr. No multi-GPU cluster pricing published yet as of March 2026. New product, limited availability. Expect cluster pricing to arrive after inventory stabilizes.
Reserved Instances and Discounts
Lambda's public pricing page does not clearly state reserved or commitment-based pricing as of March 2026.
Most cloud providers discount 20-50% on 1-year or 3-year commitments. Lambda likely offers something similar, but the exact terms are not documented in publicly available materials. Teams considering long-term workloads (training scheduled for next 6 months) should contact Lambda's sales team directly for reserved pricing options.
Estimated Monthly Costs with Hypothetical 30% Discount
On-demand monthly costs (730 hours continuous):
- A100 SXM single: $1.48/hr = $1,080/month on-demand → ~$756/month reserved
- H100 SXM single: $3.78/hr = $2,759/month on-demand → ~$1,931/month reserved
- H100 SXM 8x cluster: $27.52/hr = $20,090/month on-demand → ~$14,063/month reserved
- B200 SXM: $6.08/hr = $4,438/month on-demand → ~$3,107/month reserved
These are estimates assuming a 30% discount. Contact Lambda for actual reserved pricing.
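The reserved estimates above reduce to a one-liner; the 30% discount is a hypothetical assumption, not a Lambda quote:

```python
# Sketch: hypothetical reserved pricing at an assumed 30% discount.
# Lambda has not published reserved rates; this is an estimate only.
def reserved_monthly(hourly: float, discount: float = 0.30,
                     hours: int = 730) -> float:
    return round(hourly * hours * (1 - discount), 2)

print(reserved_monthly(1.48))   # A100 SXM: ~$756/month
print(reserved_monthly(27.52))  # 8x H100 SXM cluster: ~$14,063/month
```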
Storage and Networking Costs
Storage
Lambda includes persistent storage in the hourly rate (typically 5-10GB base). Additional storage is metered:
- SSD: approximately $0.10-$0.15 per GB-month
- Archive: approximately $0.02 per GB-month
Example: 500GB SSD for dataset caching costs ~$50-75/month on top of GPU hourly rates.
Networking
Egress (data leaving Lambda): approximately standard AWS egress rates, $0.02-$0.10 per GB depending on destination. Inbound is typically free.
Example: downloading 100GB of training data = ~$2-10 in egress costs. Uploading 50GB of results = ~$1-5 in egress costs. Negligible for most workloads, material for large distributed training jobs.
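A small sketch of these add-on costs, using the approximate per-GB rate ranges above (assumptions, not confirmed Lambda list prices):

```python
# Sketch: add-on storage and egress estimates. Per-GB rates are the
# approximate ranges quoted in the text, not confirmed Lambda list prices.
def addon_cost(gb: float, rate_per_gb: float) -> float:
    return round(gb * rate_per_gb, 2)

# 500GB SSD dataset cache at $0.10-$0.15/GB-month
print(addon_cost(500, 0.10), "-", addon_cost(500, 0.15))  # 50.0 - 75.0
# 100GB egress at $0.02-$0.10/GB
print(addon_cost(100, 0.02), "-", addon_cost(100, 0.10))  # 2.0 - 10.0
```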
Cost Estimation by Workload
Fine-Tuning a 7B Parameter Model (LoRA, 1x A100, 20 hours)
A100 PCIe at $1.48/hr × 20 hours = $29.60
Full fine-tuning (not LoRA) would require more compute and memory, pushing to H100 territory and roughly $57 to $76 for the same job.
Storage for 10GB dataset: ~$1/month, negligible for one-time project.
Total: ~$30-35
Inference Serving (1M tokens/day, ~5.6 hours GPU time/day)
A 7B model at 50 tokens/sec throughput. 1M tokens = 20,000 seconds = 5.56 hours/day.
Use GH200 at $1.99/hr for large context (96GB) or H100 PCIe at $2.86/hr for high throughput.
GH200: $1.99/hr × 5.56 hrs/day × 30 days ≈ $332/month
H100 PCIe: $2.86/hr × 5.56 hrs/day × 30 days ≈ $477/month
Both are cost-effective for production inference. GH200 wins on per-token cost if context lengths exceed 100K tokens per request.
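The serving math generalizes: GPU-hours per day = tokens per day ÷ throughput ÷ 3,600 sec/hr. A sketch, assuming the 50 tokens/sec figure above:

```python
# Sketch: monthly serving cost from daily token volume and throughput.
# 50 tokens/sec is the assumed 7B-class throughput from the text.
def serving_cost(tokens_per_day: float, tokens_per_sec: float,
                 rate_hr: float, days: int = 30) -> float:
    gpu_hours_per_day = tokens_per_day / tokens_per_sec / 3600
    return round(gpu_hours_per_day * rate_hr * days, 2)

print(serving_cost(1_000_000, 50, 1.99))  # GH200: ~$332/month
print(serving_cost(1_000_000, 50, 2.86))  # H100 PCIe: ~$477/month
```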
Training a 13B Parameter Model from Scratch (8x H100 SXM, 7 days)
Lambda 8x H100 SXM cluster: $27.52/hr × 168 hours = $4,623.36
Typical training time for a 13B model on 8x H100 with a modest token budget (tens of billions of tokens) is 5-9 days depending on data size and optimization. 7 days fits within the ballpark.
Storage for dataset: 100GB at $0.10-0.15/GB-month = $10-15/month (pro-rated for 1 week: ~$2-3)
Total: ~$4,626
Large Model Inference Serving (70B parameters, batch processing, 4 hours/day)
70B parameter model in 8-bit quantization requires ~70GB VRAM. H100 SXM has 80GB, fits exactly.
H100 SXM: $3.78/hr × 4 hrs/day × 30 days = $453.60/month
Throughput: ~40 tokens/sec per GPU on H100 for a 70B model at batch size 1-2. At batch 1, 1M tokens/day needs ~7 hours of GPU time; with modest batching it fits the 4-hour daily window on a single H100.
Training a 70B Parameter Model from Scratch (8x H100 SXM, 9 days)
Lambda 8x H100 SXM cluster: $27.52/hr × 216 hours (9 days) = $5,944.32
Storage for training data (500GB): ~$50-75/month, pro-rated ~$2-3/week
Total: ~$5,947
This is a realistic compute bill for a 9-day run with a limited token budget; fully training a 70B model to convergence consumes far more GPU-hours. A research lab or company doing custom model development expects at least this scale of expense.
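Both training estimates reduce to cluster rate × wall-clock hours; a sketch:

```python
# Sketch: training run cost = cluster hourly rate x wall-clock hours.
def training_cost(cluster_rate_hr: float, days: float) -> float:
    return round(cluster_rate_hr * days * 24, 2)

print(training_cost(27.52, 7))  # 13B run, 8x H100 SXM, 7 days: $4,623.36
print(training_cost(27.52, 9))  # 70B run, 8x H100 SXM, 9 days: $5,944.32
```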
Comparison with Competitors
Lambda vs RunPod vs Vast.AI (best single-GPU rates as of March 2026):
| Provider | A100 | H100 PCIe | H100 SXM | Notes |
|---|---|---|---|---|
| Lambda | $1.48 | $2.86 | $3.78 | Consistent pricing, strong support |
| RunPod | $1.19 | $1.99 | $2.69 | Cheaper across most tiers |
| Vast.AI | [Variable] | [Variable] | [Variable] | Marketplace model, spot rates volatile |
RunPod undercuts Lambda across most tiers. For example:
- A100: Lambda $1.48 vs RunPod $1.19 = $0.29/hr difference = $212/month savings
- H100 PCIe: Lambda $2.86 vs RunPod $1.99 = $0.87/hr difference = $635/month savings
- H100 SXM: Lambda $3.78 vs RunPod $2.69 = $1.09/hr difference = $796/month savings
At 8x H100 SXM scale:
- Lambda 8x: $27.52/hr
- RunPod 8x: $21.52/hr (8 × $2.69)
RunPod is meaningfully cheaper at the 8x cluster tier for H100 SXM. Lambda's advantages are managed infrastructure, guaranteed uptime, and NVLink cluster orchestration.
Lambda's edge: Consistency, support responsiveness, and UI/UX for job management. Worth paying the premium for production workloads where uptime SLAs and API reliability matter more than absolute lowest hourly rate.
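The monthly savings figures above are just the hourly difference × 730 hours; a sketch using the table's rates:

```python
# Sketch: monthly savings from an hourly rate difference, at 730 hrs/month.
def monthly_savings(rate_a: float, rate_b: float, hours: int = 730) -> float:
    return round(abs(rate_a - rate_b) * hours, 2)

print(monthly_savings(1.48, 1.19))  # A100, Lambda vs RunPod: ~$212/month
print(monthly_savings(3.78, 2.69))  # H100 SXM, Lambda vs RunPod: ~$796/month
```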
Cost Optimization Strategies
1. Choose the Right Form Factor
Single vs multi-GPU decision impacts total cost:
Single-GPU workload (7B model inference):
- Use H100 PCIe at $2.86/hr
- H100 SXM ($3.78/hr) runs fine as a single GPU, but its NVLink advantage goes unused
Running a single-GPU workload on SXM costs $0.92/hr more for no benefit. For single-GPU workloads, H100 PCIe is the more cost-effective choice on Lambda. SXM's value is in multi-GPU NVLink bandwidth.
Multi-GPU workload (training large model):
- Use H100 SXM for NVLink
- H100 SXM 8x costs $27.52/hr = $3.44 per GPU (slight discount vs single $3.78/hr)
- Running 8x PCIe separately is $2.86 × 8 = $22.88/hr
- SXM cluster costs 20% more than 8x PCIe but provides substantially better bandwidth for distributed training
2. Right-Size the VRAM
Don't pay for more memory than needed:
- 7B model: A100 (40GB) is overkill, use A10 (24GB) at $0.86/hr
- 13B model: A100 (40GB) is sufficient, no need for GH200 (96GB)
- 70B model: H100 (80GB) fits with 8-bit quantization, no need for B200 (192GB)
- 140B model: B200 (192GB) fits with 8-bit quantization; FP16 (~280GB) requires multiple GPUs
Oversizing by 1 tier costs $1-3/hr. Over a month, that's $730-2,190 unnecessarily.
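A rough VRAM estimator makes the tier choices concrete: parameter count × bytes per parameter, with an assumed ~20% overhead for activations and KV cache (the overhead factor is an assumption, not a Lambda or NVIDIA figure):

```python
# Rough VRAM rule of thumb: parameter count x bytes per parameter, plus
# an assumed ~20% overhead for activations and KV cache (an estimate).
def model_vram_gb(params_billion: float, bits: int = 16,
                  overhead: float = 1.2) -> float:
    return round(params_billion * bits / 8 * overhead, 1)

print(model_vram_gb(7))       # ~16.8 GB: fits A10 (24GB)
print(model_vram_gb(13))      # ~31.2 GB: fits A100 (40GB)
print(model_vram_gb(70, 8))   # ~84.0 GB: tight on 80GB with this overhead
print(model_vram_gb(140, 8))  # ~168.0 GB: fits B200 (192GB)
```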
3. Batching and Throughput Optimization
Batch multiple inference requests together. Single request latency matters less than aggregate throughput for cost optimization.
- Batch size 1: ~50 tokens/sec per A100
- Batch size 8: ~350 tokens/sec per A100 (7x improvement)
Cost per million tokens at batch 8: $1.48/hr ÷ (350 tokens/sec × 3,600 sec/hr) × 10⁶ ≈ $1.17/M tokens
Same workload at batch size 1: $1.48/hr ÷ (50 × 3,600) × 10⁶ ≈ $8.22/M tokens
Batching reduces per-token cost by 7x. Infrastructure complexity is the trade-off.
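Cost per million tokens is the hourly rate divided by tokens generated per hour; a sketch with the A100 throughput figures above:

```python
# Cost per million tokens = hourly rate / tokens generated per hour, x 1e6.
# Throughput figures are the A100 estimates from the text.
def cost_per_m_tokens(rate_hr: float, tokens_per_sec: float) -> float:
    return round(rate_hr / (tokens_per_sec * 3600) * 1_000_000, 2)

print(cost_per_m_tokens(1.48, 50))   # batch 1: ~$8.22/M tokens
print(cost_per_m_tokens(1.48, 350))  # batch 8: ~$1.17/M tokens (7x cheaper)
```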
4. Off-Peak Reserved Instances
If available (pending verification), commit to off-peak hours. Many cloud providers have lower rates during night hours. Lambda may offer similar discounts.
Example: if Lambda offered a 50% reserved discount, an 8x H100 SXM cluster at $27.52/hr would drop to $13.76/hr. Over a month of continuous training (730 hours), savings would exceed $10,000.
5. Quantization and Inference Optimization
8-bit quantization halves memory relative to FP16: a 70B model drops from ~140GB to ~70GB, fitting on a single 80GB H100 instead of two. Trade-off: ~10-20% slower throughput but a much lower cost per inference.
Cost per million tokens:
- FP16 70B on 2x H100 PCIe ($5.72/hr) at 40 tokens/sec ≈ $39.72/M
- 8-bit 70B on 1x H100 PCIe ($2.86/hr) at 35 tokens/sec ≈ $22.70/M
- Savings: ~43% cheaper
FAQ
What is the cheapest GPU on Lambda Cloud? Quadro RTX 6000 at $0.58/hr. It's legacy hardware not suitable for AI. A10 at $0.86/hr is the cheapest practical option for modern AI workloads.
Which Lambda GPU is best for inference? GH200 at $1.99/hr for long-context models (96GB memory supports 70B+ parameters). H100 PCIe at $2.86/hr for throughput and general-purpose inference. Both are single-GPU, so total cost is per-GPU rate × hours. Inference rarely needs multi-GPU unless models exceed 120GB unquantized.
How much does it cost to run a small LLM inference service on Lambda? Roughly $120 to $175/month for 1-2 million tokens/day on a single GPU, depending on model size (a 7B model runs on the cheaper A10) and how aggressively requests are batched. Assumes roughly 4-6 hours of GPU time per day rather than a 24/7 instance.
Can Lambda GPUs be reserved to get discounts? Publicly available pricing does not list reserved rates. Contact Lambda sales for commitment-based pricing. Industry standard is 30-50% discount for 1-year or 3-year terms. Estimate $756/month for A100 with 30% discount vs $1,080 on-demand.
Why is H100 SXM more expensive than PCIe on Lambda? H100 SXM at $3.78/hr is priced above H100 PCIe at $2.86/hr on Lambda. The premium reflects the NVLink 4.0 interconnect (900 GB/s per GPU) and specialized SXM chassis required for high-bandwidth multi-GPU training. For single-GPU workloads, H100 PCIe is the better value. For multi-GPU distributed training, H100 SXM's bandwidth advantage justifies the premium.
What's the difference between A100 and H100 pricing? H100 is 2x the compute density of A100, costs roughly 2-2.5x more per hour. For training speed, H100 wins decisively (faster model convergence). For pure cost-per-FLOP on inference, older A100 can be competitive if model latency requirements are relaxed. Depends on specific workload.
How many GPUs do I need for my workload?
- 7B parameters: 1x GPU (A10 or A100 depending on batching)
- 13B-34B parameters: 1x A100 or 1x GH200
- 70B parameters: 1x H100 with 8-bit quantization, or 1x B200 at full precision
- 140B+ parameters: 1x B200 with 8-bit quantization, or 2x+ H100 with model parallelism
Scaling to 8 GPUs for training speeds up data parallel training by ~7-8x (accounting for overhead), not exactly 8x.
Is it cheaper to buy GPUs than rent on Lambda? Only if running continuously for 12+ months. H100 PCIe hardware costs $15,000-$20,000. Lambda at $2.86/hr over 24 months continuous = ~$50,000. But most teams don't run 24/7. At typical use (8 hours/day), breakeven stretches to roughly 2-3 years once power, hosting, and maintenance are counted. Lambda wins on flexibility and zero maintenance.
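A breakeven sketch, assuming a hypothetical $17,500 hardware price (midpoint of the quoted range) and ignoring power, hosting, and maintenance, which push real-world breakeven later:

```python
# Breakeven sketch: months of renting until cumulative rental cost equals a
# hypothetical hardware price. Ignores power, hosting, and maintenance.
def breakeven_months(hw_cost: float, rate_hr: float,
                     hours_per_day: float) -> float:
    monthly_rent = rate_hr * hours_per_day * 30
    return round(hw_cost / monthly_rent, 1)

print(breakeven_months(17500, 2.86, 24))  # continuous use: ~8.5 months
print(breakeven_months(17500, 2.86, 8))   # 8 hrs/day: ~25.5 months
```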
Sources
- Lambda Cloud Pricing
- NVIDIA H100 Datasheet
- NVIDIA A100 Datasheet
- NVIDIA GH200 Product Brief
- NVIDIA B200 Tensor Core GPU
- DeployBase GPU Pricing Tracker (Lambda rates observed March 21, 2026)