Contents
- L40S GPU Specifications
- RunPod L40S Pricing
- How to Rent L40S on RunPod
- Performance Characteristics
- Comparison With Other Providers
- FAQ
- Sources
L40S GPU Specifications
The NVIDIA L40S is built for both graphics and AI workloads, pairing 48GB of GDDR6 memory with 18,176 CUDA cores and tuning for rendering and deep learning alike.
On RunPod, L40S pricing starts at $0.79/hour, which makes for solid inference economics. Key specs:
- Memory: 48GB GDDR6
- Memory Bandwidth: 864 GB/s
- CUDA Cores: 18,176
- Peak FP32 Performance: 91.6 TFLOPS (TF32 Tensor Core: up to 366 TFLOPS with sparsity)
- Max Power Consumption: 350W
- Tensor Cores: 568
Good for inference serving, image generation, and lightweight training. It can serve models up to roughly 30B parameters once quantized: at 8-bit precision a 30B model needs about 30GB for weights, which fits in 48GB with headroom for activations and KV cache (in FP16 the same model needs ~60GB and does not fit).
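You can run the same back-of-envelope check for any model; a minimal sketch, where the 1.2x overhead factor for activations and KV cache is an assumed rule of thumb rather than a measured value:

```python
def fits_in_vram(params_billion: float, bytes_per_param: float,
                 vram_gb: float = 48.0, overhead: float = 1.2) -> bool:
    """Rough check: model weights times an overhead factor vs. available VRAM."""
    weights_gb = params_billion * bytes_per_param  # 1B params ~ 1GB per byte of precision
    return weights_gb * overhead <= vram_gb

print(fits_in_vram(30, 1))  # True: 30B in 8-bit (~30GB) fits in 48GB
print(fits_in_vram(30, 2))  # False: 30B in FP16 (~60GB) does not
```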
For detailed specifications, see the L40S specs guide.
RunPod L40S Pricing
RunPod offers L40S access at highly competitive rates. As of March 2026, the L40S runs $0.79 per hour on shared instances, or $1.10+ per hour for dedicated machines.
Pricing structure:
- Shared Instance: $0.79/hour
- Dedicated Instance: $1.10-1.50/hour
- Monthly Commitment: 25-30% discount
You pay only for running time: stop an instance and compute billing stops (storage is billed separately). Network egress is free, and network storage costs $0.01/GB/month.
No setup fees. Launch in under 60 seconds.
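To put the rates in context, a quick cost sketch using the numbers above (the 27.5% discount is just the midpoint of the quoted 25-30% range):

```python
HOURLY_SHARED = 0.79         # $/hour, shared L40S instance
STORAGE_PER_GB_MONTH = 0.01  # $/GB/month, network storage

def monthly_cost(hours: float, storage_gb: float = 0,
                 committed: bool = False) -> float:
    """Estimate a month's bill: compute time plus network storage."""
    compute = hours * HOURLY_SHARED
    if committed:
        compute *= 1 - 0.275  # midpoint of the 25-30% commitment discount
    return compute + storage_gb * STORAGE_PER_GB_MONTH

# 8 hours/day for 22 workdays with a 100GB volume:
print(f"${monthly_cost(8 * 22, storage_gb=100):.2f}")       # pay-as-you-go: $140.04
print(f"${monthly_cost(8 * 22, 100, committed=True):.2f}")  # with commitment: $101.80
```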
Compare rates with Lambda GPU pricing and CoreWeave GPU pricing for full market context.
How to Rent L40S on RunPod
Starting on RunPod takes minimal steps:
- Visit RunPod.io and create an account
- Select "Pods" from the menu
- Click "Create New Pod"
- Search for "L40S" in the GPU selector
- Choose shared or dedicated instance
- Select a template (PyTorch, Jupyter, etc.)
- Configure storage allocation
- Launch the pod
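If you prefer to script the launch, the runpod Python SDK covers the same flow. A minimal sketch; the gpu_type_id string and image tag are assumptions here, so confirm the exact identifiers in the RunPod console before relying on them:

```python
import runpod

runpod.api_key = "YOUR_API_KEY"  # from the RunPod account settings page

# Launch a pod on an L40S with a PyTorch template image.
# "NVIDIA L40S" is an assumed id; check the GPU selector for the exact string.
pod = runpod.create_pod(
    name="l40s-dev",
    image_name="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
    gpu_type_id="NVIDIA L40S",
)
print(pod["id"])  # keep the id so you can stop the pod later

# When you're done, stop billing for compute:
# runpod.stop_pod(pod["id"])
```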
Instances typically spin up in under a minute, and SSH access is available within seconds. Pick the PyTorch template for ML training or Jupyter Lab for notebook work.
Map ports for Jupyter (8888), FastAPI (8000), or any other service you need to expose.
A common workflow: start on a shared instance, move to dedicated when you need consistent performance, use snapshots for reproducibility, and deploy an inference server for production; a sketch of that last step follows.
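A minimal sketch of what an inference endpoint on the mapped port might look like, using FastAPI; the model id and generation settings are placeholders, not a RunPod-specific setup:

```python
# pip install fastapi uvicorn transformers torch
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: Prompt) -> dict:
    inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(output[0], skip_special_tokens=True)}

# Serve on the port you mapped in the pod config:
#   uvicorn server:app --host 0.0.0.0 --port 8000
```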
For more provider options, explore AWS GPU pricing and Paperspace GPU pricing.
Performance Characteristics
L40S performance varies by workload: a single card generates roughly 150-300 tokens/sec on 7B models, depending on batch size and serving stack.
Inference benchmarks:
- Llama 2 7B (batch 1): 250-300 tokens/sec
- Llama 2 13B (batch 1): 180-220 tokens/sec
- Stable Diffusion (768x768): 2-3 images/sec
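Because throughput depends so heavily on the serving stack, it's worth measuring on your own workload. A minimal sketch with Hugging Face transformers (single request, greedy decoding; the model id is a placeholder, and batched serving will give different numbers):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The L40S is", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```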
Training performance:
- 7B model fine-tuning: 400-600 tokens/sec
- Gradient checkpointing enabled: 300-450 tokens/sec
With 4-bit quantization you can even load Llama 2 70B (roughly 35GB of weights), but that leaves little VRAM headroom for the KV cache, and generation slows noticeably.
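A minimal sketch of that 4-bit load, assuming transformers with the bitsandbytes backend installed (the 70B checkpoint is gated on Hugging Face and requires access approval):

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in FP16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=quant_config,
    device_map="auto",  # places the ~35GB of quantized weights on the GPU
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
```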
Comparison With Other Providers
| Provider | L40S Price | Setup | Availability |
|---|---|---|---|
| RunPod | $0.79/hr | Instant | Global |
| Lambda | $0.92/hr | Instant | US, Europe |
| Paperspace | $1.20-1.80/hr | 5 min | Global |
| AWS | $1.50+/hr | 10 min | Global |
| Crusoe | $1.00/hr | 5 min | US-based |
RunPod wins on price. Shared instances work great for dev and prototyping. Dedicated instances cost more but give predictable performance.
FAQ
Is the $0.79/hr rate available globally? Yes. RunPod charges the same rates regardless of region.
Can I save models permanently on RunPod? Yes, through network storage ($0.01/GB/month). Models persist across pod restarts.
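For example, assuming the network volume is mounted at /workspace (the usual mount point on RunPod pods), persisting a checkpoint is just a write to that path:

```python
import os
import torch

# /workspace is assumed to be the network volume mount point.
os.makedirs("/workspace/checkpoints", exist_ok=True)
checkpoint = {"step": 1000, "weights": torch.randn(10, 10)}
torch.save(checkpoint, "/workspace/checkpoints/demo.pt")  # survives pod restarts
```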
What is the maximum storage I can attach? RunPod supports up to 1TB of network storage per pod.
Does RunPod offer multi-GPU L40S clusters? Yes, users can create pods with 2 or more L40S GPUs.
Are spot instances available for L40S? RunPod uses a stable instance model without spot interruptions.
Sources
- NVIDIA L40S GPU Datasheet
- RunPod Pricing & Pod Documentation (official)
- NVIDIA CUDA Toolkit Specifications