Contents
- L40S GPU Specifications
- RunPod L40S Pricing
- How to Rent L40S on RunPod
- Performance Characteristics
- Comparison With Other Providers
- FAQ
- Sources
L40S GPU Specifications
The NVIDIA L40S is built for both graphics and AI workloads, pairing 48GB of GDDR6 memory with 18,176 CUDA cores and tuning for rendering and deep learning alike.
On RunPod, L40S pricing starts at $0.79/hour, which makes for solid inference economics. Key specs:
- Memory: 48GB GDDR6
- Memory Bandwidth: 864 GB/s
- CUDA Cores: 18,176
- Peak FP32 Performance: 91.6 TFLOPS (TF32 Tensor Core: up to 366 TFLOPS with sparsity)
- Max Power Consumption: 350W
- Tensor Cores: 568
Good for inference serving, image generation, and lightweight training. It can serve models up to roughly 30B parameters once quantized: at 8-bit precision a 30B model needs about 30GB for weights, which fits in 48GB with headroom for activations and KV cache (in FP16 the same model needs ~60GB and does not fit).
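You can run the same back-of-envelope check for any model; a minimal sketch, where the 1.2x overhead factor for activations and KV cache is an assumed rule of thumb rather than a measured value:

```python
def fits_in_vram(params_billion: float, bytes_per_param: float,
                 vram_gb: float = 48.0, overhead: float = 1.2) -> bool:
    """Rough check: model weights times an overhead factor vs. available VRAM."""
    weights_gb = params_billion * bytes_per_param  # 1B params ~ 1GB per byte of precision
    return weights_gb * overhead <= vram_gb

print(fits_in_vram(30, 1))  # True: 30B in 8-bit (~30GB) fits in 48GB
print(fits_in_vram(30, 2))  # False: 30B in FP16 (~60GB) does not
```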
For detailed specifications, see the L40S specs guide.
RunPod L40S Pricing
RunPod offers L40S access at highly competitive rates. As of March 2026, the L40S runs $0.79 per hour on shared instances, or $1.10+ per hour for dedicated machines.
Pricing structure:
- Shared Instance: $0.79/hour
- Dedicated Instance: $1.10-1.50/hour
- Monthly Commitment: 25-30% discount
You pay only for running time: stop an instance and compute billing stops (storage is billed separately). Network egress is free, and network storage costs $0.01/GB/month.
No setup fees. Launch in under 60 seconds.
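To put the rates in context, a quick cost sketch using the numbers above (the 27.5% discount is just the midpoint of the quoted 25-30% range):

```python
HOURLY_SHARED = 0.79         # $/hour, shared L40S instance
STORAGE_PER_GB_MONTH = 0.01  # $/GB/month, network storage

def monthly_cost(hours: float, storage_gb: float = 0,
                 committed: bool = False) -> float:
    """Estimate a month's bill: compute time plus network storage."""
    compute = hours * HOURLY_SHARED
    if committed:
        compute *= 1 - 0.275  # midpoint of the 25-30% commitment discount
    return compute + storage_gb * STORAGE_PER_GB_MONTH

# 8 hours/day for 22 workdays with a 100GB volume:
print(f"${monthly_cost(8 * 22, storage_gb=100):.2f}")       # pay-as-you-go: $140.04
print(f"${monthly_cost(8 * 22, 100, committed=True):.2f}")  # with commitment: $101.80
```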
Compare rates with Lambda GPU pricing and CoreWeave GPU pricing for full market context.
How to Rent L40S on RunPod
Starting on RunPod takes minimal steps:
- Visit RunPod.io and create an account
- Select "Pods" from the menu
- Click "Create New Pod"
- Search for "L40S" in the GPU selector
- Choose shared or dedicated instance
- Select a template (PyTorch, Jupyter, etc.)
- Configure storage allocation
- Launch the pod
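If you prefer to script the launch, the runpod Python SDK covers the same flow. A minimal sketch; the gpu_type_id string and image tag are assumptions here, so confirm the exact identifiers in the RunPod console before relying on them:

```python
import runpod

runpod.api_key = "YOUR_API_KEY"  # from the RunPod account settings page

# Launch a pod on an L40S with a PyTorch template image.
# "NVIDIA L40S" is an assumed id; check the GPU selector for the exact string.
pod = runpod.create_pod(
    name="l40s-dev",
    image_name="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
    gpu_type_id="NVIDIA L40S",
)
print(pod["id"])  # keep the id so you can stop the pod later

# When you're done, stop billing for compute:
# runpod.stop_pod(pod["id"])
```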
Instances typically spin up in under a minute, and SSH access is available within seconds. Pick the PyTorch template for ML training or Jupyter Lab for notebook work.
Map ports for Jupyter (8888), FastAPI (8000), or any other service you need to expose.
A common workflow: start on a shared instance, move to dedicated when you need consistent performance, use snapshots for reproducibility, and deploy an inference server for production; a sketch of that last step follows.
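A minimal sketch of what an inference endpoint on the mapped port might look like, using FastAPI; the model id and generation settings are placeholders, not a RunPod-specific setup:

```python
# pip install fastapi uvicorn transformers torch
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: Prompt) -> dict:
    inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(output[0], skip_special_tokens=True)}

# Serve on the port you mapped in the pod config:
#   uvicorn server:app --host 0.0.0.0 --port 8000
```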
For more provider options, explore AWS GPU pricing and Paperspace GPU pricing.
Performance Characteristics
L40S performance varies by workload: a single card generates roughly 150-300 tokens/sec on 7B models, depending on batch size and serving stack.
Inference benchmarks:
- Llama 2 7B (batch 1): 250-300 tokens/sec
- Llama 2 13B (batch 1): 180-220 tokens/sec
- Stable Diffusion (768x768): 2-3 images/sec
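Because throughput depends so heavily on the serving stack, it's worth measuring on your own workload. A minimal sketch with Hugging Face transformers (single request, greedy decoding; the model id is a placeholder, and batched serving will give different numbers):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The L40S is", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```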
Training performance:
- 7B model fine-tuning: 400-600 tokens/sec
- Gradient checkpointing enabled: 300-450 tokens/sec
With 4-bit quantization you can even load Llama 2 70B (roughly 35GB of weights), but that leaves little VRAM headroom for the KV cache, and generation slows noticeably.
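A minimal sketch of that 4-bit load, assuming transformers with the bitsandbytes backend installed (the 70B checkpoint is gated on Hugging Face and requires access approval):

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in FP16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=quant_config,
    device_map="auto",  # places the ~35GB of quantized weights on the GPU
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
```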
Comparison With Other Providers
| Provider | L40S Price | Setup | Availability |
|---|---|---|---|
| RunPod | $0.79/hr | Instant | Global |
| Lambda | $0.92/hr | Instant | US, Europe |
| Paperspace | $1.20-1.80/hr | 5 min | Global |
| AWS | $1.50+/hr | 10 min | Global |
| Crusoe | $1.00/hr | 5 min | US-based |
RunPod wins on price. Shared instances work great for dev and prototyping. Dedicated instances cost more but give predictable performance.
FAQ
Is the $0.79/hr rate available globally? Yes. RunPod charges the same rates regardless of region.
Can I save models permanently on RunPod? Yes, through network storage ($0.01/GB/month). Models persist across pod restarts.
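For example, assuming the network volume is mounted at /workspace (the usual mount point on RunPod pods), persisting a checkpoint is just a write to that path:

```python
import os
import torch

# /workspace is assumed to be the network volume mount point.
os.makedirs("/workspace/checkpoints", exist_ok=True)
checkpoint = {"step": 1000, "weights": torch.randn(10, 10)}
torch.save(checkpoint, "/workspace/checkpoints/demo.pt")  # survives pod restarts
```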
What is the maximum storage I can attach? RunPod supports up to 1TB of network storage per pod.
Does RunPod offer multi-GPU L40S clusters? Yes, users can create pods with 2 or more L40S GPUs.
Are spot instances available for L40S? RunPod uses a stable instance model without spot interruptions.
Sources
- NVIDIA L40S GPU Datasheet
- RunPod Pricing & Pod Documentation (official)
- NVIDIA CUDA Toolkit Specifications