Contents
- Overview
- L40S GPU Specifications
- Lambda Labs L40S Pricing
- How to Rent L40S on Lambda Labs
- Performance Benchmarks
- Lambda Labs vs Other Providers
- FAQ
- Related Resources
- Sources
Overview
The L40S on Lambda Labs would be a solid choice for inference and computer vision, but Lambda Labs doesn't currently advertise the L40S (as of March 2026); its lineup emphasizes the A100, H100, and GH200 instead. The specs, alternatives, and deployment tradeoffs are still worth knowing if you're evaluating this GPU tier.
L40S GPU Specifications
The NVIDIA L40S handles inference and data center workloads well:
- Memory: 48GB GDDR6
- Memory Bandwidth: 864 GB/s
- Compute Capability: SM89
- Peak Tensor Performance: 366 TFLOPS TF32, 733 TFLOPS FP16 (with sparsity)
- Peak FP32 Performance: 91.6 TFLOPS
- Power Consumption: 350W
- Interconnect: PCIe 4.0 only (no NVLink)
The L40S excels at:
- Image inference (ResNet, EfficientNet, Vision Transformers)
- Video processing and encoding
- Stable Diffusion and diffusion model inference
- Object detection (YOLO, Faster R-CNN)
- Natural language processing at 7B-13B scales
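For the language-model workloads above, a quick way to sanity-check whether a model fits in the L40S's 48GB is the rule of thumb that weights need roughly parameter-count × bytes-per-parameter, plus headroom for activations and KV cache. A minimal sketch (the 20% overhead factor is an illustrative assumption, not a measured figure):

```python
def fits_in_vram(params_billion: float, bytes_per_param: int,
                 vram_gb: float = 48.0, overhead: float = 0.2) -> bool:
    """Rough check: do the model weights, plus an assumed overhead for
    activations and KV cache, fit in the GPU's memory?"""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return weights_gb * (1 + overhead) <= vram_gb

# A 13B model in FP16 (2 bytes/param) is ~26 GB of weights
print(fits_in_vram(13, 2))  # fits on a 48GB L40S
print(fits_in_vram(70, 2))  # a 70B FP16 model does not
```

This is why the 7B-13B range is the sweet spot quoted above: larger models either need quantization (1 byte/param or less) or more memory.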
Lambda Labs L40S Pricing
Lambda Labs no longer advertises L40S pricing publicly. However, comparable alternatives are available:
- Lambda Labs A10 (similar tier): $0.86/hour
- Lambda Labs RTX A6000 (48GB, matches L40S memory): $0.92/hour
For L40S rental elsewhere:
- RunPod: L40S available at ~$0.79/hour
- CoreWeave: Available in 8-GPU cluster at ~$2.25/GPU/hour
- AWS EC2: Available on g6e instances at ~$1.50-2.00/GPU/hour
- Azure: Not currently advertised
As Lambda Labs pivoted its capacity toward H100 and GH200, the L40S became less available on its platform.
How to Rent L40S on Lambda Labs
Option 1: Contact Lambda Labs Sales
L40S may be available through direct sales engagement:
- Visit Lambda Labs website
- Click "Request H100" or contact sales
- Specify L40S requirements and budget
- Receive custom pricing based on volume and duration
Option 2: Use Alternative Providers
Since Lambda Labs doesn't advertise L40S, consider:
- RunPod: Check GPU marketplace for L40 or similar
- Paperspace: Offers professional GPU tiers
- Modal: Cloud platform with diverse GPU options
- Amazon SageMaker: Pay-as-you-go inference endpoints
Step 1: Account Setup
Create account, verify email, add payment method. Takes 5 minutes.
Step 2: Select Hardware
Browse instances. No L40S? Filter for:
- 48GB+ memory
- Inference workloads
- Under $1.00/hour
Step 3: Deploy Instance
Select desired instance and configure:
- Ubuntu 22.04 or 24.04 OS
- CUDA 12.x runtime
- Python 3.10+
- SSH key for secure access
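After the instance boots, it's worth confirming you actually got the GPU and driver you configured. `nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader` prints one CSV line per GPU; a small hedged parser (the sample line and its values are illustrative, not guaranteed L40S output):

```python
import subprocess
from typing import Optional

def query_gpus(raw: Optional[str] = None) -> list:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output.
    Pass captured text as `raw`, or leave it None to run nvidia-smi."""
    if raw is None:
        raw = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=name,memory.total,driver_version",
             "--format=csv,noheader"],
            text=True)
    gpus = []
    for line in raw.strip().splitlines():
        name, mem, driver = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "memory": mem, "driver": driver})
    return gpus

# Parsing a captured sample line (illustrative values):
sample = "NVIDIA L40S, 46068 MiB, 550.54.15\n"
print(query_gpus(sample))
```

If the reported name or memory total doesn't match what you rented, raise it with the provider before installing anything.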
Step 4: Install ML Stack
Common stacks:
- PyTorch or TensorFlow for training
- vLLM for serving LLMs fast
- TensorRT for optimized inference
- Hugging Face transformers for model loading
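Once vLLM is serving a model, it exposes an OpenAI-compatible HTTP API (by default on port 8000), so clients only need to send JSON. A minimal standard-library sketch; the model name and URL here are illustrative assumptions, not a fixed configuration:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload: dict,
         url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to a running vLLM server and return the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("meta-llama/Llama-2-7b-chat-hf", "Hello!")
print(payload)
# send(payload)  # requires a running vLLM server
```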
Performance Benchmarks
L40S performance varies by workload:
Stable Diffusion Inference
- Model: SDXL 1.0
- Batch size: 4
- Throughput: 8-12 images/minute
- Latency: 5-8 seconds per image
Llama 2 7B Inference
- Batch size: 32
- Throughput: 800-1,200 tokens/second
- Latency (p50): 15-20ms
- Memory utilization: 85%
Vision Transformer Classification
- ImageNet 1K classification
- Batch size: 256
- Throughput: 3,200 images/second
- Latency: 80ms
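The benchmark figures above translate directly into serving cost. A quick sketch converting throughput and an hourly rate into cost per unit of work, using the mid-range numbers quoted above and the $0.92/hour A6000 rate from the pricing section:

```python
def cost_per_unit(rate_per_hour: float, units_per_second: float) -> float:
    """Dollar cost per inference unit, given hourly GPU price and throughput."""
    units_per_hour = units_per_second * 3600
    return rate_per_hour / units_per_hour

# SDXL at ~10 images/minute on a $0.92/hr GPU
print(f"${cost_per_unit(0.92, 10 / 60):.4f} per image")
# Llama 2 7B at ~1,000 tokens/second
print(f"${cost_per_unit(0.92, 1000) * 1e6:.2f} per million tokens")
```

At these rates, per-image and per-token costs come out to fractions of a cent, which is why throughput (not hourly price alone) dominates inference economics.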
Lambda Labs vs Other Providers
Comparison for inference workloads with similar memory:
| Provider | GPU | Price/hr | Memory | Availability |
|---|---|---|---|---|
| Lambda Labs | A6000 | $0.92 | 48GB | Available |
| Lambda Labs | A100 | $1.48 | 80GB | Available |
| RunPod | L40 | $0.69 | 48GB | High |
| CoreWeave | L40S bundle | $18 (8-pack) | 384GB | Available |
Lambda Labs A6000 offers the closest direct replacement. RunPod L40 provides lower cost for single GPU rental. For computer vision applications, L40S balances cost and performance effectively.
FAQ
Why did Lambda Labs discontinue L40S availability? As demand for large language models surged, Lambda Labs prioritized H100 and GH200 capacity. L40S remains a solid inference GPU, but market demand shifted to larger models requiring more memory.
Can I use L40S for model training? Yes, but suboptimally. L40S works for training models under 13B parameters. However, H100 or A100 GPUs provide better training performance. L40S is primarily designed for inference.
What's the total cost to run a Stable Diffusion service on L40S-class hardware for one month? At $0.92/hour for a comparable A6000 (or $0.69/hour for a RunPod L40), running 24/7 for a month (730 hours) costs roughly $504-$672.
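The arithmetic behind that estimate, as a reusable sketch:

```python
def monthly_cost(rate_per_hour: float, hours: float = 730) -> float:
    """Cost of running an instance 24/7 for one month (~730 hours)."""
    return rate_per_hour * hours

print(f"RunPod L40:   ${monthly_cost(0.69):.2f}/month")
print(f"Lambda A6000: ${monthly_cost(0.92):.2f}/month")
```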
Is L40S compatible with CUDA and PyTorch? Yes. Full CUDA 12.x compatibility and PyTorch 2.x support exist. Code portability is excellent across inference platforms.
Which provider offers the best L40S alternative right now? RunPod's L40 ($0.69/hour) matches L40S memory at lower cost. For pure L40S, contact Lambda Labs sales or check smaller providers like Paperspace.
Related Resources
- L40S GPU Specifications
- Lambda Labs GPU Pricing Guide
- Best GPU Cloud for Computer Vision
- RunPod GPU Pricing Comparison
- Inference Optimization Guide