Contents
- L40S Technical Specifications
- Vast.ai L40S Pricing
- Performance Benchmarks
- How to Rent L40S on Vast.ai
- FAQ
- Related Resources
- Sources
L40S Technical Specifications
L40S pricing on Vast.ai works well for inference, fine-tuning, and smaller training jobs. The L40S pairs 48 GB of GDDR6 memory with 91.6 TFLOPS of FP32 compute and fourth-generation Tensor Cores for mixed-precision matrix math, making it an efficient mid-range option for production AI work.
Hardware specifications:
- Memory: 48 GB GDDR6
- Memory bandwidth: 864 GB/s
- CUDA cores: 18,176
- FP32 performance: 91.6 TFLOPS
- Peak FP8 throughput: 1,466 TFLOPS (with sparsity)
- Power consumption: 350W
- Form factor: Dual-slot PCIe Gen 4
Vast.ai L40S Pricing
Vast.ai operates as a peer-to-peer GPU marketplace, connecting renters with hosts who list their hardware. L40S pricing on the platform typically ranges from $0.35 to $0.50 per hour, significantly undercutting traditional providers. For comparison, RunPod's L40S costs $0.79/hour, while CoreWeave's 8x L40S bundle runs $18/hour total ($2.25 per GPU).
Vast.ai pricing factors:
- Base rental rate: $0.35-0.50/hour per L40S GPU
- Platform fee: 15% on top of provider rate
- Bandwidth charges: $0.05 per GB outbound data
- No minimum rental period (pay per minute)
- Instant refunds for disrupted rentals (uptime guarantee)
Expect price swings: peak-hour rates can reach $0.60-0.80/hr as supply tightens, while off-peak rates drop to $0.25-0.35/hr. Flexible workloads benefit most.
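The pricing factors above can be combined into a quick cost estimate. A minimal sketch using the article's figures (a $0.40/hr mid-range rate, the 15% platform fee, and $0.05/GB outbound bandwidth); actual rates vary by provider and time of day:

```python
def estimate_cost(hours: float, gpu_rate: float = 0.40,
                  platform_fee: float = 0.15, egress_gb: float = 0.0,
                  egress_rate: float = 0.05) -> float:
    """Estimate total L40S rental cost in USD.

    gpu_rate:     provider's hourly rate ($0.35-0.50/hr is typical)
    platform_fee: 15% marketplace fee on top of the provider rate
    egress_gb:    outbound data transferred, billed at $0.05/GB
    """
    compute = hours * gpu_rate * (1 + platform_fee)
    bandwidth = egress_gb * egress_rate
    return round(compute + bandwidth, 2)

# 10 hours at the default $0.40/hr plus 20 GB of result downloads:
print(estimate_cost(10, egress_gb=20))  # → 5.6
```

Since billing is per minute with no minimum rental period, `hours` can be fractional.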
Performance Benchmarks
The L40S handles 7B-parameter models at 120-150 tokens/sec; an H100 exceeds 200 tokens/sec but at a far higher hourly rate. The L40S is a good fit for latency-sensitive applications that don't need maximum throughput.
L40S benchmarks:
- 7B LLM inference: 120-150 tokens/sec
- 13B LLM inference: 60-80 tokens/sec
- Image generation (Stable Diffusion): 8-12 images/minute
- Batch inference (video analysis): 15-20 FPS
- Fine-tuning 7B models: Effective throughput 2,000-3,000 tokens/sec
The L40S is particularly strong for:
- Real-time inference with sub-100ms latency requirements
- Multi-model serving on a single GPU
- Computer vision inference at scale
- Fine-tuning consumer-facing models
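The benchmark and pricing figures above imply a rough cost per million generated tokens. A back-of-envelope sketch using the article's mid-range numbers (135 tok/s for a 7B model, $0.40/hr), not measured results:

```python
def cost_per_million_tokens(tokens_per_sec: float, hourly_rate: float) -> float:
    """Dollars per 1M generated tokens at a given throughput and GPU rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return round(hourly_rate / tokens_per_hour * 1_000_000, 3)

# 7B model at 135 tok/s on a $0.40/hr L40S:
print(cost_per_million_tokens(135, 0.40))  # → 0.823

# 13B model at 70 tok/s on the same instance:
print(cost_per_million_tokens(70, 0.40))   # → 1.587
```

This assumes the GPU is saturated for the full billed hour; idle time between requests raises the effective cost proportionally.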
How to Rent L40S on Vast.ai
Renting an L40S on Vast.ai is straightforward:
- Create a Vast.ai account and verify your email
- Add a payment method (credit card or crypto)
- Click Create Instance in the dashboard
- Filter by GPU type: Search "L40S"
- Sort by price (low to high) or reputation
- Select a provider with 95%+ uptime rating
- Choose machine specifications (vCPU, RAM, disk)
- Set rental duration or leave open-ended
- Enter SSH public key
- Click Rent
Provisioning takes 30-120 seconds, after which you receive the instance's IP address and SSH credentials. Docker is supported natively, and NVIDIA drivers and CUDA come preinstalled.
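Once provisioned, connecting is a standard SSH session using the host, port, and key from the steps above. A tiny helper for assembling the command (the IP, port, and key path below are placeholder values, not a real instance; Vast.ai instances listen on a per-instance SSH port shown in the dashboard):

```python
def ssh_command(host: str, port: int, key_path: str, user: str = "root") -> str:
    """Build the SSH command for a freshly provisioned instance.

    host/port come from the dashboard after the instance starts;
    key_path is the private key matching the public key entered at rental time.
    """
    return f"ssh -i {key_path} -p {port} {user}@{host}"

# Placeholder values for illustration:
print(ssh_command("203.0.113.7", 41234, "~/.ssh/id_ed25519"))
# → ssh -i ~/.ssh/id_ed25519 -p 41234 root@203.0.113.7
```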
FAQ
Why is Vast.ai L40S cheaper than RunPod or Lambda? Vast.ai is peer-to-peer: individuals and small datacenters rent out spare capacity, which eliminates middleman margins. Users trade SLA-backed uptime guarantees for lower prices. Disruptions occur in 1-2% of rentals on average.
What happens if my L40S rental gets disrupted? Vast.ai refunds the disrupted rental pro-rata. Disruptions are rare (highly rated providers target 98-99% uptime), but users should back up checkpoints every 30 minutes for production training.
Can I run multiple workloads on a single L40S? Yes. L40S supports MPS (Multi-Process Service) mode, allowing 2-3 concurrent processes. Performance degrades 20-30% per additional process. For production multi-tenant serving, dedicated instances are recommended.
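The MPS figures above can be turned into a quick aggregate-throughput check. A sketch assuming the midpoint of the article's 20-30% per-additional-process penalty, applied multiplicatively to every process (a simplifying assumption, not a measured scaling model):

```python
def aggregate_throughput(base_tps: float, processes: int,
                         penalty: float = 0.25) -> float:
    """Total tokens/sec across concurrent MPS processes on one L40S.

    Each process beyond the first is assumed to cut every process's
    throughput by `penalty` (25% = midpoint of the article's 20-30%).
    """
    per_process = base_tps * (1 - penalty) ** (processes - 1)
    return round(per_process * processes, 1)

# 7B model at 135 tok/s single-process, scaled to 1-3 MPS processes:
for n in (1, 2, 3):
    print(n, aggregate_throughput(135, n))
```

Under this assumption total throughput still rises with a second or third process even though each individual process slows down, which is why MPS helps batch serving but not latency-sensitive single streams.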
Does Vast.ai support persistent storage? Each rental gets temporary NVMe storage (size varies by provider), but persistent storage isn't offered. Users should upload training data at the start of a rental and download results before the instance terminates.
Which regions does Vast.ai serve? Vast.ai providers span North America, Europe, and Asia, and performance varies by location. U.S.-based providers typically see 5-15 ms latency to nearby AWS regions and 20-50 ms to international datacenters.
Related Resources
Explore alternative GPU rental options: