Contents
- L40S Technical Specifications
- Vast.ai L40S Pricing
- Performance Benchmarks
- How to Rent L40S on Vast.ai
- FAQ
- Related Resources
- Sources
L40S Technical Specifications
L40S pricing on Vast.ai works well for inference, fine-tuning, and smaller training jobs. The L40S pairs 48 GB of GDDR6 memory with 91.6 TFLOPS of FP32 compute and fourth-generation Tensor Cores for mixed-precision matrix math, making it an efficient mid-range option for production AI work.
Hardware specifications:
- Memory: 48 GB GDDR6
- Memory bandwidth: 864 GB/s
- CUDA cores: 18,176
- FP32 performance: 91.6 TFLOPS
- Peak FP8 throughput: 1,466 TFLOPS (with sparsity)
- Power consumption: 350W
- Form factor: Dual-slot PCIe Gen 4
Vast.ai L40S Pricing
Vast.ai operates as a peer-to-peer GPU marketplace, connecting renters with hosts who list their hardware. L40S pricing on the platform typically ranges from $0.35 to $0.50 per hour, significantly undercutting traditional providers. For comparison, RunPod's L40S costs $0.79/hour, while CoreWeave's 8x L40S bundle runs $18/hour total ($2.25 per GPU).
Vast.ai pricing factors:
- Base rental rate: $0.35-0.50/hour per L40S GPU
- Platform fee: 15% on top of provider rate
- Bandwidth charges: $0.05 per GB outbound data
- No minimum rental period (pay per minute)
- Instant refunds for disrupted rentals (uptime guarantee)
Expect price swings: peak-hour rates can reach $0.60-0.80/hr as supply tightens, while off-peak rates drop to $0.25-0.35/hr. Flexible workloads benefit most.
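The pricing factors above can be combined into a quick cost estimate. A minimal sketch using the article's figures (a $0.40/hr mid-range rate, the 15% platform fee, and $0.05/GB outbound bandwidth); actual rates vary by provider and time of day:

```python
def estimate_cost(hours: float, gpu_rate: float = 0.40,
                  platform_fee: float = 0.15, egress_gb: float = 0.0,
                  egress_rate: float = 0.05) -> float:
    """Estimate total L40S rental cost in USD.

    gpu_rate:     provider's hourly rate ($0.35-0.50/hr is typical)
    platform_fee: 15% marketplace fee on top of the provider rate
    egress_gb:    outbound data transferred, billed at $0.05/GB
    """
    compute = hours * gpu_rate * (1 + platform_fee)
    bandwidth = egress_gb * egress_rate
    return round(compute + bandwidth, 2)

# 10 hours at the default $0.40/hr plus 20 GB of result downloads:
print(estimate_cost(10, egress_gb=20))  # → 5.6
```

Since billing is per minute with no minimum rental period, `hours` can be fractional.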
Performance Benchmarks
The L40S handles 7B-parameter models at 120-150 tokens/sec; an H100 exceeds 200 tokens/sec but at a far higher hourly rate. The L40S is a good fit for latency-sensitive applications that don't need maximum throughput.
L40S benchmarks:
- 7B LLM inference: 120-150 tokens/sec
- 13B LLM inference: 60-80 tokens/sec
- Image generation (Stable Diffusion): 8-12 images/minute
- Batch inference (video analysis): 15-20 FPS
- Fine-tuning 7B models: Effective throughput 2,000-3,000 tokens/sec
The L40S is particularly strong for:
- Real-time inference with sub-100ms latency requirements
- Multi-model serving on a single GPU
- Computer vision inference at scale
- Fine-tuning consumer-facing models
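The benchmark and pricing figures above imply a rough cost per million generated tokens. A back-of-envelope sketch using the article's mid-range numbers (135 tok/s for a 7B model, $0.40/hr), not measured results:

```python
def cost_per_million_tokens(tokens_per_sec: float, hourly_rate: float) -> float:
    """Dollars per 1M generated tokens at a given throughput and GPU rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return round(hourly_rate / tokens_per_hour * 1_000_000, 3)

# 7B model at 135 tok/s on a $0.40/hr L40S:
print(cost_per_million_tokens(135, 0.40))  # → 0.823

# 13B model at 70 tok/s on the same instance:
print(cost_per_million_tokens(70, 0.40))   # → 1.587
```

This assumes the GPU is saturated for the full billed hour; idle time between requests raises the effective cost proportionally.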
How to Rent L40S on Vast.ai
Renting an L40S on Vast.ai is straightforward:
- Create a Vast.ai account and verify your email
- Add a payment method (credit card or crypto)
- Click Create Instance in the dashboard
- Filter by GPU type: Search "L40S"
- Sort by price (low to high) or reputation
- Select a provider with 95%+ uptime rating
- Choose machine specifications (vCPU, RAM, disk)
- Set rental duration or leave open-ended
- Enter SSH public key
- Click Rent
Provisioning takes 30-120 seconds, after which you receive the instance's IP address and SSH credentials. Docker is supported natively, and NVIDIA drivers and CUDA come preinstalled.
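Once provisioned, connecting is a standard SSH session using the host, port, and key from the steps above. A tiny helper for assembling the command (the IP, port, and key path below are placeholder values, not a real instance; Vast.ai instances listen on a per-instance SSH port shown in the dashboard):

```python
def ssh_command(host: str, port: int, key_path: str, user: str = "root") -> str:
    """Build the SSH command for a freshly provisioned instance.

    host/port come from the dashboard after the instance starts;
    key_path is the private key matching the public key entered at rental time.
    """
    return f"ssh -i {key_path} -p {port} {user}@{host}"

# Placeholder values for illustration:
print(ssh_command("203.0.113.7", 41234, "~/.ssh/id_ed25519"))
# → ssh -i ~/.ssh/id_ed25519 -p 41234 root@203.0.113.7
```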
FAQ
Why is Vast.ai L40S cheaper than RunPod or Lambda? Vast.ai is peer-to-peer: individuals and small datacenters rent out spare capacity, which eliminates middleman margins. Users trade SLA-backed uptime guarantees for lower prices. Disruptions occur in 1-2% of rentals on average.
What happens if my L40S rental gets disrupted? Vast.ai refunds the disrupted rental pro-rata. Disruptions are rare (highly rated providers target 98-99% uptime), but users should back up checkpoints every 30 minutes for production training.
Can I run multiple workloads on a single L40S? Yes. L40S supports MPS (Multi-Process Service) mode, allowing 2-3 concurrent processes. Performance degrades 20-30% per additional process. For production multi-tenant serving, dedicated instances are recommended.
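The MPS figures above can be turned into a quick aggregate-throughput check. A sketch assuming the midpoint of the article's 20-30% per-additional-process penalty, applied multiplicatively to every process (a simplifying assumption, not a measured scaling model):

```python
def aggregate_throughput(base_tps: float, processes: int,
                         penalty: float = 0.25) -> float:
    """Total tokens/sec across concurrent MPS processes on one L40S.

    Each process beyond the first is assumed to cut every process's
    throughput by `penalty` (25% = midpoint of the article's 20-30%).
    """
    per_process = base_tps * (1 - penalty) ** (processes - 1)
    return round(per_process * processes, 1)

# 7B model at 135 tok/s single-process, scaled to 1-3 MPS processes:
for n in (1, 2, 3):
    print(n, aggregate_throughput(135, n))
```

Under this assumption total throughput still rises with a second or third process even though each individual process slows down, which is why MPS helps batch serving but not latency-sensitive single streams.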
Does Vast.ai support persistent storage? Each rental gets temporary NVMe storage (size varies by provider), but persistent storage isn't offered. Users should upload training data at the start of a rental and download results before the instance terminates.
Which regions does Vast.ai serve? Vast.ai providers span North America, Europe, and Asia, and performance varies by location. U.S.-based providers typically see 5-15 ms latency to nearby AWS regions and 20-50 ms to international datacenters.
Related Resources
Explore alternative GPU rental options: