Contents
- H100 on Vast.ai
- Vast.ai Marketplace Overview
- Performance Benchmarks on Vast.ai
- Provider Selection and Risk Assessment
- Bidding Strategy and Cost Optimization
- Detailed Setup Walkthrough
- Workload Optimization for Marketplace Constraints
- Cost Optimization on Vast.ai
- Comparing Vast.ai to Dedicated Providers
- Cost Per Million Tokens (Inference Benchmarking)
- FAQ
- Sources
H100 on Vast.ai
Vast.ai: $2.50-4.00/hr for peer-to-peer H100 rentals. Often the cheapest option, but choose providers carefully, since uptime varies widely. Average: $3.15/hr.
This guide covers marketplace dynamics, pricing, bidding strategy, and provider selection.
Vast.ai Marketplace Overview
As of March 2026, Vast.ai differs fundamentally from traditional cloud providers. Instead of fixed pricing, GPU owners list hardware at self-determined rates, and renters bid for access, creating a market-driven pricing system.
H100 Pricing Patterns and Monthly Projections
Current H100 availability on Vast.AI shows:
| Tier | Price Range | Monthly (730 hrs) | Availability | Reliability |
|---|---|---|---|---|
| Budget | $2.50-3.00/hr | $1,825-2,190 | 20-40 listings | Variable provider quality |
| Mid-Market | $3.00-3.50/hr | $2,190-2,555 | 60-100 listings | Mixed reliability |
| Premium | $3.50-4.00/hr | $2,555-2,920 | 40-60 listings | Higher uptime SLAs |
| Production | $4.00+/hr | $2,920+ | 5-15 listings | Dedicated support |
Pricing updates hourly based on actual renter activity. Check Vast.AI's price history dashboard for long-term trends before committing to multi-day workloads. Average market rate across all tiers: $3.15/hr, yielding monthly cost of ~$2,300.
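The monthly figures in the table are straight hourly-rate projections over a 730-hour month; a minimal sketch (tier boundaries are the table's values, not live quotes):

```python
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Project monthly spend from an hourly GPU rate."""
    return hourly_rate * hours

# Tier boundaries from the pricing table above
for tier, low, high in [("Budget", 2.50, 3.00),
                        ("Mid-Market", 3.00, 3.50),
                        ("Premium", 3.50, 4.00)]:
    print(f"{tier}: ${monthly_cost(low):,.0f}-{monthly_cost(high):,.0f}/month")

print(f"Market average: ${monthly_cost(3.15):,.2f}/month")  # ~$2,300
```

The same multiplication underlies every monthly figure quoted later in this guide.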
Performance Benchmarks on Vast.ai
H100 performance varies by provider hardware pairing and network quality:
| Metric | Budget Tier | Mid-Market Tier | Premium Tier |
|---|---|---|---|
| Inference Throughput (70B model) | 35-40 tokens/sec | 40-45 tokens/sec | 45-50 tokens/sec |
| Network Latency | 100-500ms | 50-150ms | <50ms |
| Training Throughput (fine-tuning) | 200-300 tokens/sec | 300-400 tokens/sec | 400-500 tokens/sec |
| Uptime (measured) | 85-92% | 92-97% | 97-99% |
Provider network quality and CPU pairing significantly affect GPU throughput, with up to 40% variance between budget and premium tiers.
Provider Selection and Risk Assessment
Evaluating Provider Reliability
Vast.AI displays provider metrics critical to risk assessment:
- GPU Uptime Score: Historical availability percentage (aim for >98%)
- Internet Speed: Measured upload/download bandwidth to instance (aim for >100 Mbps)
- Renter Reviews: Qualitative feedback on stability and responsiveness (4.5+ stars)
- Hardware Specifications: GPU type, CPU pairing, storage options (confirm the listing is a genuine H100 and note the variant, SXM vs. PCIe)
- Rental Hours: Provider's experience with long-term renters (>100 hours minimum)
Filter for providers with 100+ rental hours and >97% uptime score. New providers with <20 hours history should be avoided unless pricing discount exceeds 30%. Read recent reviews (last 10) for patterns: single complaints are normal; repeated issues indicate systemic problems.
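The screening rules above can be captured in a few lines (the listing fields here are illustrative, not Vast.ai's actual API schema):

```python
def passes_screen(p: dict) -> bool:
    """Screen a provider listing per the criteria above:
    100+ rental hours, >97% uptime, 4.5+ star reviews.
    New providers (<20 hours) only pass with a >30% discount."""
    established = (p["rental_hours"] >= 100
                   and p["uptime"] > 0.97
                   and p["stars"] >= 4.5)
    deep_discount = p["rental_hours"] < 20 and p.get("discount", 0) > 0.30
    return established or deep_discount

listings = [
    {"name": "a", "rental_hours": 450, "uptime": 0.985, "stars": 4.7},
    {"name": "b", "rental_hours": 12,  "uptime": 0.99,  "stars": 5.0, "discount": 0.35},
    {"name": "c", "rental_hours": 300, "uptime": 0.94,  "stars": 4.6},
]
print([p["name"] for p in listings if passes_screen(p)])  # → ['a', 'b']
```

Provider "c" fails on uptime alone, which is the criterion that most often disqualifies otherwise attractive listings.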
Geolocation, Latency, and Network Quality
Vast.AI shows provider location (typically US, Europe, or Asia-Pacific). Selection criteria vary by workload:
- Interactive inference: Prefer same-region provider with <50ms latency to avoid round-trip delays
- Batch training: Geolocation irrelevant; prioritize pricing and uptime score
- Data transfer: Providers with 500+ Mbps upload enable fast dataset transfers; verify in provider specs
Test network quality before committing to multi-day jobs: start an iperf3 server on the instance (iperf3 -s), then run iperf3 -c <instance-ip> from your local machine to measure the bandwidth actually available.
Bidding Strategy and Cost Optimization
Fixed-Price Versus Bid-Based Rental
Vast.AI offers two rental modes:
- Interruptible: Cheaper, but can be terminated with 4-hour notice when the provider reclaims the GPU
- On-Demand Fixed: Higher cost, but guaranteed until you terminate the instance
For training jobs with checkpointing, interruptible instances at 30-40% discount versus on-demand save substantial costs. For production inference, fixed-price instances ensure reliability.
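At the discounts quoted above, the interruptible-versus-fixed tradeoff is easy to quantify (the $3.25/hr baseline is this guide's mid-market example, not a live quote):

```python
on_demand = 3.25  # $/hr, mid-market fixed-price baseline
hours = 730       # full-month run

for discount in (0.30, 0.40):
    interruptible = on_demand * (1 - discount)
    savings = (on_demand - interruptible) * hours
    print(f"{discount:.0%} discount: ${interruptible:.2f}/hr, "
          f"saves ${savings:,.0f}/month")
```

For a checkpointed month-long training run, the discount is worth roughly $710-950, which is why interruptible is the default choice for resumable work.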
Dynamic Bidding Approach
Instead of accepting listed prices, bid below the asking rate. Providers with excess capacity accept lower bids:
- Note median H100 asking price (typically $3.20-3.50/hr)
- Set bid limit at 80-85% of median
- Wait during off-peak hours (2-6 AM UTC typically show best fill rates)
- Monitor bid acceptance rate; if consistently rejected, increase to 90% of median
This approach can reduce effective H100 cost to $2.70-3.00/hr for flexible-timing workloads.
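The four steps above amount to a simple escalation ladder; a sketch (the median and the 80-90% schedule are this section's figures, the function itself is illustrative):

```python
def next_bid(median_asking: float, rejected_rounds: int) -> float:
    """Start at 80% of the median asking price, step up 5% per
    round of rejections, and cap at 90% of median."""
    fraction = min(0.80 + 0.05 * rejected_rounds, 0.90)
    return round(median_asking * fraction, 2)

median = 3.35  # midpoint of the $3.20-3.50 range noted above
print([next_bid(median, r) for r in range(3)])
```

At a $3.35 median this lands bids in the $2.68-3.02 range, consistent with the $2.70-3.00/hr effective cost cited above.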
Detailed Setup Walkthrough
Instance Launch Procedure
- Browse available H100 listings, filtered by price, location, and uptime score
- Select "Rent" or place a custom bid:
  - On-demand fixed: click "Rent Now" at the listed price
  - Interruptible: bid lower; fill rate depends on provider excess capacity
- Choose a storage configuration (providers may offer 50-500 GB templates)
- Generate SSH keys or use an existing public key:
  - First time: download the private key securely; do not share it
  - Subsequent rentals: reuse the existing key for the same provider
- Select a 24-hour or 168-hour billing cycle
- Wait for launch; instances start within 2-10 minutes (check status in the dashboard)
- Connect via SSH using the provided IP address as soon as the instance is running
Typical Instance Launch Timeline
| Step | Duration | Status |
|---|---|---|
| Payment processing | 30 seconds | Queued |
| Provider instance allocation | 1-3 minutes | Launching |
| Container/OS initialization | 2-5 minutes | Starting |
| SSH accessibility | 5-10 minutes total | Running |
Total typical time: 8 minutes from launch click to SSH access.
SSH Configuration and Remote Access
Vast.AI assigns public IPs with random high-numbered SSH ports. Store connection details in ~/.ssh/config for quick access:
```
Host vastai-h100
    HostName provider-ip.address
    Port 12345
    User root
    IdentityFile ~/.ssh/vastai_key
    ServerAliveInterval 60
    ServerAliveCountMax 3
```

Connect directly, or forward port 8888 for a local Jupyter session:

```shell
ssh vastai-h100
ssh -L 8888:localhost:8888 vastai-h100
```
Ports are typically above 10000 to avoid conflicts with common services; update ~/.ssh/config after each new instance launch, since the IP and port change.
Data Transfer Optimization
Upload training data before long-running jobs using rsync for resume capability:
```shell
rsync -av --progress --partial \
  /local/training_data/ \
  root@provider-ip:~/data/training_data/
```
Vast.ai's network performance varies by provider; check bandwidth metrics in the provider listing before transferring multi-GB datasets. Budget-tier providers typically offer 50-100 Mbps, while premium-tier providers offer 500+ Mbps. This variance alone can cause a 5-10x difference in data loading time.
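That bandwidth spread translates directly into wall-clock transfer time; a quick sanity check for a hypothetical 50 GB dataset (decimal units, protocol overhead ignored):

```python
def transfer_hours(dataset_gb: float, mbps: float) -> float:
    """Hours to move dataset_gb over a link rated in megabits/sec."""
    megabits = dataset_gb * 8 * 1000  # GB -> megabits (decimal)
    return megabits / mbps / 3600

for tier, mbps in [("Budget, 50 Mbps", 50),
                   ("Budget, 100 Mbps", 100),
                   ("Premium, 500 Mbps", 500)]:
    print(f"{tier}: {transfer_hours(50, mbps):.2f} h")
```

The 50-versus-500 Mbps endpoints differ by exactly 10x, the top of the loading-time variance quoted above.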
Workload Optimization for Marketplace Constraints
Checkpoint-Based Training
Always enable checkpointing for Vast.AI jobs to tolerate instance interruptions. Save model weights every 500 training steps:
```python
# Inside the training loop (torch, model, optimizer already in scope)
if step % 500 == 0:
    torch.save({
        'step': step,
        'epoch': epoch,
        'model_state': model.state_dict(),
        'optimizer_state': optimizer.state_dict(),
    }, f'checkpoint_step_{step}.pt')
```
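On restart, the matching step is to locate the newest checkpoint on disk; a small helper (filenames follow the checkpoint_step_N.pt pattern used here; the torch load calls are sketched as comments since they need torch, model, and optimizer in scope):

```python
import glob
import re

def checkpoint_step(path: str) -> int:
    """Extract the step number from a checkpoint filename, or -1."""
    m = re.search(r"checkpoint_step_(\d+)\.pt$", path)
    return int(m.group(1)) if m else -1

def latest_checkpoint(files=None):
    """Pick the highest-step checkpoint so a restarted job resumes there."""
    files = glob.glob("checkpoint_step_*.pt") if files is None else files
    return max(files, key=checkpoint_step, default=None)

# Resume sketch:
# ckpt = torch.load(latest_checkpoint())
# model.load_state_dict(ckpt['model_state'])
# optimizer.load_state_dict(ckpt['optimizer_state'])
```

Sorting by parsed step number, not filename string order, matters once steps pass four digits (checkpoint_step_10000 sorts before checkpoint_step_2000 lexicographically).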
Batch Inference with Timeout Handling
For inference workloads, implement timeout mechanisms and provider failover:
```python
import signal
import sys

def timeout_handler(signum, frame):
    # Persist completed inference results and exit cleanly
    print("Instance termination imminent, saving results...")
    save_results()  # user-defined persistence routine
    sys.exit(0)

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(3600)  # raise SIGALRM after 1 hour
```
Cost Optimization on Vast.ai
Bidding Strategy for Lower Effective Costs
Rather than accepting listed prices, submit strategic bids 20-30% below asking rates:
```python
def get_bid_price(typical_asking: float = 3.25) -> float:
    """Derive a bid from the mid-market asking price. A full
    implementation would query Vast.ai's price history to find
    off-peak windows and the current median."""
    return typical_asking * 0.75  # bid 25% below asking

bid_price = get_bid_price()  # ~$2.44/hr
```
Effective cost with strategic bidding: roughly $2.44/hr (down from $3.25 asking), reducing monthly cost to ~$1,780.
Provider Portfolio Diversification
Maintain relationships with 3-4 premium providers to minimize interruption risk:
```python
# Rank candidate providers by price per unit of reliability (lower is better)
providers = [
    {"name": "provider-a", "price": 3.40, "uptime": 0.98},
    {"name": "provider-b", "price": 3.55, "uptime": 0.99},
    {"name": "provider-c", "price": 3.35, "uptime": 0.97},
]
selected = min(providers, key=lambda p: p["price"] / p["uptime"])
```
Spreading workload across providers ensures fallback capacity if one experiences unavailability.
Comparing Vast.ai to Dedicated Providers
Vast.ai's typical H100 cost ($3.00-3.50/hr) sits above RunPod's H100 SXM at $2.69/hr on-demand and below Lambda's H100 SXM at $3.78/hr; with strategic bidding ($2.44-2.70/hr effective) it becomes the cheapest of the three. Key differences:
| Metric | Vast.AI | RunPod | Lambda | CoreWeave |
|---|---|---|---|---|
| Typical H100 Cost ($/hr) | $3.25 | $2.69 | $3.78 (SXM) | $6.16 |
| Reliability | Variable (85-99%) | Guaranteed | Guaranteed | Guaranteed |
| Support | Peer-to-peer | Technical | Technical | Technical |
| Setup Time | 5-15 min | 2-5 min | 3-8 min | 10-15 min |
- Reliability: Dedicated providers guarantee availability; Vast.AI requires provider risk management (cost: 2-5% lost time to interruptions)
- Support: RunPod and Lambda offer technical support; Vast.AI provides peer-to-peer only
- Tooling: Dedicated providers include optimized templates and integrations; Vast.AI offers bare Linux
Vast.ai excels for cost-sensitive workloads tolerating provider variability. For sustained production inference requiring 99%+ uptime, see CoreWeave's H100 cluster pricing for managed alternatives.
Cost Per Million Tokens (Inference Benchmarking)
For inference workloads, effective cost depends on model throughput:
- 70B-parameter model on H100: ~50 tokens/second (batch size 8)
- Assumed utilization: 75% (accounting for idle time)
- Effective throughput: 37.5 tokens/second
- Cost at $3.00/hr: $0.022 per 1K tokens
This compares favorably to managed inference services charging $0.05-0.15 per 1K tokens.
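The $0.022 figure follows mechanically from throughput, utilization, and the hourly rate:

```python
def cost_per_1k_tokens(hourly_rate: float, tokens_per_sec: float,
                       utilization: float = 1.0) -> float:
    """Dollars per 1,000 generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * utilization * 3600
    return hourly_rate / tokens_per_hour * 1000

# 70B model: 50 tok/s peak, 75% utilization, $3.00/hr
print(f"${cost_per_1k_tokens(3.00, 50, 0.75):.3f} per 1K tokens")  # → $0.022
```

Utilization is the lever to watch: at 50% utilization the same hardware costs $0.033 per 1K tokens, eroding much of the advantage over managed services.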
FAQ
How do I avoid unreliable Vast.AI providers?
Filter for >98% uptime, >100 hours historical rentals, and >4.5 star reviews. Start with small test jobs before committing multi-day workloads. Always enable checkpointing so interruptions don't cause complete job loss.
What's the difference between interruptible and on-demand on Vast.AI?
Interruptible instances (cheaper by 30-40%) can be terminated with 4-hour notice when provider reclaims GPU. On-demand fixed-price instances guarantee access until you release them. Use interruptible for resumable batch work, fixed for production serving.
Can I use Vast.AI H100 for production inference APIs?
Yes, but with caveats. Select premium providers (>99% uptime, dedicated support), use on-demand fixed pricing, and implement provider failover to secondary instance. Production-critical inference should prioritize dedicated providers like Lambda Labs.
How much can I save by bidding strategically on Vast.AI vs accepting listed prices?
Strategic bidding during off-peak hours (2-6 AM UTC) at 75% of asking price yields 60-70% acceptance rate. Savings: asking price typically $3.25/hr, bid $2.44/hr, effective savings 25% ($0.81/hr). For month-long workload: $0.81 × 730 hours = $591/month savings. More aggressive bidding (65% of asking) increases savings to 35% but reduces acceptance probability.
What's the minimum provider uptime score acceptable for production training on Vast.AI?
Minimum 96% uptime acceptable for resumable training with hourly checkpointing. At 96% uptime, expect ~29 hours downtime monthly. With checkpoint frequency, maximum loss is 1 hour of training. Below 94% uptime, expect >43 hours downtime monthly (unacceptable). Premium providers (>98% uptime) are recommended for production workloads despite 10-15% price premium.
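The downtime numbers in this answer are straight uptime arithmetic over a 730-hour month:

```python
def monthly_downtime_hours(uptime: float, hours_per_month: int = 730) -> float:
    """Expected hours of downtime per month at a given uptime fraction."""
    return (1 - uptime) * hours_per_month

for uptime in (0.98, 0.96, 0.94):
    print(f"{uptime:.0%} uptime -> {monthly_downtime_hours(uptime):.1f} h down/month")
```

At 96% uptime this gives ~29.2 hours down per month, and at 94% it gives ~43.8 hours, matching the thresholds above.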
Sources
- Vast.AI Marketplace: https://www.vast.ai/
- Vast.AI Bidding Guide: https://docs.vast.ai/
- NVIDIA H100 Architecture: https://www.nvidia.com/en-us/data-center/h100/
- PyTorch Checkpointing: https://pytorch.org/docs/stable/checkpoint.html