Renting H100 GPUs on RunPod
RunPod offers H100 GPUs at some of the lowest prices on the market. Its per-minute billing model and transparent pricing make it ideal for cost-conscious teams.
H100 GPU Specifications
- H100 PCIe: 80GB HBM2e memory, 350W power consumption, PCIe 5.0 interface
- H100 SXM: 80GB HBM3 memory, 700W power consumption, NVLink interconnect capable
RunPod offers both variants at different price points. The SXM variant at $2.69/GPU-hour provides better multi-GPU scalability.
A single 80GB H100 can hold roughly 35-40B parameters in FP16; 70B models fit once quantized to 8-bit, and 4-bit quantization allows models exceeding 100B parameters within the 80GB memory constraint.
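As a rough sanity check, weight memory scales with parameter count and precision. This sketch ignores KV cache and activation memory, which add real overhead on top of the weights:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone (no KV cache/activations).

    1e9 params x (bits/8) bytes, expressed in GB (1 GB = 1e9 bytes).
    """
    return params_billion * bits_per_param / 8

print(weight_memory_gb(70, 16))   # 140.0 GB: FP16 70B exceeds one 80GB H100
print(weight_memory_gb(70, 8))    # 70.0 GB: 8-bit 70B fits
print(weight_memory_gb(100, 4))   # 50.0 GB: 4-bit 100B fits
```

The same arithmetic explains why 70B-class models need either quantization or multiple GPUs on 80GB hardware.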
RunPod Pricing Structure
- H100 PCIe: $1.99 per GPU-hour
- H100 SXM: $2.69 per GPU-hour
RunPod quotes prices hourly but bills at per-minute granularity, so short jobs are billed accurately.
For monthly estimates:
- H100 PCIe at 24/7 operation: $1.99 × 730 = $1,453/month
- H100 SXM at 24/7 operation: $2.69 × 730 = $1,964/month
For intermittent usage:
- 1 hour usage: H100 PCIe = $1.99, H100 SXM = $2.69
- 8 hours daily usage: H100 PCIe = $15.92/day = $477/month, H100 SXM = $21.52/day = $645/month
RunPod's pricing advantage emerges with sporadic or batch workloads where per-minute billing avoids waste.
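The monthly estimates above can be reproduced with a small helper, using the same rates and a 730-hour month:

```python
HOURS_PER_MONTH = 730  # average hours in a month, for 24/7 operation


def monthly_cost_247(rate_per_hour: float) -> float:
    """Cost of running a pod around the clock for one month."""
    return rate_per_hour * HOURS_PER_MONTH


def monthly_cost_daily(rate_per_hour: float, hours_per_day: float,
                       days_per_month: int = 30) -> float:
    """Cost of running a pod a fixed number of hours per day."""
    return rate_per_hour * hours_per_day * days_per_month


print(monthly_cost_247(1.99))       # matches the ~$1,453/month PCIe estimate
print(monthly_cost_daily(1.99, 8))  # matches the ~$477/month 8-hrs/day estimate
```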
Pod Configurations
RunPod offers standard pod options:
- Single H100 PCIe pod
- Single H100 SXM pod
- Multi-GPU bundles (2-8 H100s, custom pricing)
Standard H100 PCIe Pod includes:
- 1x H100 PCIe GPU
- 142GB system RAM
- 50GB NVMe storage (expandable)
- CPU: 16-32 cores depending on configuration
- Network: shared bandwidth
RunPod Pods vs Serverless
RunPod Pods are persistent instances. Once rented, a pod stays allocated and running; you pay for allocation time regardless of utilization.
RunPod Serverless is event-driven. Containers start on-demand when requests arrive. Billing is per-request or per-second of actual execution.
For sustained inference APIs, Pods are usually cheaper. For batch inference triggered by external events, Serverless is often cheaper.
Compare:
- Pod: $1.99/hour × 24 hours × 30 days = $1,433/month for continuous operation
- Serverless: same H100 at $0.30/request × 1,000 requests/month = $300/month
The pod breaks even at roughly 4,776 requests monthly ($1,432.80 ÷ $0.30 per request); below that volume, Serverless is cheaper.
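The Pod-vs-Serverless break-even arithmetic, as a sketch using the rates above:

```python
POD_RATE = 1.99            # $/hr, H100 PCIe on-demand
SERVERLESS_PER_REQ = 0.30  # $/request, illustrative rate from the comparison

pod_monthly = POD_RATE * 24 * 30              # $1,432.80 for an always-on pod
breakeven = pod_monthly / SERVERLESS_PER_REQ  # ~4,776 requests/month

# Below the break-even volume, Serverless is cheaper; above it, a Pod wins.
print(round(breakeven))
```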
How to Rent H100 on RunPod
- Go to runpod.io and create account
- Select GPU type: Search "H100 PCIe" or "H100 SXM"
- Choose template: PyTorch, TensorFlow, or custom
- Configure storage and system RAM
- Launch pod
- SSH into allocated pod using provided credentials
Setup time: 2-5 minutes after launch.
RunPod provides:
- GPU metrics dashboard showing utilization
- SSH terminal access
- Optional JupyterLab integration
- Container persistence across reboots
Running Inference on RunPod H100
Deploy models using:
- Hugging Face Transformers library
- vLLM for optimized inference
- TensorRT for maximum throughput
- Custom inference servers
Example workflow:
- Launch H100 pod ($1.99/hour PCIe)
- SSH into pod
- git clone model repository
- Install dependencies with pip
- Start inference server (Flask, FastAPI, etc.)
- Query from external application or laptop
No special containerization required. RunPod accepts standard Linux environments.
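A toy end-to-end version of that workflow, using only the Python standard library. The "model" here is a placeholder that reverses the prompt; a real pod would serve an actual model behind Flask, FastAPI, or vLLM:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(prompt: str) -> str:
    return prompt[::-1]  # stand-in for real model inference


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"completion": run_model(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep request logging quiet
        pass


# Pod side: start the inference server (ephemeral port for this demo).
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: query it, as an external application or laptop would.
url = f"http://127.0.0.1:{server.server_address[1]}"
req = urllib.request.Request(url,
                             data=json.dumps({"prompt": "hello"}).encode(),
                             headers={"Content-Type": "application/json"})
reply = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
print(reply)  # {'completion': 'olleh'}
```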
Performance Benchmarks
Inference throughput on RunPod H100:
- LLaMA 2 70B (quantized to fit in 80GB): 40 tokens/second (with batch size 8)
- LLaMA 2 7B: 300 tokens/second
- Mistral 7B: 250 tokens/second
- Stable Diffusion XL: 1-2 images per minute per GPU
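Throughput figures translate directly into cost per token. A sketch using the benchmark numbers above and the $1.99/hr PCIe rate:

```python
def cost_per_million_tokens(rate_per_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars per million generated tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_hour / tokens_per_hour * 1_000_000


# H100 PCIe at $1.99/hr:
print(round(cost_per_million_tokens(1.99, 40), 2))   # 13.82 -> LLaMA 2 70B
print(round(cost_per_million_tokens(1.99, 300), 2))  # 1.84  -> LLaMA 2 7B
```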
Training benchmarks:
- Fine-tuning LLaMA 2 7B: 200 examples/second
- BERT fine-tuning: 300 sequences/second with gradient accumulation
Network: 1Gbps shared bandwidth (actual throughput ~500Mbps)
Cost Optimization Tips
Use H100 SXM ($2.69/hr) only for multi-GPU distributed training. Single-GPU workloads should use H100 PCIe ($1.99/hr) to save 26%.
Batch inference requests to reduce per-request overhead. Processing 100 requests in one inference call costs less than processing them individually.
Implement checkpointing for training. If a pod crashes, resume from latest checkpoint instead of restarting.
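A framework-agnostic sketch of that checkpoint/resume loop. A real training run would persist model and optimizer state with torch.save instead of pickle, and write to the pod's persistent volume; the filename and state dict here are illustrative:

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # put this on persistent storage in practice


def save_checkpoint(step: int, state: dict) -> None:
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT_PATH)  # atomic rename: no half-written checkpoints


def load_checkpoint() -> dict:
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "state": None}


ckpt = load_checkpoint()            # resumes after a crash or spot preemption
for step in range(ckpt["step"], 100):
    state = {"weights": step}       # stand-in for model/optimizer state
    if step % 25 == 0:
        save_checkpoint(step, state)
```

The atomic rename matters: if the pod dies mid-write, the previous checkpoint is still intact.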
Use spot instances when available. RunPod offers spot pricing at 30-50% discounts. Spot instances terminate with 1-hour notice, suitable for fault-tolerant workloads.
Monitor pod utilization. If GPU utilization stays below 30%, consider stopping the pod and using Serverless endpoints instead.
Regional Availability
RunPod operates globally with H100 availability across:
- US (multiple regions)
- EU (Western Europe)
- Asia (Singapore, limited availability)
Geographic selection affects latency and regional data residency. US regions have best availability. Asian regions might have limited H100 stock.
FAQ
Should I use RunPod or Lambda for H100? RunPod is cheaper ($1.99-2.69/hr vs Lambda's $2.86-3.78/hr). Use RunPod for sporadic or batch workloads. Use Lambda for sustained workloads requiring 99% SLA and responsive support.
How long does pod startup take? Typically 30-60 seconds from launch button to SSH connectivity. Container images that need pulling add 1-3 minutes; pods using already-cached images start fastest.
Can I save my work between pod sessions? Yes. Files saved in /home/runpod or mounted storage persist across sessions. Code, models, and datasets persist. Pay only when pod is active.
What if I need more than 80GB GPU memory? Use H200 at $3.59/hr (141GB) on RunPod, or shard the model across multiple H100s with distributed inference. No H100 variant offers more than 80GB per device.
Does RunPod offer reserved capacity? Yes, contact sales for bulk pricing on committed usage. Monthly commitments achieve 15-25% discounts for reliable capacity reservation.
How does RunPod handle GPU failures? RunPod migrates pods to healthy GPUs if hardware fails. This migration takes 1-2 minutes. Your data persists. No extra charges for the migration.
Advanced RunPod Features
RunPod File Sync: Synchronize local directories with pod storage. Develop locally, sync to pod for execution. Results sync back to local machine automatically.
Pod Templates: Community-contributed templates for popular models (LLaMA, Mistral, Stable Diffusion) accelerate setup. Most models deploy in 5 minutes using templates.
API Endpoints: Convert pods into API endpoints. Other applications query your pod via HTTP. Endpoints handle autoscaling automatically.
Notebook Integration: JupyterLab integration provides notebook interface. Data scientists prefer notebooks to terminal interfaces.
Cost Calculation Examples
Example 1: Personal Research Project
- Usage pattern: 8 hours daily, 5 days weekly
- Hardware: H100 SXM at $2.69/hr
- Monthly cost: 8 × 5 × 4.3 × $2.69 = $463/month
- Break-even vs ownership: roughly 22 months payback against a $10,000 GPU purchase ($10,000 ÷ $463/month)
Example 2: Startup Inference API
- Traffic: 1,000 requests daily, 5 seconds inference latency
- Pod configuration: 1 H100 PCIe, always running
- Monthly cost: $1.99 × 730 = $1,453/month
- Scaling: at 5 seconds per request, a single GPU saturates near ~17,000 requests/day (86,400 s ÷ 5 s); add GPUs as traffic approaches that ceiling
Example 3: Batch Processing Service
- Processing: 10,000 documents daily with 30-second processing per document
- Compute: 10,000 × 30 / 3,600 ≈ 83.3 GPU-hours needed daily
- Hardware: H100 PCIe at $1.99/hr
- Monthly cost: 83.3 × 30 × $1.99 ≈ $4,975/month
Optimizing RunPod Pod Utilization
Monitor the GPU utilization dashboard. If utilization stays below 30%, the pod is over-provisioned: downgrade the hardware or share it with other workloads.
Batch inference requests to increase GPU utilization. A pod processing 10 requests serially may use only a few percent of the GPU; the same requests batched together can reach 80%+ utilization.
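The batching idea can be sketched as a simple request grouper; the hypothetical request IDs stand in for real prompts:

```python
from typing import Iterator, List


def batched(pending: List[str], batch_size: int) -> Iterator[List[str]]:
    """Group queued requests so one forward pass serves many at once."""
    for i in range(0, len(pending), batch_size):
        yield pending[i:i + batch_size]


requests = [f"req-{i}" for i in range(10)]
batches = list(batched(requests, 8))
print(len(batches))  # 2 forward passes instead of 10 serial ones
```

Production inference servers like vLLM do this continuously, merging requests that arrive within a short window.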
Use spot instances aggressively. Spot pricing at 30-50% discount justifies occasional interruptions. Fault-tolerant workloads benefit substantially.
Implement pod sharing. Running multiple research projects on a single pod reduces cost; containers isolate the workloads and prevent interference.
Advanced Networking Configuration
RunPod VPC integration: Pods can connect to private networks. This enables secure connections to on-premises infrastructure.
Static IP allocation: Pods can use static IP addresses, enabling reliable DNS-based connectivity from external applications.
Custom routing: Advanced networking enables connecting pods to specific VPCs or networks.
Pod Autoscaling and Management
Manual scaling: Increase/decrease active pods in real-time. API-based pod management enables programmatic scaling.
Spot instance fallback: Configure pods to automatically convert to lower-cost spot instances if on-demand capacity unavailable.
Multi-pod orchestration: Deploy inference endpoints across 5-10 pods for redundancy and load distribution.
Storage Optimization
Pod storage at $0.23/GB/month is expensive. Optimize by:
- Storing only essential data on pod
- Using external storage (AWS S3) for large datasets
- Compressing models and caching aggressively
- Deleting temporary files regularly
A 100GB model stored on pod costs $23/month. Cloud storage (S3) costs $2.30/month. External storage is 10x cheaper.
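The storage comparison above in sketch form; the S3 rate is an approximation of the standard tier:

```python
POD_STORAGE_RATE = 0.23   # $/GB/month, RunPod pod storage
S3_STANDARD_RATE = 0.023  # $/GB/month, approximate S3 standard tier


def storage_cost(size_gb: float, rate_per_gb: float) -> float:
    """Monthly storage cost for a given dataset or model size."""
    return size_gb * rate_per_gb


print(round(storage_cost(100, POD_STORAGE_RATE), 2))  # 23.0 -> $23/month on-pod
print(round(storage_cost(100, S3_STANDARD_RATE), 2))  # 2.3  -> $2.30/month on S3
```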
Production Deployment Patterns
Running production inference on RunPod requires:
- Configuring the pod with the model, dependencies, and inference server
- Implementing health checks and monitoring
- Setting up logging and error alerting
- Configuring backup/restore procedures
- Documenting infrastructure as code
Infrastructure-as-code enables reproducible deployments and disaster recovery.
Compliance and Security Considerations
RunPod operates in US regions primarily. GDPR-compliant EU regions available. Data residency requirements must be evaluated carefully.
Pod isolation: Pods are isolated from other customer pods. Kernel-level isolation provides reasonable security for non-sensitive workloads.
Sensitive workloads (healthcare, finance) should evaluate RunPod's security posture carefully. Self-hosted or AWS infrastructure might be required.
Related Resources
- RunPod GPU Pricing
- NVIDIA H100 Price
- Lambda Labs GPU Pricing
- CoreWeave GPU Pricing
- Modal vs RunPod Serverless Comparison
Sources
- RunPod pricing API (accessed March 2026)
- RunPod documentation (accessed March 2026)
- H100 technical specifications from Nvidia (2026)
- Performance benchmarks from DeployBase.AI testing (March 2026)