Renting H100 GPUs on AWS
H100 on AWS ships as p5.48xlarge instances: 8 H100s for $55.04/hr on-demand as of March 2026. That's $6.88/GPU. RunPod charges $2.69/GPU. AWS costs more because the premium buys SageMaker, VPC integration, a 99.99% SLA, and managed services. Worth it if your team needs that ecosystem.
AWS H100 Availability
H100 GPUs are available on AWS p5.48xlarge instances:
- p5.48xlarge: 8x H100 SXM GPUs
- Available in select US regions (us-east-1, us-west-2)
- Not available in all AWS regions
AWS does not offer single-GPU H100 instances. Minimum allocation is 8x H100.
Pricing Structure
p5.48xlarge on-demand pricing: $55.04/hour (as of March 2026)
- 8x H100 SXM 80GB GPUs
- 2TB system RAM
- 8x 3.84TB NVMe storage
- 192 vCPU
- EFA network (3,200 Gbps)
Per-GPU cost: $55.04/8 = $6.88/hour per H100 on-demand
Monthly cost for 24/7 operation: $55.04 × 730 = $40,179/month
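The arithmetic above can be sketched in a few lines (list price as of March 2026; adjust for your region):

```python
# On-demand p5.48xlarge cost breakdown (March 2026 list price).
HOURLY_RATE = 55.04    # USD/hr, p5.48xlarge on-demand
GPUS = 8               # H100 SXM GPUs per instance
HOURS_PER_MONTH = 730  # AWS billing convention

per_gpu_hourly = HOURLY_RATE / GPUS
monthly_24_7 = HOURLY_RATE * HOURS_PER_MONTH

print(f"${per_gpu_hourly:.2f}/GPU/hr")  # $6.88/GPU/hr
print(f"${monthly_24_7:,.0f}/month")    # $40,179/month
```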
Spot instances pricing:
- p5.48xlarge spot: ~$16-20/hour (varies by region and availability)
- Monthly spot cost at $17/hr: $17 × 730 = $12,410/month (65-70% savings)
Spot instances terminate with 2-minute notice, unsuitable for non-resumable workloads.
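A quick sketch of the spot-vs-on-demand math, assuming a mid-range spot price of $17/hr (actual spot prices fluctuate by region and hour):

```python
# Spot vs. on-demand monthly cost for a p5.48xlarge running 24/7.
ON_DEMAND = 55.04      # USD/hr
SPOT = 17.00           # USD/hr -- assumed mid-range spot price
HOURS_PER_MONTH = 730

spot_monthly = SPOT * HOURS_PER_MONTH  # $12,410
savings = 1 - SPOT / ON_DEMAND         # fraction saved vs. on-demand

print(f"${spot_monthly:,.0f}/month, {savings:.0%} savings")  # $12,410/month, 69% savings
```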
Why AWS Costs More
RunPod: $2.69/GPU/hr. AWS p5: $6.88/GPU/hr on-demand. The premium covers:
- SageMaker integration (managed training, inference)
- VPC + IAM for multi-team access
- 99.99% SLA with credits
- CloudWatch monitoring built-in
- Auto-scaling groups
- S3/RDS/DynamoDB integration
Solo engineers building one-off models? They overpay. Large teams coordinating multi-region ML pipelines? They save money.
p5 Instance Specifications
Each p5.48xlarge provides:
- 8x NVIDIA H100 SXM 80GB HBM3 GPUs
- 2TB DDR5 memory
- 192 vCPU
- 8x 3.84TB NVMe SSD (30.72TB total)
- 3,200 Gbps EFA network
No option for smaller instances. Customers needing 1-2 H100s are better served by RunPod or Lambda.
Deploying on AWS EC2
- Launch p5.48xlarge instance in AWS Console
- Select deep learning AMI (pre-configured with CUDA, Docker)
- Configure security groups and VPC
- SSH into instance using key pair
- Install model and dependencies
- Start inference or training
Setup time: 5-10 minutes after launch.
AWS provides:
- EC2 Instance Connect for browser-based terminal
- CloudWatch monitoring and logging
- Auto-scaling groups for multi-instance deployments
- SNS/SQS for job orchestration
SageMaker Integration
SageMaker provides managed ML services on top of EC2 instances.
SageMaker Training:
- Distributed training across multiple p5 instances
- Automatic fault handling and checkpointing
- Built-in algorithms for common tasks
- Monitoring and hyperparameter optimization
SageMaker Inference:
- Managed inference endpoints
- Automatic scaling based on traffic
- Model versioning and A/B testing
- Integration with other AWS services
SageMaker adds 20-30% overhead but removes operational burden.
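As a rough sketch of what that overhead means at p5 rates (the 20-30% range is this article's estimate; actual SageMaker pricing is set per component):

```python
# Effective hourly rate if SageMaker adds 20-30% on top of raw EC2 p5 pricing.
EC2_HOURLY = 55.04  # USD/hr, p5.48xlarge on-demand

rates = {}
for overhead in (0.20, 0.30):
    rates[overhead] = EC2_HOURLY * (1 + overhead)
    print(f"{overhead:.0%} overhead -> ${rates[overhead]:.2f}/hr")
# 20% overhead -> $66.05/hr
# 30% overhead -> $71.55/hr
```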
Cost Comparison
- AWS p5 (8x H100 SXM): $55.04/hr = $6.88/hr per GPU
- CoreWeave 8x H100: $49.24/hr = $6.16/hr per GPU
- RunPod 8x single instances: $21.52/hr = $2.69/hr per GPU
AWS wins on integrated services. CoreWeave wins on Kubernetes. RunPod wins on raw cost.
Cost Optimization on AWS
Use Spot Instances for fault-tolerant workloads. p5 spot instances cost 65-70% less than on-demand (~$16-20/hour vs $55.04/hour). Spot instances terminate with 2-minute notice.
Use Reserved Instances for sustained workloads. 1-year reserved instances save 25-30%. 3-year reserved instances save 40-50%.
Use Savings Plans for flexible pricing across instance types and regions.
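In rough numbers, taking the midpoint of each discount range above (exact rates vary by region and payment option):

```python
# Effective p5.48xlarge rates under reserved-instance discounts
# (midpoints of the 25-30% and 40-50% ranges cited above).
ON_DEMAND = 55.04      # USD/hr
HOURS_PER_MONTH = 730

monthly = {}
for term, discount in (("1-year", 0.275), ("3-year", 0.45)):
    hourly = ON_DEMAND * (1 - discount)
    monthly[term] = hourly * HOURS_PER_MONTH
    print(f"{term}: ${hourly:.2f}/hr, ${monthly[term]:,.0f}/month")
```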
Right-size instances. p5.48xlarge (8x H100) is the minimum. If needing only 2-4 GPUs, use alternative providers instead of paying for unused capacity.
Implement automated shutdown. Stop instances when not in use. EC2 charges only for running instances, not stopped instances.
Placement groups improve multi-instance networking. Critical for distributed training.
Regional Availability and Latency
p5 availability (as of March 2026):
- us-east-1 (N. Virginia): Consistent availability
- us-west-2 (Oregon): Limited availability
- Other regions: Request via AWS support
Cross-region p5 deployment is possible but expensive due to data transfer costs and administrative complexity.
Latency to AWS S3: <1ms within region, 50-150ms cross-region
Integration with Other AWS Services
S3 integration: Store training data, models, and outputs in S3
- Transfer within region: free
- Transfer out of region: $0.02/GB ($2.00 per 100GB)
Lambda integration: Trigger training jobs from Lambda functions
- Useful for event-driven ML pipelines
RDS integration: Training models on database-sourced data
- Direct database connections available
DynamoDB integration: Store inference results or model metadata
These integrations reduce data movement costs and operational complexity.
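The transfer rates above make the cost of moving training data easy to estimate; a sketch, assuming the $0.02/GB inter-region rate:

```python
# Cost of moving a training dataset cross-region vs. keeping it in-region.
INTER_REGION_RATE = 0.02  # USD/GB (rate cited above)

costs = {}
for dataset_gb in (100, 1_000, 10_000):
    costs[dataset_gb] = dataset_gb * INTER_REGION_RATE
    print(f"{dataset_gb:,} GB cross-region: ${costs[dataset_gb]:,.2f} (in-region: free)")
```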
FAQ
Is AWS p5 worth the premium over RunPod? Only if you need multi-GPU orchestration within the AWS ecosystem. For isolated workloads, RunPod is significantly cheaper. For integrated pipelines spanning S3, Lambda, and RDS, the premium pays for itself through operational efficiency.
Can I use spot instances for production workloads? Only if workloads are fault-tolerant with automatic resumption. The 65-70% cost savings are substantial but require rigorous checkpointing and failure handling.
How do I choose between AWS, CoreWeave, and RunPod? AWS: If requiring VPC integration, compliance certifications (SOC 2, HIPAA), or multi-region failover. CoreWeave: If needing 8+ GPUs with NVLink and high-bandwidth cluster communication. RunPod: If cost-conscious and needing flexible single or dual GPU configurations.
What's the minimum contract term on AWS? On-demand instances have no minimum. Reserved instances require 1 or 3-year commitments. Spot instances can be terminated with 2-minute notice.
Does AWS offer GPU-as-a-Service with SageMaker pricing? Yes. SageMaker Notebooks charge per instance hour plus storage. SageMaker Training and Inference have separate pricing. These managed services add 20-30% overhead but reduce operational burden.
What about AWS Bedrock for inference? Bedrock is API-based inference, not GPU rental. Pricing is per-request (tokens). Bedrock is economical for low-volume applications, expensive for high-volume inference. Below 50B monthly tokens, Bedrock is often cheaper than p5 rental.
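One way to sanity-check that break-even: at an assumed blended Bedrock rate of $0.80 per million tokens (hypothetical; actual per-model Bedrock pricing varies widely), the crossover against a 24/7 on-demand p5 is:

```python
# Break-even monthly token volume: Bedrock per-token pricing vs. renting a p5.
P5_MONTHLY = 55.04 * 730     # ~$40,179 for 24/7 on-demand
BEDROCK_PER_M_TOKENS = 0.80  # USD per 1M tokens -- assumed blended rate

breakeven_tokens = P5_MONTHLY / BEDROCK_PER_M_TOKENS * 1_000_000
print(f"Break-even: {breakeven_tokens / 1e9:.1f}B tokens/month")  # ~50.2B
```

Below that volume, per-request Bedrock pricing wins; above it, dedicated GPUs win.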
Related Resources
- AWS GPU Pricing
- NVIDIA H100 Price
- RunPod GPU Pricing
- Lambda Labs GPU Pricing
- CoreWeave GPU Pricing
- AI Inference Platform Cost Calculator
Sources
- AWS EC2 p5 pricing (accessed March 2026)
- AWS SageMaker pricing documentation (March 2026)
- H100 technical specifications from Nvidia (2026)
- AWS service integration documentation (2026)