Contents
- Overview
- MI300X Specifications
- CoreWeave MI300X Pricing
- How to Rent MI300X on CoreWeave
- Performance Comparison
- MI300X vs NVIDIA Alternatives
- FAQ
- Related Resources
- Sources
Overview
The AMD Instinct MI300X is AMD's flagship data center GPU for production AI. With 192GB of HBM3 memory, the MI300X challenges NVIDIA's dominance in large-scale model training and inference. CoreWeave offers MI300X GPUs in multi-GPU configurations, providing cost-effective access to AMD compute as of March 2026. This guide covers specifications, pricing, and deployment workflows.
MI300X Specifications
The AMD MI300X is engineered for large-scale AI training:
- Memory: 192GB HBM3 (vs 80GB H100, 141GB H200)
- Memory Bandwidth: 5.3 TB/s
- Compute Units: 304 (vs 132 SMs in H100)
- Peak FP32 Performance: 52.6 TFLOPS
- Peak Tensor Performance: ~1,300 TFLOPS (bfloat16, with structured sparsity)
- Power Consumption: 750W
- Manufacturing: 5nm TSMC process
- Interconnect: Infinity Fabric (AMD equivalent to NVLink)
MI300X advantages:
- 2.4x the memory of an H100, allowing much larger models to fit on a single GPU
- Lower cost per GB of memory: roughly $0.029/GB-hour vs $0.045+ for NVIDIA at typical market rates
- Open-source ROCm software stack (vs proprietary CUDA)
- Competitive tensor performance for transformer models
CoreWeave MI300X Pricing
CoreWeave does not publicly advertise MI300X pricing as of March 2026. Availability varies by region and requires direct engagement:
- Estimated pricing (based on industry data): $15-25/hour per MI300X
- Bundle configuration: Likely 8x or 16x GPU clusters
- Custom quotes required for volume commitments
- Spot pricing may offer 40-60% discounts at the risk of preemption
For reference, comparable NVIDIA options:
- H100 on CoreWeave: $6.16/GPU-hour (8-GPU instances)
- H200 on CoreWeave: $6.31/GPU-hour (8-GPU instances)
Contact CoreWeave sales directly for current MI300X availability and pricing.
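One way to compare these options is price per GB of HBM per hour, since memory capacity is the MI300X's main selling point. The sketch below uses the hourly rates quoted above; the rates are estimates, not official CoreWeave prices, and the MI300X figure depends entirely on the rate you negotiate.

```python
# Rough $/GB-hour comparison using the hourly rates quoted in this guide.
# These rates are illustrative estimates, not official CoreWeave prices.
def cost_per_gb_hour(price_per_hour: float, memory_gb: int) -> float:
    """Hourly rental price divided by HBM capacity."""
    return price_per_hour / memory_gb

h100 = cost_per_gb_hour(6.16, 80)    # CoreWeave 8-pack rate from above
h200 = cost_per_gb_hour(6.31, 141)
print(f"H100: ${h100:.3f}/GB-hr, H200: ${h200:.3f}/GB-hr")

# MI300X's figure varies with the negotiated rate; mid-range estimate:
print(f"MI300X at $18/hr: ${cost_per_gb_hour(18.00, 192):.3f}/GB-hr")
```

At the high end of the estimated range, per-GB pricing favors NVIDIA; the MI300X only wins this metric at lower negotiated rates, which is one reason custom quotes matter.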
How to Rent MI300X on CoreWeave
Step 1: Verify MI300X Availability
CoreWeave's MI300X rollout is limited. Check their website or contact sales at sales@coreweave.com to confirm:
- Regional availability (US, EU, APAC)
- Current capacity and lead times
- Pricing locks for monthly/annual commitments
Step 2: Create or Upgrade Account
Sign up for CoreWeave with business email. Enable GPU compute and configure:
- Payment method (credit card, wire transfer)
- Budget alerts and cost tracking
- API access for automation
Step 3: Request MI300X Capacity
Since MI300X is not standard on-demand:
- Contact sales team
- Specify workload (training, inference, fine-tuning)
- Provide GPU-hour estimate
- Receive custom pricing quote
- Negotiate contract terms (1-12 months)
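A concrete GPU-hour estimate makes the sales conversation faster. A minimal sketch of the arithmetic, where every number is a placeholder rather than a quoted rate:

```python
# Hypothetical GPU-hour estimate for a sales quote request.
# All figures below are placeholders, not CoreWeave rates.
num_gpus = 8          # one 8x MI300X node
hours_per_day = 24    # continuous training
days = 30             # one-month commitment
est_rate = 18.00      # mid-range of the $15-25/hr estimate above

gpu_hours = num_gpus * hours_per_day * days
budget = gpu_hours * est_rate
print(f"{gpu_hours} GPU-hours, ~${budget:,.0f}/month")
```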
Step 4: Deploy on AMD ROCm
Configure the environment:
```shell
# Install ROCm drivers and core libraries (Ubuntu/Debian; requires root)
sudo apt-get update && sudo apt-get install -y rocm-core rocm-dkms
# Confirm the GPUs are visible to the ROCm runtime
rocm-smi
# Install the ROCm builds of PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
```
Step 5: Launch Training Workload
Deploy the training script using Distributed Data Parallel (DDP) or alternative frameworks supporting ROCm.
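A minimal DDP sketch is below. Note that ROCm builds of PyTorch reuse the CUDA namespace, so `torch.cuda` and the `"nccl"` backend name (backed by RCCL) work unchanged on MI300X. In production this script would be launched with `torchrun --nproc_per_node=8`; here it defaults to a single CPU process with the `gloo` backend so the sketch runs anywhere, and the model is a toy stand-in.

```python
# Minimal DistributedDataParallel sketch. On MI300X the "cuda" device and
# "nccl" backend map to HIP/RCCL via PyTorch's ROCm build.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK/WORLD_SIZE/MASTER_*; default to one local process
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend, rank=rank, world_size=world_size)

device = torch.device("cuda", rank) if torch.cuda.is_available() else torch.device("cpu")
model = DDP(torch.nn.Linear(32, 32).to(device))  # toy model, not a real LLM
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(3):  # stand-in for a real DataLoader loop
    x = torch.randn(8, 32, device=device)
    loss = model(x).pow(2).mean()  # dummy objective
    opt.zero_grad()
    loss.backward()  # DDP all-reduces gradients across ranks here
    opt.step()

dist.destroy_process_group()
print(f"final loss: {loss.item():.4f}")
```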
Performance Comparison
MI300X performance benchmarks for large models:
405B Model Training
- Required GPUs: 1-2 (vs 8 H100s, 8 H200s)
- Training throughput: 800-1,200 tokens/second
- Estimated training time (1T tokens): 12-18 days
- Memory utilization: 88-92%
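The wall-clock estimate follows from token budget divided by aggregate throughput. The throughput below is a hypothetical cluster-wide rate chosen to illustrate the arithmetic, not a measured MI300X number; note that per-GPU token rates must be summed across the whole cluster before doing this division.

```python
# Back-of-envelope training time: token budget / aggregate throughput.
# cluster_tokens_per_sec is a hypothetical cluster-wide figure.
total_tokens = 1e12              # a 1T-token training run
cluster_tokens_per_sec = 800_000

days = total_tokens / cluster_tokens_per_sec / 86_400
print(f"~{days:.1f} days")  # ≈ 14.5 days
```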
70B Model Fine-tuning
- Batch size: 128 (single node)
- Throughput: 1,600 tokens/second
- Cost per 24-hour run: ~$400-600
Inference (405B Model)
- Batch size: 2-4 (limited by context window)
- Latency: 50-100ms
- Throughput: 200-400 tokens/second
MI300X vs NVIDIA Alternatives
Cost and performance comparison:
| Metric | MI300X | H100 | H200 |
|---|---|---|---|
| Memory | 192GB | 80GB | 141GB |
| Memory Bandwidth | 5.3 TB/s | 3.35 TB/s | 4.8 TB/s |
| Est. Price ($/GPU-hr) | $18 | $2.69 | $3.59 |
| 405B Training GPUs | 1-2 | 8 | 8 |
| ROCm Support | Full | No | No |
For 405B training, MI300X provides compelling memory efficiency: matching a single MI300X's capacity with H100s on RunPod costs roughly $19.20/hour, and H200 training likewise requires multi-GPU clusters, increasing overall cost.
FAQ
Can I run CUDA code directly on MI300X? No. MI300X uses AMD's ROCm stack, not CUDA. However, most popular frameworks (PyTorch, TensorFlow, JAX) have ROCm backends. CUDA code requires porting, typically involving minor API changes (cuBLAS becomes hipBLAS, etc.).
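The porting is mostly mechanical renaming, which is what AMD's hipify tools automate for C/C++ sources. A toy illustration of that rename step (the mapping table here is a small sample, not the full hipify table):

```python
# Toy illustration of the API renames AMD's hipify tools perform on
# CUDA C/C++ sources. Real hipify handles many more symbols and headers.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cublasSgemm": "hipblasSgemm",
}

def hipify(source: str) -> str:
    """Replace CUDA API names with their HIP equivalents."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&ptr, n); cublasSgemm(...);"))
# → hipMalloc(&ptr, n); hipblasSgemm(...);
```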
Is MI300X supply-constrained like H100? Less so. AMD has focused on production partnerships for MI300X supply. Availability depends on CoreWeave's allocation from AMD and customer demand.
What's the break-even point for MI300X vs H100 clusters? For any model requiring >80GB memory, MI300X wins on cost. For models <70B parameters, H100 clusters may have better cost-per-token-per-second with mature optimization libraries.
Does MI300X support mixed precision (bfloat16, float8)? Yes. MI300X supports bfloat16 natively. Float8 support requires software framework implementation (PyTorch 2.4+, for example).
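A minimal bfloat16 mixed-precision sketch with PyTorch's autocast. On an MI300X the `device_type` would be `"cuda"` (ROCm builds reuse the CUDA namespace); the snippet falls back to CPU so it runs without a GPU, and the tiny model is a placeholder.

```python
# Sketch of bfloat16 mixed precision via torch.autocast. On MI300X,
# device_type is "cuda" (PyTorch ROCm builds reuse the CUDA namespace);
# we fall back to CPU so the snippet runs anywhere.
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device_type)  # placeholder model
x = torch.randn(8, 16, device=device_type)

with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    y = model(x)  # matmul-heavy ops run in bfloat16 inside this region

print(y.dtype)
```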
Are there open-source optimizations for MI300X? The ROCm ecosystem is growing but smaller than CUDA. Key projects supporting MI300X include:
- FlashAttention (ROCm backend)
- vLLM (partial ROCm support)
- DeepSpeed (experimental MI300X tuning)
Related Resources
- Complete GPU Pricing Guide
- CoreWeave GPU Pricing
- H100 Specifications and Benchmarks
- H200 Specifications
- Fine-Tuning Guide for Large Models