Contents
- A100 RunPod: Professional Compute at Entry-Level Pricing
- A100 Pricing and Configurations
- Performance Benchmarks
- RunPod A100 Templates and Setup
- A100 Performance Characteristics
- Spot Pricing and Batch Processing
- Detailed Setup and Running Workloads on A100
- Cost Optimization Strategies
- Cost Comparison to Other Providers
- Storage and Data Management
- FAQ
- Sources
A100 RunPod: Professional Compute at Entry-Level Pricing
A100 RunPod pricing starts at $1.19 per hour for PCIe configurations and $1.39 per hour for high-bandwidth SXM variants. The A100, released in 2020, remains one of the most cost-efficient GPUs for production workloads, particularly for teams that prioritize lower compute costs over peak performance. RunPod's flexible pricing model and spot market make A100 instances even more economical.
This guide covers RunPod A100 pricing, template options, spot market strategies, and when the A100 beats the H100 on value.
A100 Pricing and Configurations
RunPod's A100 pricing undercuts the other dedicated providers while delivering strong value for most production workloads.
A100 Pricing Tiers and Monthly Analysis
| Configuration | Hourly | Monthly (730 hrs) | Annual | Spot Savings | Per 1K Tokens (50 tok/s) |
|---|---|---|---|---|---|
| A100 PCIe | $1.19 | $869 | $10,427 | 40-60% | $0.0066 |
| A100 SXM | $1.39 | $1,014 | $12,172 | 40-55% | $0.0077 |
| A100 PCIe Spot (~50% off) | $0.60 | $438 | $5,256 | N/A | $0.0033 |
| A100 SXM Spot (~45% off) | $0.76 | $555 | $6,654 | N/A | $0.0042 |
Spot pricing reduces A100 costs by 40-60% during off-peak hours, making batch training extremely economical. A 40-hour fine-tuning job on an A100 PCIe spot instance costs approximately $25-35 (about $30 on average), versus $47.60 on-demand, a 37% savings for resumable workloads.
Provider Cost Comparison at Standard Rates
| Provider | A100 Cost | Monthly | Per 1K Tokens |
|---|---|---|---|
| RunPod | $1.19 | $869 | $0.0066 |
| Lambda | $1.48 | $1,080 | $0.0082 |
| Vast.AI (avg) | $1.00 | $730 | $0.0055 |
| CoreWeave (8x) | $2.70 | N/A (cluster) | N/A |
| AWS p4d | $4.10 | N/A (8x cluster) | N/A |
RunPod provides the best single-GPU A100 pricing among dedicated providers; Vast.AI's peer-to-peer market can be cheaper, but at higher risk.
Per-GPU Cost Analysis and Cost Optimization
Running inference on a single A100 costs roughly $0.0066 per 1K tokens (assuming 50 tokens/second sustained throughput; lower utilization raises the effective figure proportionally). This compares favorably to managed inference services charging $0.05-0.15 per 1K tokens.
Detailed cost analysis:
- A100 hourly cost: $1.19/hr
- Inference throughput: 50 tokens/second sustained
- Cost per token: $1.19 / (50 × 3,600) ≈ $0.0000066/token
- Cost per 1M tokens: ~$6.61
This is 7-22x cheaper than managed services. For batch processing, costs drop further with request batching.
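The arithmetic above can be packaged as a small helper for comparing rates (figures match the tables in this section: $1.19/hr at 50 tokens/second sustained):

```python
def cost_per_million_tokens(hourly_rate, tokens_per_sec):
    """Convert an hourly GPU rate into a cost per 1M generated tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

# A100 PCIe on-demand at 50 tokens/sec sustained
print(round(cost_per_million_tokens(1.19, 50), 2))  # → 6.61
```

The same helper applied to the $0.60 spot rate halves the figure, which is where the spot per-token numbers in the pricing table come from.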
Performance Benchmarks
A100 Inference Throughput
A100 40GB HBM2 delivers predictable throughput across configurations:
| Model | Batch Size | Throughput | Latency (TTFT) |
|---|---|---|---|
| 7B Mistral | 1 | 55-65 tokens/sec | 30-50ms |
| 13B Model | 1 | 35-45 tokens/sec | 40-70ms |
| 30B Model | 1 | 15-25 tokens/sec | 80-150ms |
| 7B Mistral | 8 | 150-200 tokens/sec | 100-200ms |
Training Speed
Fine-tuning throughput with standard configurations:
| Task | Configuration | Throughput |
|---|---|---|
| 7B LoRA (16-bit) | A100 SXM | 380-420 tokens/sec |
| 13B Full (8-bit) | A100 SXM | 200-250 tokens/sec |
| 30B LoRA (4-bit) | A100 PCIe | 150-200 tokens/sec |
RunPod A100 Templates and Setup
RunPod provides pre-configured environment templates optimizing A100 deployment.
PyTorch Template
The official PyTorch template includes:
- CUDA 12.2, cuDNN 8.9
- PyTorch 2.1 with torch.compile support
- TensorFlow 2.13 (optional)
- JupyterLab pre-configured
This template launches in about 90 seconds, versus 15+ minutes for a custom environment setup.
vLLM Inference Template
Purpose-built for LLM serving:
- vLLM server pre-configured
- Flash Attention 2 enabled
- Token streaming support
- OpenAI-compatible API endpoint
```shell
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9
```
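Once the server is up, clients talk to it over the OpenAI-compatible REST API. A minimal sketch using only the Python standard library (assumes vLLM's default bind of localhost:8000; `build_completion_request` and `query_vllm` are illustrative helpers, not part of vLLM):

```python
import json
import urllib.request

def build_completion_request(prompt, model="mistralai/Mistral-7B-Instruct-v0.1",
                             max_tokens=128, temperature=0.7):
    """Assemble a request body for vLLM's OpenAI-compatible /v1/completions route."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def query_vllm(prompt, host="http://localhost:8000"):
    """POST the payload to a running vLLM server and return the generated text."""
    body = json.dumps(build_completion_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/v1/completions", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

Because the endpoint mimics OpenAI's schema, existing OpenAI client libraries also work by pointing their base URL at the pod.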
Custom Template Creation
Build custom templates from Docker images for specialized workloads:
```dockerfile
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip
RUN pip install torch torchvision torchaudio
RUN pip install huggingface_hub transformers
```
Push to Docker Hub, then select "Custom Image" in RunPod console.
A100 Performance Characteristics
Throughput vs H100
The A100 (40GB HBM2 or 80GB HBM2e) delivers 312 TFLOPS of dense BF16 tensor-core performance (19.5 TFLOPS FP32), versus roughly 989 TFLOPS dense BF16 for the H100 (1,979 with sparsity): about one-third of the H100's dense peak. However, for most inference workloads, memory bandwidth (2.04 TB/s on the 80GB variant) rather than compute limits performance.
When A100 Matches or Exceeds H100
- Memory-bound operations: Many inference tasks (especially large batch sizes) hit bandwidth limits before compute limits. A100 and H100 perform identically.
- Quantized inference: 8-bit or 4-bit models are typically memory-bound, so A100 and H100 run them at similar speeds; the H100's extra compute headroom goes unused.
- Cost-per-token analysis: At $1.39/hr versus H100 $2.69/hr, A100 provides superior cost-per-token for inference despite lower peak throughput.
For fine-tuning models under 20B parameters, A100 SXM at $1.39/hr outperforms H100 on value.
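The cost-per-token claim above can be checked directly. A short sketch using the rates quoted in this guide ($1.39/hr A100 SXM, $2.69/hr H100) and an assumed shared memory-bound throughput of 50 tokens/second:

```python
def cost_per_1k_tokens(hourly_rate, tokens_per_sec):
    """Dollars per 1K generated tokens at a sustained throughput."""
    return hourly_rate / (tokens_per_sec * 3600) * 1000

# Memory-bound inference: both GPUs sustain the same throughput,
# so the cheaper GPU wins on cost per token.
a100 = cost_per_1k_tokens(1.39, 50)   # ≈ $0.0077
h100 = cost_per_1k_tokens(2.69, 50)   # ≈ $0.0149
print(f"A100 ${a100:.4f} vs H100 ${h100:.4f} per 1K tokens")
```

Whenever the two GPUs hit the same bandwidth ceiling, the H100 costs nearly twice as much per token.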
Spot Pricing and Batch Processing
Off-Peak Bidding Strategy
RunPod's spot market displays historical pricing. Bid at 50-60% of on-demand rates during off-peak windows (typically 1-6 AM UTC):
- View 7-day spot price history for A100 PCIe (usually $0.48-0.71/hr)
- Set bid at 55% of the on-demand rate ($0.65/hr against the $1.19 baseline)
- Launch during low-demand windows
- Accept 1-2 hour wait for instance availability
This reduces training costs by 50-60% with minimal disruption to resumable workloads.
Checkpointing for Spot Reliability
Enable frequent checkpoints (every 500 steps) to tolerate interruptions:
```python
import subprocess

checkpoint_dir = "/workspace/checkpoints"
for step in range(total_steps):
    loss = train_step(batch)
    if step % 500 == 0:
        path = f"{checkpoint_dir}/step_{step}.pt"
        model.save_checkpoint(path)
        # Copy to persistent storage so an interruption loses at most 500 steps
        subprocess.run(["aws", "s3", "cp", path, "s3://my-bucket/checkpoints/"])
```
Spot interruptions cost you training time, not money: terminated instances stop accruing charges.
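On restart, the job should resume from the newest saved checkpoint rather than step 0. A minimal resume helper, assuming the `step_<N>.pt` naming used in this section (`latest_checkpoint` is an illustrative function, not a RunPod API):

```python
import os
import re

def latest_checkpoint(checkpoint_dir):
    """Return (path, step) of the highest-numbered step_<N>.pt file, or (None, 0)."""
    best_step, best_path = 0, None
    for name in os.listdir(checkpoint_dir):
        m = re.fullmatch(r"step_(\d+)\.pt", name)
        if m and int(m.group(1)) >= best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(checkpoint_dir, name)
    return best_path, best_step
```

After syncing the S3 checkpoint prefix back to the pod, pass the returned step to the training loop's start offset.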
Detailed Setup and Running Workloads on A100
Launching A100 Instances: Step-by-Step Guide
- Access RunPod console at https://www.runpod.io/console/gpu-cloud
- Click "GPU Cloud" in sidebar
- Filter by GPU: Select "A100"
- Choose configuration:
- A100 PCIe ($1.19/hr): Best value, adequate for most single-GPU work
- A100 SXM ($1.39/hr): Select if requiring NVLink for multi-GPU distribution
- Select template: PyTorch 2.1 (recommended), TensorFlow, or JAX
- Configure:
- vCPU: 8-16 minimum (16+ for large batch processing)
- Memory: 20-32GB RAM
- Storage: 50GB minimum (200GB+ for training data)
- Select region (primary: US-East, US-West)
- Click "Deploy" and wait 2-5 minutes
LLM Fine-Tuning
The A100's 40GB of HBM2 supports LoRA fine-tuning of 7B-13B parameter models (full-parameter fine-tuning at this scale needs more memory than 40GB provides):
```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="/workspace/output",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
    ),
    train_dataset=train_data,
)
trainer.train()
```
Expected fine-tuning throughput: roughly 380-420 tokens/second for a 7B LoRA run on A100 SXM.
Multi-GPU A100 Clusters
While RunPod doesn't directly offer multi-A100 clusters, launch 2-4 separate A100 instances and use distributed training frameworks. Network communication between instances adds latency compared to local NVLink, but remains viable for data parallelism.
Cost Optimization Strategies
Spot Market Strategy for A100
RunPod A100 spot pricing averages 45-55% of on-demand rate during off-peak hours. For batch training:
- View 7-day spot price history (typically $0.48-0.71/hr for PCIe)
- Set maximum bid at 60% of on-demand ($0.71/hr for $1.19 baseline)
- Launch during off-peak (2-6 AM UTC) for 70%+ fill rate
- Enable hourly checkpointing to tolerate interruptions
Monthly cost for 200-hour training job:
- On-demand: $1.19 × 200 = $238
- Spot (55% average): $0.65 × 200 = $130
- Savings: $108 per 200-hour project
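The same numbers as a quick sanity check in Python (rates taken from the bullets above):

```python
ON_DEMAND_RATE = 1.19   # A100 PCIe on-demand, $/hr
SPOT_RATE = 0.65        # observed 55% average of the on-demand rate

def job_cost(hours, rate):
    """Total cost of a training job at a fixed hourly rate."""
    return hours * rate

savings = job_cost(200, ON_DEMAND_RATE) - job_cost(200, SPOT_RATE)
print(f"${savings:.0f}")  # prints $108
```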
Batch Inference Optimization
Consolidate inference requests to maximize A100 throughput and minimize per-token cost:
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    max_num_batched_tokens=8192,   # cap on tokens processed per scheduler batch
    gpu_memory_utilization=0.9,
)

prompts = [prompt_1, prompt_2, prompt_3, prompt_4]  # batch multiple requests
responses = llm.generate(
    prompts,
    SamplingParams(temperature=0.7, top_p=0.95),
)
```
Multi-Model Serving with LoRA Adapters
Run multiple fine-tuned models from single A100 using LoRA adapter swapping:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
# Load the first adapter, then attach additional ones by name
model = PeftModel.from_pretrained(base_model, "lora/customer-a", adapter_name="customer_a")
model.load_adapter("lora/customer-b", adapter_name="customer_b")

model.set_adapter("customer_a")
outputs_a = model.generate(**inputs)
model.set_adapter("customer_b")
outputs_b = model.generate(**inputs)
```
This enables serving 3-4 different customer models from a single instance, reducing per-model cost by 70-75%.
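The per-model saving is simple division: one instance rate spread across the adapters it serves. A sketch with the $1.19/hr PCIe rate and an assumed four adapters:

```python
def per_model_hourly_cost(instance_rate, num_models):
    """Hourly cost attributed to each model when adapters share one GPU."""
    return instance_rate / num_models

dedicated = per_model_hourly_cost(1.19, 1)   # $1.19/hr per model on its own A100
shared = per_model_hourly_cost(1.19, 4)      # ≈ $0.30/hr per model with 4 adapters
reduction = 1 - shared / dedicated           # 0.75, i.e. 75% lower per-model cost
```

The trade-off is throughput: all adapters contend for the same GPU, so this fits low-QPS per-customer serving rather than saturated endpoints.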
Cost Comparison to Other Providers
Single-GPU A100 Pricing
| Provider | A100 PCIe | A100 SXM |
|---|---|---|
| RunPod | $1.19 | $1.39 |
| Lambda | $1.48 | N/A |
| Vast.AI | $0.80-1.50 | $0.80-1.50 |
| AWS | $1.95 | $2.19 |
RunPod and Lambda provide competitive pricing with guaranteed capacity. Vast.ai's peer-to-peer marketplace offers lower spot pricing but with provider variability. AWS A100 pricing includes CPU and managed services premium.
See the A100 Lambda guide for reserved pricing options. Check CoreWeave's A100 clusters for multi-GPU training setups.
Storage and Data Management
Persistent Volumes
RunPod's network-attached storage costs $0.10/GB/month, so a 100GB dataset adds about $10/month. Download datasets to instance storage on launch to avoid persistent-storage charges:
```shell
#!/bin/bash
aws s3 cp s3://my-bucket/dataset.tar.gz /tmp/
tar -xzf /tmp/dataset.tar.gz -C /workspace/data/
rm /tmp/dataset.tar.gz  # free up space
```
Model Checkpointing
Save model checkpoints to S3 rather than persistent volumes:
```python
import subprocess
import torch

def save_checkpoint(model, step):
    checkpoint = {"model": model.state_dict(), "step": step}
    local_path = f"/tmp/ckpt_{step}.pt"
    torch.save(checkpoint, local_path)
    # Push to S3 so the checkpoint survives instance termination
    subprocess.run(["aws", "s3", "cp", local_path,
                    f"s3://my-bucket/checkpoints/ckpt_{step}.pt"])
```
FAQ
When should I choose A100 over H100?
A100 excels for cost-sensitive inference (per-token cost better than H100 thanks to lower pricing), fine-tuning models under 20B parameters, and batch processing where interruption tolerance enables spot pricing. H100 is necessary for larger models (70B+) or workloads requiring H100-exclusive features such as FP8 support via the Transformer Engine.
How much does A100 spot pricing save versus on-demand?
Spot pricing averages 40-60% discounts. A 40-hour fine-tuning job costs ~$48 on-demand on an A100 PCIe versus $20-30 on spot (depending on demand cycles). For resumable workloads, spot is almost always preferable.
Can I scale A100 across multiple instances?
Yes, but with latency overhead. Launch 4x A100 instances for data-parallel training. Network communication adds per-step latency compared with local NVLink. For models fitting a single A100 (under 30B parameters), single-instance training is preferred.
What's the optimal A100 configuration for 70B model inference?
The A100 cannot serve 70B models at full precision in 40GB of memory, and even at 4-bit quantization the weights alone are roughly 35GB, so use the 80GB variant or a 2x A100 setup with tensor parallelism. That enables batch size 2-4 at acceptable throughput (35-50 tokens/second). Cost per 1K tokens: $1.39/hr ÷ (45 tokens/sec × 3,600 sec) ≈ $0.0086. For better throughput on large models, upgrade to H100 ($2.69/hr at 40-50 tokens/sec ≈ $0.0150 per 1K tokens).
How much does RunPod storage cost and how can I minimize it?
Persistent volumes cost $0.10/GB/month; for a 100GB dataset, this adds about $10/month. Optimal strategy: download datasets to ephemeral instance storage during initialization (free), perform all training from instance storage, and upload only final checkpoints to S3 (cheap long-term storage at ~$0.023/GB/month). Savings: roughly $10/month → $2.30/month for the same 100GB dataset.
Sources
- RunPod Pricing: https://www.runpod.io/console/gpu-cloud
- NVIDIA A100 Specifications: https://www.nvidia.com/en-us/data-center/a100/
- vLLM Documentation: https://docs.vllm.ai/
- HuggingFace Transformers Training: https://huggingface.co/docs/transformers/training