A100 RunPod: Cost-Effective GPU Pricing, Templates, and Spot Savings

Deploybase · February 17, 2025 · GPU Pricing

A100 RunPod: Professional Compute at Entry-Level Pricing

A100 RunPod pricing starts at $1.19 per hour for PCIe configurations and $1.39 per hour for high-bandwidth SXM variants. The A100, released in 2020, remains one of the most cost-efficient GPUs for production workloads, particularly for teams that prioritize lower compute costs over peak performance. RunPod's flexible pricing model and spot market make A100 instances even more economical.

This guide covers RunPod A100 pricing, template options, spot market strategies, and the cases where the A100 beats the H100 on value.

A100 Pricing and Configurations

RunPod's A100 pricing undercuts all competing providers while delivering exceptional value for most production workloads.

A100 Pricing Tiers and Monthly Analysis

| Configuration | Hourly | Monthly (730 hrs) | Annual | Spot Savings | Per 1K Tokens (50 tokens/sec) |
|---|---|---|---|---|---|
| A100 PCIe | $1.19 | $869 | $10,427 | 40-60% | $0.0066 |
| A100 SXM | $1.39 | $1,014 | $12,172 | 40-55% | $0.0077 |
| A100 PCIe Spot (avg 50% off) | $0.60 | $438 | $5,256 | N/A | $0.0033 |
| A100 SXM Spot (avg 45% off) | $0.76 | $555 | $6,654 | N/A | $0.0042 |

Spot pricing reduces A100 costs by 40-60% during off-peak hours, making batch training extremely economical. A 40-hour fine-tuning job on A100 Spot costs approximately $25-35 (avg $30), versus $47.60 on-demand, a roughly 37% savings for resumable workloads.

Provider Cost Comparison at Standard Rates

| Provider | A100 Cost | Monthly | Per 1K Tokens |
|---|---|---|---|
| RunPod | $1.19 | $869 | $0.0066 |
| Lambda | $1.48 | $1,080 | $0.0082 |
| Vast.AI (avg) | $1.00 | $730 | $0.0055 |
| CoreWeave (8x) | $2.70 | N/A (cluster) | N/A |
| AWS p4d | $4.10 | N/A (8x cluster) | N/A |

RunPod offers the best single-GPU A100 pricing among dedicated providers, with Vast.AI's peer-to-peer marketplace undercutting it at higher risk.

Per-GPU Cost Analysis and Cost Optimization

Running inference on a single A100 costs roughly $0.009 per 1K tokens (assuming 50 tokens/second sustained throughput at 75% utilization). This compares favorably to managed inference services charging $0.05-0.15 per 1K tokens.

Detailed cost analysis:

  • A100 hourly cost: $1.19/hr
  • Inference throughput: 50 tokens/second sustained
  • Cost per token: $1.19 / (50 × 3600) ≈ $0.0000066/token
  • Cost per 1M tokens: ~$6.61

This is 7-22x cheaper than managed services. For batch processing, costs drop further with request batching.
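The arithmetic above can be wrapped in a small helper. This is an illustrative sketch: the function name is ours, and the throughput and utilization inputs are the assumptions stated above, not measurements.

```python
def cost_per_million_tokens(hourly_rate: float, tokens_per_sec: float,
                            utilization: float = 1.0) -> float:
    """Dollar cost per 1M generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return hourly_rate / tokens_per_hour * 1_000_000

# A100 PCIe at $1.19/hr, 50 tokens/sec sustained
print(round(cost_per_million_tokens(1.19, 50), 2))        # 6.61 at full utilization
print(round(cost_per_million_tokens(1.19, 50, 0.75), 2))  # 8.81 at 75% utilization
```

The same helper prices out any provider row in the comparison table by swapping the hourly rate.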

Performance Benchmarks

A100 Inference Throughput

A100 40GB HBM2 delivers predictable throughput across configurations:

| Model | Batch Size | Throughput | Latency (TTFT) |
|---|---|---|---|
| 7B Mistral | 1 | 55-65 tokens/sec | 30-50ms |
| 13B Model | 1 | 35-45 tokens/sec | 40-70ms |
| 30B Model | 1 | 15-25 tokens/sec | 80-150ms |
| 7B Mistral | 8 | 150-200 tokens/sec | 100-200ms |

Training Speed

Fine-tuning throughput with standard configurations:

| Task | Configuration | Throughput |
|---|---|---|
| 7B LoRA (16-bit) | A100 SXM | 380-420 tokens/sec |
| 13B Full (8-bit) | A100 SXM | 200-250 tokens/sec |
| 30B LoRA (4-bit) | A100 PCIe | 150-200 tokens/sec |

RunPod A100 Templates and Setup

RunPod provides pre-configured environment templates optimizing A100 deployment.

PyTorch Template

The official PyTorch template includes:

  • CUDA 12.2, cuDNN 8.9
  • PyTorch 2.1 with torch.compile support
  • TensorFlow 2.13 (optional)
  • JupyterLab pre-configured

Launch this template in 90 seconds versus 15+ minutes for custom environment setup.

vLLM Inference Template

Purpose-built for LLM serving:

  • vLLM server pre-configured
  • Flash Attention 2 enabled
  • Token streaming support
  • OpenAI-compatible API endpoint

Launch the server with:
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.1 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9
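
Once the server is running, any HTTP client can hit the OpenAI-compatible endpoint. A minimal sketch, assuming vLLM's default port 8000; on RunPod you would substitute the pod's exposed proxy URL for `localhost`:

```python
import json
import urllib.request

# Assumed local address; replace with your pod's proxy URL in practice.
ENDPOINT = "http://localhost:8000/v1/completions"

def completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible completions API."""
    body = json.dumps({
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }).encode()
    return urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"})

# With the server up, send the request and read the generated text:
# req = completion_request("Summarize LoRA in one sentence.")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```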

Custom Template Creation

Build custom templates from Docker images for specialized workloads:

FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip
RUN pip install torch torchvision torchaudio
RUN pip install huggingface_hub transformers

Push to Docker Hub, then select "Custom Image" in RunPod console.

A100 Performance Characteristics

Throughput vs H100

The A100 (40GB HBM2 or 80GB HBM2e) delivers 312 TFLOPS of dense BF16 tensor-core performance (19.5 TFLOPS FP32) versus roughly 990 TFLOPS dense BF16 on the H100 SXM, about one-third of the H100's peak tensor throughput. However, most inference workloads are limited by memory bandwidth (1.6 TB/s on the 40GB card, 2.0 TB/s on the 80GB) rather than compute.

When A100 Matches or Exceeds H100

  • Memory-bound operations: Many inference tasks, especially small-batch token generation, hit bandwidth limits before compute limits, so A100 and H100 perform similarly.
  • Quantized inference: 8-bit and 4-bit models remain memory-bound, so the H100's extra compute headroom goes unused and the two GPUs perform comparably.
  • Cost-per-token analysis: At $1.39/hr versus H100 $2.69/hr, A100 provides superior cost-per-token for inference despite lower peak throughput.

For fine-tuning models under 20B parameters, A100 SXM at $1.39/hr outperforms H100 on value.
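
A quick sanity check of that cost-per-token claim: at the quoted rates, an H100 has to sustain roughly 1.94x the A100's throughput before it wins on cost.

```python
A100_SXM_RATE = 1.39  # $/hr, RunPod on-demand
H100_RATE = 2.69      # $/hr, rate quoted above

# The H100 only beats the A100 on cost per token if its throughput
# advantage exceeds its price premium.
breakeven_speedup = H100_RATE / A100_SXM_RATE
print(round(breakeven_speedup, 2))  # 1.94
```

For memory-bound inference, where the two cards run at similar speeds, that threshold is never reached.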

Spot Pricing and Batch Processing

Off-Peak Bidding Strategy

RunPod's spot market displays historical pricing. Bid at 50-60% of on-demand rates during off-peak windows (typically 1-6 AM UTC):

  1. View 7-day spot price history for A100 PCIe (usually $0.48-0.71/hr)
  2. Set bid at 55% of the on-demand rate ($0.65/hr against the $1.19 baseline)
  3. Launch during low-demand windows
  4. Accept 1-2 hour wait for instance availability

This reduces training costs by 50-60% with minimal disruption to resumable workloads.
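
The bidding steps above reduce to simple arithmetic; a sketch using the PCIe baseline and a 200-hour job (function names are illustrative):

```python
def spot_bid(on_demand_rate: float, bid_fraction: float = 0.55) -> float:
    """Suggested spot bid as a fraction of the on-demand rate."""
    return round(on_demand_rate * bid_fraction, 2)

def job_cost(rate: float, hours: float) -> float:
    """Total dollar cost of a job at a given hourly rate."""
    return round(rate * hours, 2)

bid = spot_bid(1.19)        # $0.65/hr for the A100 PCIe baseline
print(job_cost(1.19, 200))  # 238.0  (on-demand)
print(job_cost(bid, 200))   # 130.0  (at the spot bid)
```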

Checkpointing for Spot Reliability

Enable frequent checkpoints (every 500 steps) to tolerate interruptions:

import subprocess

checkpoint_dir = '/workspace/checkpoints'
for step in range(total_steps):
    loss = train_step(batch)  # train_step and batch come from your training loop

    if step % 500 == 0:
        path = f'{checkpoint_dir}/step_{step}.pt'
        model.save_checkpoint(path)
        # Upload to persistent storage so an interruption loses at most 500 steps
        subprocess.run(['aws', 's3', 'cp', path, 's3://my-bucket/checkpoints/'])

Spot interruptions cost training time, not money, since terminated instances stop incurring charges.

Detailed Setup and Running Workloads on A100

Launching A100 Instances: Step-by-Step Guide

  1. Access RunPod console at https://www.runpod.io/console/gpu-cloud
  2. Click "GPU Cloud" in sidebar
  3. Filter by GPU: Select "A100"
  4. Choose configuration:
    • A100 PCIe ($1.19/hr): Best value, adequate for most single-GPU work
    • A100 SXM ($1.39/hr): Select if requiring NVLink for multi-GPU distribution
  5. Select template: PyTorch 2.1 (recommended), TensorFlow, or JAX
  6. Configure:
    • vCPU: 8-16 minimum (16+ for large batch processing)
    • Memory: 20-32GB RAM
    • Storage: 50GB minimum (200GB+ for training data)
  7. Select region (primary: US-East, US-West)
  8. Click "Deploy" and wait 2-5 minutes

LLM Fine-Tuning

The A100's 40GB HBM2 memory supports fine-tuning of 7B-13B parameter models (full-parameter training with 8-bit optimizers, or parameter-efficient LoRA as shown here):

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora_config = LoraConfig(r=8, lora_alpha=16)
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="/workspace/output",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4
    ),
    train_dataset=train_data  # your tokenized dataset
)
trainer.train()

Expected LoRA fine-tuning throughput: roughly 380-420 tokens/second on A100 SXM.

Multi-GPU A100 Clusters

While RunPod doesn't directly offer multi-A100 clusters, launch 2-4 separate A100 instances and use distributed training frameworks. Network communication between instances adds latency compared to local NVLink, but remains viable for data parallelism.

Cost Optimization Strategies

Spot Market Strategy for A100

RunPod A100 spot pricing averages 45-55% of on-demand rate during off-peak hours. For batch training:

  1. View 7-day spot price history (typically $0.48-0.71/hr for PCIe)
  2. Set maximum bid at 60% of on-demand ($0.71/hr for $1.19 baseline)
  3. Launch during off-peak (2-6 AM UTC) for 70%+ fill rate
  4. Enable hourly checkpointing to tolerate interruptions

Monthly cost for 200-hour training job:

  • On-demand: $1.19 × 200 = $238
  • Spot (55% average): $0.65 × 200 = $130
  • Savings: $108 per 200-hour project

Batch Inference Optimization

Consolidate inference requests to maximize A100 throughput and minimize per-token cost:

from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    max_num_batched_tokens=8192,  # process up to 8K tokens per scheduling batch
    gpu_memory_utilization=0.9
)

responses = llm.generate(
    [prompt_1, prompt_2, prompt_3, prompt_4],  # batch multiple prompts per call
    sampling_params=SamplingParams(temperature=0.7, top_p=0.95)
)
)

Multi-Model Serving with LoRA Adapters

Run multiple fine-tuned models from single A100 using LoRA adapter swapping:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Load each customer's adapter under its own name, then switch between them
model = PeftModel.from_pretrained(base_model, "lora/customer-a",
                                  adapter_name="customer_a")
model.load_adapter("lora/customer-b", adapter_name="customer_b")

model.set_adapter("customer_a")
outputs_a = model.generate(**inputs)
model.set_adapter("customer_b")
outputs_b = model.generate(**inputs)

This enables serving 3-4 different customer models from a single instance, reducing per-model cost by 70-75%.
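
The per-model saving is just the instance rate split across adapters; a sketch assuming roughly even load across four tenants:

```python
def per_model_hourly_cost(instance_rate: float, num_adapters: int) -> float:
    """Hourly cost per served model when adapters share one GPU."""
    return instance_rate / num_adapters

single = per_model_hourly_cost(1.19, 1)
shared = per_model_hourly_cost(1.19, 4)
print(round(shared, 4))               # 0.2975 → about $0.30/hr per model
print(round(1 - shared / single, 2))  # 0.75 → 75% per-model cost reduction
```

In practice throughput contention across tenants eats some of this, which is why the text quotes 70-75% rather than the full 75%.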

Cost Comparison to Other Providers

Single-GPU A100 Pricing

| Provider | A100 PCIe | A100 SXM |
|---|---|---|
| RunPod | $1.19 | $1.39 |
| Lambda | $1.48 | N/A |
| Vast.AI | $0.80-1.50 | $0.80-1.50 |
| AWS | $1.95 | $2.19 |

RunPod and Lambda provide competitive pricing with guaranteed capacity. Vast.AI's peer-to-peer marketplace offers lower spot pricing but with provider variability. AWS A100 pricing carries a premium for bundled CPU and managed services.

See the A100 Lambda guide for reserved pricing options. Check CoreWeave's A100 clusters for multi-GPU training setups.

Storage and Data Management

Persistent Volumes

RunPod's network-attached storage costs $0.10/GB/day, so a 100GB dataset runs about $10/day. Download datasets to instance storage on launch to avoid these charges:

#!/bin/bash
aws s3 cp s3://my-bucket/dataset.tar.gz /tmp/
tar -xzf /tmp/dataset.tar.gz -C /workspace/data/
rm /tmp/dataset.tar.gz  # Free up space

Model Checkpointing

Save model checkpoints to S3 rather than persistent volumes:

def save_checkpoint(model, step):
    import torch
    checkpoint = {'model': model.state_dict(), 'step': step}
    torch.save(checkpoint, f'/tmp/ckpt_{step}.pt')
    subprocess.run(['aws', 's3', 'cp',
                   f'/tmp/ckpt_{step}.pt',
                   f's3://my-bucket/checkpoints/ckpt_{step}.pt'])

FAQ

When should I choose A100 over H100?

A100 excels for cost-sensitive inference (per-token cost is slightly better than H100 at current pricing), fine-tuning models under 20B parameters, and batch processing where interruption tolerance enables spot pricing. H100 is necessary for larger models (70B+) or workloads that benefit from H100-exclusive features such as FP8 via the Transformer Engine.

How much does A100 spot pricing save versus on-demand?

Spot pricing averages 40-60% discounts. A 40-hour fine-tuning job costs ~$48 on-demand on A100 PCIe versus $20-30 on spot (depending on demand cycles). For resumable workloads, spot is almost always preferable.

Can I scale A100 across multiple instances?

Yes, but with communication overhead. Launch 4x A100 instances for data-parallel training; inter-node networking adds per-step latency compared to local NVLink. For models that fit on a single A100 (under 30B parameters), single-instance training is preferred.

What's the optimal A100 configuration for 70B model inference?

A 40GB A100 cannot serve 70B models at full precision: FP16 weights alone are ~140GB. Even with 4-bit quantization the weights occupy roughly 35GB, leaving little room for KV cache on a 40GB card, so plan on an 80GB A100 or 2x A100 with tensor parallelism. At 45 tokens/second on A100 SXM, cost per 1K tokens is $1.39/hr ÷ (45 × 3600) ≈ $0.0086. An H100 ($2.69/hr at 40-50 tokens/second, roughly $0.017 per 1K tokens) costs more per token but offers headroom for larger batches.

How much does RunPod storage cost and how can I minimize it?

Persistent volumes cost $0.10/GB/day, so a 100GB dataset adds about $10/day, or $300/month. Optimal strategy: download datasets to ephemeral instance storage during initialization (free), perform all training from instance storage, and upload only final checkpoints to S3 (cheap long-term storage at $0.023/GB/month). Savings: roughly $300/month → $2.30/month for the same 100GB dataset.
