A100 CoreWeave: Kubernetes-Native Clusters, Reserved Pricing, and Production AI

Deploybase · January 21, 2025 · GPU Pricing

A100 CoreWeave: Kubernetes-First GPU Infrastructure for AI

A100 CoreWeave prices 8xA100 clusters at $21.60/hr ($2.70/GPU). The distinctive part: Kubernetes-native orchestration. Automatic pod scheduling, service discovery, and autoscaling are built in. For teams running production AI with containers, this operational value can offset the higher per-GPU cost versus RunPod or Lambda.

This guide covers CoreWeave's A100 pricing, Kubernetes integration, reserved capacity strategies, and production deployment patterns.

A100 CoreWeave Pricing Structure

CoreWeave prices GPU clusters rather than individual instances, with 8xA100 as the standard configuration.

A100 Cluster Pricing and Monthly Analysis

| Configuration | Hourly | Monthly (730 hrs) | Annual | Per-GPU | Reserved Savings |
|---|---|---|---|---|---|
| 8x A100 On-Demand | $21.60 | $15,768 | $189,216 | $2.70 | Baseline |
| 8x A100 Reserved (3-month) | $19.44 | $14,191 | N/A | $2.43 | 10% |
| 8x A100 Reserved (12-month) | $13.82 | $10,088 | $121,099 | $1.73 | 36% |

12-month reservations save 36% versus on-demand ($13.82/hr vs $21.60/hr, a $7.78/hr difference, roughly $68,100/year). Custom cluster sizes (2x, 4x A100) cost 15-20% more per GPU due to fixed infrastructure overhead.
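The reserved-savings math can be sanity-checked with a few lines of Python; rates are taken from the table above, and the 36% figure is the 12-month discount:

```python
# Reserved-vs-on-demand comparison for an 8xA100 cluster (rates from the table above).
HOURS_PER_YEAR = 8760

def annual_cost(hourly_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    return hourly_rate * hours

on_demand = annual_cost(21.60)                    # 8xA100 on-demand
reserved_12mo = annual_cost(21.60 * (1 - 0.36))   # 36% discount -> ~$13.82/hr
savings = on_demand - reserved_12mo

print(f"on-demand: ${on_demand:,.0f}/yr")
print(f"reserved:  ${reserved_12mo:,.0f}/yr")
print(f"savings:   ${savings:,.0f}/yr ({savings / on_demand:.0%})")
```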

Performance Benchmarks on CoreWeave A100 Clusters

| Configuration | Training Throughput (13B) | Inference Throughput | Scaling Efficiency |
|---|---|---|---|
| 1x A100 | 450 tokens/sec | 50 tokens/sec | 100% |
| 2x A100 | 850 tokens/sec | 95 tokens/sec | 94% |
| 4x A100 | 1,650 tokens/sec | 190 tokens/sec | 92% |
| 8x A100 | 3,200 tokens/sec | 380 tokens/sec | 89% |
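Scaling efficiency here is measured throughput divided by perfect linear scaling (N times the single-GPU number). A quick check against the training-throughput column:

```python
# Scaling efficiency = measured throughput / (N * single-GPU throughput),
# using the training-throughput figures from the table above.
throughput = {1: 450, 2: 850, 4: 1650, 8: 3200}  # tokens/sec, 13B training

for n, tps in throughput.items():
    efficiency = tps / (n * throughput[1])
    print(f"{n}x A100: {tps:>5} tok/s -> {efficiency:.0%} efficiency")
```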

CoreWeave A100 Setup and Kubernetes Integration

Launching CoreWeave A100 Clusters: Step-by-Step

  1. Access CoreWeave console at https://cloud.coreweave.com
  2. Navigate to the "Kubernetes" section
  3. Click "Deploy Cluster"
  4. Select configuration:
    • GPU Type: A100 SXM
    • Cluster Size: 8xA100 (standard), or custom 2x/4x/16x
    • Region: US-East, US-West, or EU
    • Billing: On-Demand or Reserved (select 12-month for 36% savings)
  5. Configure ingress domain and API access
  6. Review specs and monthly cost estimate
  7. Deploy cluster (10-15 minute provisioning)
  8. Download kubeconfig upon provisioning
  9. Access cluster: kubectl --kubeconfig=coreweave.yaml cluster-info

Container Orchestration for GPU Workloads

CoreWeave exposes GPUs as Kubernetes resources. Deploy training jobs and inference services through declarative manifests:

apiVersion: v1
kind: Pod
metadata:
  name: a100-training
spec:
  containers:
  - name: training
    image: pytorch:2.0-cuda12.2
    resources:
      limits:
        nvidia.com/gpu: 2  # Request 2 GPUs from 8-GPU cluster
    volumeMounts:
    - name: model-storage
      mountPath: /models
  volumes:
  - name: model-storage
    persistentVolumeClaim:
      claimName: model-pvc

Automatic Pod Scheduling

The Kubernetes scheduler automatically distributes pods across available GPUs. Multiple training jobs can coexist on the same cluster, with resource quotas preventing starvation:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-training
spec:
  hard:
    requests.nvidia.com/gpu: "4"  # Team can use max 4 GPUs
    memory: "64Gi"
    pods: "10"

This multi-tenant isolation enables cost sharing across teams.

Storage Integration

Persistent Volumes for Model and Data

Mount persistent volumes containing model weights and training datasets:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi

CoreWeave's network-attached storage provides <10ms latency, suitable for continuous data access during training.

S3-Compatible Object Storage

For larger datasets, integrate with CoreWeave's S3-compatible storage:

import boto3

s3 = boto3.client('s3',
    endpoint_url='https://storage.coreweave.com',
    aws_access_key_id='YOUR_KEY',
    aws_secret_access_key='YOUR_SECRET'
)

s3.download_file('my-bucket', 'training_data.tar.gz', '/data/train.tar.gz')

A100 Performance and Distributed Training

Single-GPU Throughput

The A100 80GB delivers:

  • FP32: 19.5 TFLOPS
  • BF16/TF32 Tensor: 312 TFLOPS
  • FP16 Tensor: 312 TFLOPS
  • Memory bandwidth: 2.0 TB/s (SXM)

For fine-tuning 13B-parameter models, A100 achieves 600-800 tokens/second effective throughput. Compare H100 performance for larger models and Lambda multi-GPU clusters for distributed setups.

Multi-GPU Training with DDP

Distribute training across 4x A100 cluster for larger models:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group('nccl')   # one process per GPU (launch with torchrun)
rank = dist.get_rank()
torch.cuda.set_device(rank)
model = model.to(rank)            # model, optimizer, batch defined elsewhere
model = DDP(model, device_ids=[rank])

optimizer.zero_grad()
loss = model(batch)               # forward pass returning the loss
loss.backward()                   # DDP all-reduces gradients across GPUs
optimizer.step()

Expected throughput: 2,400-3,200 tokens/second for 13B-parameter model on 4x A100.

Reserved Capacity and Cost Optimization

Multi-Year Contracts

Calculate ROI for reserved capacity:

  • 12-month reserved: $13.82/hr = $121,099/year (8xA100)
  • On-demand equivalent: $21.60/hr = $189,216/year
  • Annual savings: $68,117 (36% reduction)

Break-even: the reserved annual cost ($121,099) equals about 5,606 on-demand hours, roughly 7.7 months of continuous utilization. For production systems running 24/7, 12-month reservations are financially optimal.
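As a sketch, the break-even point is the number of on-demand hours whose cost equals the reserved annual commitment:

```python
# Break-even utilization for a 12-month reservation vs paying on-demand.
ON_DEMAND = 21.60                            # $/hr, 8xA100
RESERVED_ANNUAL = 21.60 * (1 - 0.36) * 8760  # ~$121,098/yr

breakeven_hours = RESERVED_ANNUAL / ON_DEMAND  # on-demand hours with the same cost
breakeven_months = breakeven_hours / 730       # at 730 hrs/month
print(f"break-even: {breakeven_hours:,.0f} hrs ~= {breakeven_months:.1f} months")
```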

Hybrid Capacity Planning

Reserve baseline capacity for predictable load, burst on-demand for peaks:

  • Reserve 1x 8xA100 cluster (baseline): $13.82/hr × 24 × 365 = ~$121,099/year
  • Average burst capacity: 0.5x 8xA100 on-demand: $21.60/hr × 24 × 180 = ~$93,312/year
  • Total: ~$214,411/year

This hybrid approach is cheaper than reserving 2x capacity to cover the peaks (~$242,197/year) while maintaining flexibility.
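Since the bursts add a second full cluster, covering the peak entirely with reservations would mean two reserved clusters; a short comparison using the same rates as above:

```python
# Hybrid (1x reserved baseline + on-demand burst) vs reserving for the 2-cluster peak.
RESERVED_HOURLY = 21.60 * (1 - 0.36)   # ~$13.82/hr per 8xA100 cluster
ON_DEMAND_HOURLY = 21.60

baseline = RESERVED_HOURLY * 24 * 365        # 1 reserved cluster, all year
burst = ON_DEMAND_HOURLY * 24 * 180          # 2nd cluster on-demand, ~180 days
hybrid = baseline + burst
reserve_for_peak = 2 * RESERVED_HOURLY * 24 * 365  # 2 reserved clusters

print(f"hybrid:           ${hybrid:,.0f}/yr")
print(f"2x reserved peak: ${reserve_for_peak:,.0f}/yr")
```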

Production Workload Patterns

Distributed Training Pipeline

Deploy multi-stage training pipeline leveraging CoreWeave's Kubernetes orchestration:

  1. Data preparation (small instance, 1x A100)
  2. Distributed fine-tuning (4x A100 cluster)
  3. Evaluation (2x A100 cluster)
  4. Model export (1x A100)

Use Kubernetes Jobs and StatefulSets to orchestrate stages:

apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-job
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: training:latest
        resources:
          limits:
            nvidia.com/gpu: 4
      restartPolicy: Never

Inference Serving with Autoscaling

Deploy inference service with automatic scaling based on request queue:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
      - name: vllm
        image: vllm:latest
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL  # GPU utilization is not a built-in resource metric;
      target:                       # this assumes DCGM exporter + a custom-metrics adapter
        type: AverageValue
        averageValue: "80"

This configuration automatically scales the inference deployment from 2 to 8 replicas (one A100 each) based on GPU utilization.

Comparing CoreWeave to Single-GPU Providers

Per-GPU Cost Analysis

| Provider | A100 Cost | Effective Per-GPU Cost |
|---|---|---|
| RunPod | $1.19/hr | $1.19 (single instance) |
| Lambda | $1.48/hr | $1.48 (single instance) |
| Vast.AI | $1.00/hr avg | $1.00 (single instance) |
| CoreWeave | $21.60/hr | $2.70 (8x cluster) |

CoreWeave costs 2.3x more per GPU than RunPod single-instance. But for multi-tenant production systems, Kubernetes handles autoscaling and pod scheduling automatically. That saves engineering time compared to manual cluster setup.

See the A100 RunPod guide for single-GPU cost-optimized deployments. Compare Lambda's multi-GPU clusters and AWS p4d pricing for alternative production deployments.

Monitoring and Cost Tracking

CoreWeave dashboard shows GPU utilization and costs per pod, namespace, and team. Track spending per team to find optimization opportunities.

FAQ

When should I choose CoreWeave versus RunPod for A100 training?

CoreWeave excels for production multi-GPU training with Kubernetes orchestration and autoscaling. RunPod is optimal for single-GPU workloads and development. For 4x+ A100 clusters running 24/7, CoreWeave 12-month reserved pricing ($1.73/GPU) becomes competitive with RunPod spot ($0.50-0.80/hr) despite higher upfront commitment.

Can I run multiple teams' workloads on the same CoreWeave cluster?

Yes, through Kubernetes namespaces and resource quotas. Each team gets an isolated namespace with GPU limits. This enables cost sharing: if 4 teams share an 8xA100 cluster equally, the cost per team is $5.40/hr versus $21.60/hr solo.

What cost optimization strategies work best for A100 CoreWeave clusters?

(1) Reserve 12-month baseline capacity for 36% savings, (2) Batch inference requests to maximize throughput (batch sizes of 32-64 can reduce per-token cost by ~80%), (3) Share clusters across teams via Kubernetes namespaces (4-5 team sharing cuts per-team cost ~60%), (4) Use multi-tier capacity: reserve 1x 8xA100 and burst with on-demand during peaks. For an organization with 4 teams running 40-hour projects monthly, a shared 12-month reserved cluster ($13.82/hr) plus ~20 hours of on-demand burst ($21.60/hr) comes to roughly $4,956/month, versus 4 independent always-on Lambda instances ($1.48/hr × 730 × 4 ≈ $4,322/month). With cluster sharing, CoreWeave reaches rough cost parity while adding managed orchestration.

How does CoreWeave A100 compare when accounting for infrastructure setup time savings?

CoreWeave's Kubernetes-native deployment eliminates manual infrastructure setup (estimated 40-80 hours engineering time for Kubernetes cluster setup). At $150/hour engineering cost, this equals $6,000-12,000 in setup savings alone. Over 12-month cluster lifetime, CoreWeave's operational value justifies the per-GPU cost premium versus bare RunPod instances for teams with limited infrastructure expertise.

What team size justifies CoreWeave A100 cluster investment versus individual RunPod instances?

For an engineering team of N people: CoreWeave cluster cost ≈ $121,099/year (12-month reserved 8xA100). RunPod per-person cost = $1.19/hr × 8,760 = $10,424/year. Break-even: 121,099 / 10,424 ≈ 11.6 team members. For teams of 12+ members running continuous A100 workloads, a shared CoreWeave cluster is more economical. For teams under 10 members or bursty usage patterns, individual RunPod instances are cheaper.
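The same break-even arithmetic as a quick sketch (rates from the comparison above):

```python
# Shared reserved cluster vs one always-on RunPod A100 per team member:
# find the team size where the two annual costs cross over.
CW_RESERVED_ANNUAL = 21.60 * (1 - 0.36) * 8760  # 12-mo reserved 8xA100, ~$121,098/yr
RUNPOD_PER_PERSON = 1.19 * 8760                 # one always-on A100, ~$10,424/yr

breakeven_team = CW_RESERVED_ANNUAL / RUNPOD_PER_PERSON
print(f"break-even team size: {breakeven_team:.1f} members")
```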

How should I implement cost chargeback for CoreWeave A100 clusters across multiple teams?

(1) Create Kubernetes namespaces per team, (2) Set resource quotas limiting GPU access per team, (3) Monitor usage via Prometheus/Grafana, (4) Calculate cost: (team_gpu_hours / total_gpu_hours) × monthly_cluster_cost. Example: Team A uses 200 GPU-hours, Team B uses 300 GPU-hours, total 500 GPU-hours monthly. Team A cost share: (200/500) × $10,088 = $4,035/month. Team B: $6,053/month. This enables cost transparency and fair allocation.
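The chargeback formula from step (4) can be sketched as a small function; the team names and hours below are the example's, and the $10,088 monthly cost is the 12-month reserved figure from the pricing table:

```python
# Usage-proportional chargeback: team share = (team GPU-hours / total GPU-hours) * cost.
def chargeback(usage_hours: dict, monthly_cost: float) -> dict:
    total = sum(usage_hours.values())
    return {team: round(hours / total * monthly_cost, 2)
            for team, hours in usage_hours.items()}

shares = chargeback({"team-a": 200, "team-b": 300}, monthly_cost=10088)
print(shares)  # team-a pays 40% of the cluster, team-b 60%
```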

How does CoreWeave's 2x A100 cluster pricing compare to using 2x single instances?

CoreWeave offers 2x A100 configurations at roughly $5.40/hr (custom configurations), versus the standard 8xA100 cluster at $21.60/hr ($2.70/GPU). Per-GPU cost for small clusters runs higher because fixed infrastructure overhead is amortized over fewer GPUs. For small training runs, single A100s from other providers are cheaper; for clusters of 4+ GPUs, CoreWeave's pricing becomes economical.
