Koyeb GPU Cloud Pricing: Complete Guide to Hourly Rates for Every GPU

DeployBase · January 22, 2025 · GPU Pricing

Overview

Koyeb provides serverless GPU computing for containerized workloads, removing infrastructure complexity from deployment and scaling.

Koyeb Pricing Model

Billing is per-second across compute and memory, which suits variable loads well. For continuous 24/7 usage, providers with commitment discounts can work out cheaper.

Compute Pricing Structure

Koyeb charges for GPU instances by the second. Minimum charges apply to prevent billing fragmentation, typically starting at $0.01 per minute.
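To make the billing model concrete, here is a minimal sketch in Python. The flat $0.01 floor per run is a simplifying assumption for illustration, not Koyeb's exact billing logic:

```python
# Illustrative only: per-second billing with a minimum charge floor.
# Rates come from the tier list below; the flat $0.01 floor is a
# simplifying assumption, not Koyeb's exact billing rule.

def billed_cost(duration_seconds: float, hourly_rate: float,
                minimum_charge: float = 0.01) -> float:
    """Cost of one run under per-second billing with a minimum charge."""
    cost = duration_seconds / 3600 * hourly_rate
    return max(cost, minimum_charge)

# A 90-second H100 burst at $2.50/hour bills ~$0.0625; a 5-second
# run falls below the floor and bills the $0.01 minimum instead.
print(round(billed_cost(90, 2.50), 4))  # 0.0625
print(round(billed_cost(5, 2.50), 4))   # 0.01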

Pricing tiers vary by GPU type:

  • NVIDIA A40 (48GB): $0.80 per GPU-hour
  • NVIDIA A100 (80GB): $1.60 per GPU-hour
  • NVIDIA H100 (80GB): $2.50 per GPU-hour
  • NVIDIA H200 (141GB): $3.00 per GPU-hour

Compared with RunPod's H100 SXM at $2.69/hour, Koyeb's H100 at $2.50/hour is slightly cheaper while including serverless management capabilities. The H200 at $3.00/hour is also among the most affordable H200 options available.

Memory and Storage

CPU and RAM charges stack on top of GPU costs. Standard configurations include 4 vCPU + 16GB RAM per GPU instance.

Memory pricing runs $0.005 per GB-hour, so 16GB adds roughly $0.08/hour. Combined with vCPU charges, the CPU/RAM overhead for a standard configuration comes to approximately $0.35/hour on top of GPU charges.

Storage charges apply to persistent volumes at $0.15 per GB-month. Temporary storage in container instances remains free.
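To see how those per-GB rates roll up into the flat hourly line items used in this article's tables, a quick back-of-envelope check (note the $0.35 CPU/RAM figure also covers vCPU charges, and the tables round storage up to a flat $0.05/hour):

```python
# Back-of-envelope check of the overhead figures used in this article.
HOURS_PER_MONTH = 24 * 30  # 720, matching the monthly math below

ram_gb, mem_rate = 16, 0.005          # $/GB-hour
storage_gb, storage_rate = 100, 0.15  # $/GB-month

memory_hourly = ram_gb * mem_rate                             # $0.08/hour
storage_hourly = storage_gb * storage_rate / HOURS_PER_MONTH  # ~$0.02/hour

# vCPU charges account for the rest of the ~$0.35/hour CPU/RAM line
# item; the tables below round storage up to a flat $0.05/hour.
print(f"memory:  ${memory_hourly:.3f}/hour")
print(f"storage: ${storage_hourly:.3f}/hour for {storage_gb}GB")
```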

Comprehensive Koyeb Pricing Breakdown

Per-Second Billing Economics

Koyeb's granular per-second billing enables efficient cost optimization. Unlike hourly providers, teams pay only for actual compute seconds:

Task                          Duration    GPU-Hours  Hourly Cost  Per-Second Cost  Savings
Model fine-tuning             2 hours     2          $6.90        $4.14            40%
Batch processing 1,000 items  1.5 hours   1.5        $5.18        $3.10            40%
Debugging inference           45 minutes  0.75       $2.59        $1.55            40%
Quick experiment              10 minutes  0.167      $0.58        $0.09            85%

Short workloads benefit disproportionately from per-second billing. A 10-minute debugging session costs $0.09 on Koyeb versus $0.58+ under pro-rated hourly pricing (85% savings), and even more versus providers that round up to a full hour.
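The round-up effect alone can be isolated with a small sketch. The $2.50 rate and round-up-to-the-hour policy are assumptions for illustration, since the table above mixes provider rates:

```python
import math

# A minimal sketch of the two billing policies being compared. The rate
# and round-up behavior here are assumptions for illustration.
def hourly_billed(duration_hours: float, rate: float) -> float:
    """Hourly provider: partial hours round up to full hours."""
    return math.ceil(duration_hours) * rate

def per_second_billed(duration_hours: float, rate: float) -> float:
    """Per-second provider: pay only for time actually used."""
    return duration_hours * rate

RATE = 2.50  # Koyeb's H100 base rate from the tier list
for label, hours in [("2-hour fine-tune", 2.0), ("10-min experiment", 10 / 60)]:
    h, s = hourly_billed(hours, RATE), per_second_billed(hours, RATE)
    print(f"{label}: ${h:.2f} hourly vs ${s:.2f} per-second "
          f"({1 - s / h:.0%} saved)")
```

The 2-hour job costs the same either way; the 10-minute job saves over 80%, which is why short, bursty work benefits most.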

Tier-by-Tier Breakdown

Complete pricing across Koyeb's GPU tiers:

GPU Type     Base Rate  CPU/RAM  Storage  Total Hourly  Best For
A40 48GB     $0.80      $0.35    $0.05    $1.20         Budget inference
A100 80GB    $1.60      $0.35    $0.05    $2.00         Standard inference
H100 80GB    $2.50      $0.35    $0.05    $2.90         Performance inference
H200 141GB   $3.00      $0.35    $0.05    $3.40         Very large model inference
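The same table expressed as data, for anyone scripting their own comparisons (the flat line items are this article's figures, not an official fee schedule):

```python
# Tier table as data: base GPU rate plus the flat CPU/RAM and storage
# line items used throughout this article.
CPU_RAM, STORAGE = 0.35, 0.05  # $/hour

tiers = {  # GPU: base $/GPU-hour
    "A40 48GB":   0.80,
    "A100 80GB":  1.60,
    "H100 80GB":  2.50,
    "H200 141GB": 3.00,
}

for gpu, base in tiers.items():
    print(f"{gpu:>11}: ${base + CPU_RAM + STORAGE:.2f}/hour all-in")
```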

Auto-Scaling Pricing Impact

Koyeb's auto-scaling dramatically reduces costs for variable-load applications:

Load Pattern         Peak Instances  Average Instances  Cost Reduction
Constant (flat)      1               1                  0% (baseline)
2× peak variation    2               1.3                35% vs constant
5× peak variation    5               1.5                70% vs constant
Bursty (10× peaks)   10              1.2                88% vs constant

Applications with significant traffic variation benefit most from Koyeb's pricing model. A chatbot with a 50 QPS baseline and 500 QPS peaks scales between roughly 0.5 instances on average and 5 at peak, saving 88% versus constantly over-provisioning for peak.
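The cost-reduction column follows directly from the ratio of average to peak instances, as this short sketch reproduces:

```python
# Savings versus provisioning the peak instance count around the clock.
def cost_reduction(avg_instances: float, peak_instances: float) -> float:
    return 1 - avg_instances / peak_instances

patterns = [("Constant (flat)", 1, 1.0), ("2x peak variation", 2, 1.3),
            ("5x peak variation", 5, 1.5), ("Bursty (10x peaks)", 10, 1.2)]
for name, peak, avg in patterns:
    print(f"{name:>18}: {cost_reduction(avg, peak):.0%} vs constant")
```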

Available Hardware

A40 GPU Tier

NVIDIA A40 GPUs are optimized for inference and real-time rendering. Lower power consumption enables higher-density deployments in data centers, which reduces costs.

Per-GPU pricing of $0.80/hour makes the A40 compelling for inference workloads that can tolerate somewhat lower performance than the A100. Bandwidth-intensive applications may hit limits, however: the A40's 696GB/sec memory bandwidth trails the A100's 1,935GB/sec.

A100 Balanced Tier

A100 instances represent the most popular Koyeb configuration. At $1.60/hour per GPU, pricing aligns competitively with specialist GPU providers.

Koyeb's containerized approach suits stateless inference workloads and batch processing. Stateful training scenarios often fit better on lower-level infrastructure such as Lambda, whose GPU pricing offers the H100 SXM at $3.78/hour.

H100 Performance Tier

H100 GPUs provide maximum performance for demanding workloads. Pricing at $2.50/hour is competitive with dedicated IaaS providers, actually undercutting RunPod's H100 SXM at $2.69/hour, while including Koyeb's serverless management capabilities.

H200 Ultra-Memory Tier

H200 GPUs with 141GB HBM3e memory enable running very large models on a single GPU. Koyeb's H200 at $3.00/hour is one of the most affordable H200 options on the market, making it attractive for teams that need the extreme VRAM capacity.

Total cost analysis requires considering operational overhead. Koyeb eliminates infrastructure management, cluster configuration, and networking setup required by traditional cloud GPU providers.

Cost Calculations and Economics

Baseline Inference Workload

Serving predictions from a single A100 instance continuously for one month:

  • A100 GPU cost: $1.60/hour
  • CPU/RAM overhead: $0.35/hour
  • Storage (100GB persistent): $0.05/hour
  • Total hourly cost: $2.00/hour
  • Monthly cost: 2.00 × 24 × 30 = $1,440
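A minimal check of this arithmetic, including the scale-down effect described in the next paragraph:

```python
# Reproducing the baseline arithmetic from the bullets above.
hourly = 1.60 + 0.35 + 0.05   # GPU + CPU/RAM + storage = $2.00/hour
monthly = hourly * 24 * 30    # $1,440 for an always-on instance

# If autoscaling idles the instance 8 hours a day, the bill drops in
# proportion to the hours actually billed:
scaled = hourly * 16 * 30     # $960
print(f"always-on: ${monthly:,.0f}, scaled-down: ${scaled:,.0f}")
```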

Koyeb's per-second billing enables cost optimization. Unlike hourly providers, unused capacity doesn't generate charges. Scaling down during low-traffic periods saves proportional costs.

Compared with CoreWeave's 8×H100 cluster at $49.24/hour ($6.16 per GPU-hour), Koyeb's A100 at $1.60 provides a 74% cost reduction for workloads that fit a single GPU. CoreWeave's networking advantages, however, justify the higher cost for coordinated multi-GPU work.

Batch Processing Project

Processing 10,000 images through a computer vision model:

  • Average processing time: 0.5 seconds per image
  • Total GPU time: 5,000 seconds or 1.39 hours
  • H100 cost: 1.39 × $2.50 = $3.48
  • CPU/RAM cost: 1.39 × $0.35 = $0.49
  • Total project cost: $3.97

This tight coupling between work performed and cost paid makes Koyeb extremely cost-efficient for batch workloads. Compare to hourly providers, where 1.39 hours rounds up to 2 full hourly charges, roughly $5.40-7.60 at the RunPod and Lambda H100 rates cited in this article. Koyeb saves approximately 25-45% on short batch jobs.
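The same project as a script, showing how directly cost tracks seconds of work (figures from above; small differences come from rounding):

```python
# Batch-job economics: cost scales with the seconds actually worked.
images, sec_per_image = 10_000, 0.5
gpu_hours = images * sec_per_image / 3600   # 5,000 s ≈ 1.39 hours

h100_rate, cpu_ram_rate = 2.50, 0.35        # $/hour, from the tier table
total = gpu_hours * (h100_rate + cpu_ram_rate)
print(f"{gpu_hours:.2f} GPU-hours -> ${total:.2f}")  # ~$3.96
```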

Real-Time API Endpoint Economics

Deploying inference API with 10 QPS average load, 50 QPS peak:

  • Single A100 throughput: 100 QPS @ 50ms latency
  • Required instances: 1 A100 instance
  • Monthly cost at constant load: $1,440 (from the baseline above)

Adding API gateway and load balancing: approximately $50-100 additional monthly, bringing the total to $1,490-1,540.

Compare to Replicate GPU pricing at $0.001 per second for an A40:

  • Replicate: 1,000,000 queries per month × 0.05 seconds average = 50,000 seconds × $0.001/second = $50
  • Koyeb: still $1,440 for the dedicated instance

The breakeven: at approximately 28.8 million monthly queries ($1,440 ÷ $0.00005 per query), dedicated Koyeb infrastructure becomes cost-effective. Lower volumes benefit from Replicate's per-request model.
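The breakeven falls out of simple division, as sketched here under this section's assumptions (Replicate's per-second rate, 50ms per query, the $1,440 dedicated baseline):

```python
# Per-request pricing vs a dedicated instance: find the breakeven.
replicate_per_sec = 0.001   # $/second on an A40 (per-request model)
sec_per_query = 0.05        # 50ms average per query
koyeb_monthly = 1_440       # dedicated A100 baseline from above

cost_per_query = replicate_per_sec * sec_per_query  # $0.00005
breakeven = koyeb_monthly / cost_per_query
print(f"breakeven: {breakeven:,.0f} queries/month")  # 28,800,000
```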

Auto-Scaling Scenarios

Koyeb's automatic scaling enables cost optimization for variable-load endpoints:

Average daily profile: 100 QPS (off-peak) rising to 1,000 QPS during the 10:00-18:00 UTC peak:

  • Peak (8 hours): 1,000 QPS = 5 instances × 8 hours × $2.00/hour = $80.00/day
  • Off-peak (10 hours): 100 QPS ≈ 0.5 average instances × 10 hours × $2.00/hour = $10.00/day
  • Low-load (6 hours): 10 QPS ≈ 0.05 average instances × 6 hours × $2.00/hour = $0.60/day
  • Monthly cost: ($80.00 + $10.00 + $0.60) × 30 ≈ $2,718

Compared to over-provisioning for peak (5 instances continuously): 5 × $2.00 × 24 × 30 = $7,200.

Koyeb saves roughly 62% through automatic scaling, though practical configurations include safety margins that push costs somewhat higher.
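The daily-profile math, condensed into a script (instance counts are averages per block; the $2.00 rate is the A100 all-in figure from the tier table):

```python
# Variable-load economics: price each block of the day separately.
RATE = 2.00  # $/instance-hour, A100 all-in

profile = [  # (hours, average instances)
    (8, 5.00),   # peak, 10:00-18:00 UTC
    (10, 0.50),  # off-peak
    (6, 0.05),   # overnight low load
]

daily = sum(hours * instances * RATE for hours, instances in profile)
monthly = daily * 30
peak_provisioned = 5 * RATE * 24 * 30
print(f"${monthly:,.0f}/month vs ${peak_provisioned:,.0f} "
      f"({1 - monthly / peak_provisioned:.0%} saved)")
```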

Serverless GPU Computing Trend

Managed Infrastructure Appeal

Koyeb exemplifies the serverless GPU computing trend gaining adoption in 2026. Teams increasingly value eliminating infrastructure management overhead over marginal per-unit cost savings.

Comparing bare infrastructure (RunPod's H100 at $2.69/hour) to managed services (Koyeb's H100 at $2.90/hour all-in):

  • Bare infrastructure: Manage scaling, monitoring, failover
  • Koyeb: Deploy containers, let platform handle operational complexity

For teams with limited DevOps resources, Koyeb's automation premium (roughly 8% over RunPod's bare rate) enables focus on model development.

Container-Native Deployment

Koyeb's full Docker support eliminates vendor lock-in: models trained anywhere deploy to Koyeb unchanged. This openness contrasts with proprietary API platforms such as Replicate or the OpenAI API.

Teams valuing flexibility and avoiding proprietary frameworks prefer container-based solutions for long-term viability.

Scaling Without Engineering

Koyeb's automatic scaling eliminates load-balancing engineering. Deploying Llama 2 7B inference requires:

  1. Write inference container
  2. Deploy to Koyeb
  3. Set auto-scaling parameters
  4. Done

Traditional infrastructure requires network load balancers, monitoring systems, and scaling orchestration. Koyeb handles this automatically.
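As a sketch of step 1, any container that listens on an HTTP port works. The handler below is a hypothetical FastAPI stub standing in for a real model server; the route name and request shape are illustrative, not Koyeb-specific:

```python
# Hypothetical inference handler: a minimal FastAPI stub standing in
# for a real model server. Names and routes are illustrative only.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_tokens: int = 128

@app.post("/generate")
def generate(req: Prompt) -> dict:
    # Swap this echo for a real model call (e.g. a Llama 2 7B pipeline).
    return {"completion": f"echo: {req.text}", "max_tokens": req.max_tokens}

# Containerize and run with: uvicorn main:app --host 0.0.0.0 --port 8000
```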

FAQ

Q: Does Koyeb support custom CUDA kernels? A: Yes. Koyeb supports arbitrary Docker containers with CUDA toolkit. Custom kernel compilation occurs during container build.

Q: What's the maximum GPU count per deployment? A: Koyeb supports up to 8 GPUs per deployment instance. Larger workloads need multiple instances and manual load balancing.

Q: Can I reserve capacity in advance? A: Koyeb does not offer reserved instances. All GPUs run on on-demand per-second billing.

Q: How does Koyeb handle auto-scaling? A: Koyeb automatically scales instances based on request queue depth and CPU utilization. Scaling decisions occur every 10-30 seconds.

Q: Is there data transfer cost between Koyeb instances? A: Internal Koyeb network traffic costs $0.01 per GB. Cross-region traffic incurs standard egress charges of $0.10-0.50 per GB.

Sources

  • Koyeb official pricing documentation (as of March 2026)
  • Container registry and GPU deployment specifications
  • Serverless platform cost analysis
  • DeployBase infrastructure benchmarking