Contents
- Overview
- Koyeb Pricing Model
- Comprehensive Koyeb Pricing Breakdown
- Available Hardware
- Cost Calculations and Economics
- Serverless GPU Computing Trend
- FAQ
- Related Resources
- Sources
Overview
Koyeb provides serverless GPU infrastructure for containerized workloads, removing the infrastructure complexity that normally comes with GPU deployment.
Koyeb Pricing Model
Koyeb bills compute and memory per second, which suits variable loads well. For continuous usage, providers with commitment discounts can work out cheaper.
Compute Pricing Structure
Koyeb charges for GPU instances by the second. Minimum charges apply to prevent billing fragmentation, typically starting at $0.01 per minute.
Pricing tiers vary by GPU type:
- NVIDIA A40 (45GB): $0.80 per GPU-hour
- NVIDIA A100 (80GB): $1.60 per GPU-hour
- NVIDIA H100 (80GB): $2.50 per GPU-hour
- NVIDIA H200 (141GB): $3.00 per GPU-hour
Compared with RunPod's H100 SXM at $2.69/hour, Koyeb's H100 at $2.50/hour is slightly cheaper while including serverless management capabilities. The H200 at $3.00/hour is among the most affordable H200 options available.
Memory and Storage
CPU and RAM charges stack on top of GPU costs. Standard configurations include 4 vCPU + 16GB RAM per GPU instance.
Memory pricing runs $0.005 per GB-hour, so the standard 16GB allocation adds $0.08/hour. Together with vCPU charges, the combined CPU and memory overhead comes to approximately $0.35/hour on top of GPU charges.
Storage charges apply to persistent volumes at $0.15 per GB-month. Temporary storage in container instances remains free.
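To see how these component rates translate into hourly figures, here is a minimal sketch; the 720-hour (24 × 30) month is an assumption chosen to match the worked examples later in this piece:

```python
# Convert Koyeb's published component rates into hourly figures.
MEMORY_RATE_GB_HOUR = 0.005    # $ per GB-hour
STORAGE_RATE_GB_MONTH = 0.15   # $ per GB-month, persistent volumes
HOURS_PER_MONTH = 24 * 30      # 720, matching the worked examples below

def memory_cost_per_hour(ram_gb: float) -> float:
    """Hourly memory charge for a given RAM allocation."""
    return ram_gb * MEMORY_RATE_GB_HOUR

def storage_cost_per_hour(volume_gb: float) -> float:
    """Hourly equivalent of the per-GB-month persistent volume charge."""
    return volume_gb * STORAGE_RATE_GB_MONTH / HOURS_PER_MONTH

print(memory_cost_per_hour(16))    # 0.08 -> 16 GB adds $0.08/hour
print(storage_cost_per_hour(250))  # ~0.052 -> 250 GB adds about $0.05/hour
```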
Comprehensive Koyeb Pricing Breakdown
Per-Second Billing Economics
Koyeb's granular per-second billing enables efficient cost optimization. Unlike hourly providers, teams pay only for the compute seconds they actually use:
| Task | Duration | GPU Hours | Hourly-Billed Cost | Per-Second Cost | Savings |
|---|---|---|---|---|---|
| Model fine-tuning | 2 hours | 2 | $5.80 | $5.80 | 0% |
| Batch processing 1,000 items | 1.5 hours | 1.5 | $5.80 | $4.35 | 25% |
| Debugging inference | 45 minutes | 0.75 | $2.90 | $2.18 | 25% |
| Quick experiment | 10 minutes | 0.167 | $2.90 | $0.48 | 83% |
Figures use the fully loaded H100 rate of $2.90/hour; the hourly-billed column rounds duration up to the next full hour.
Short workloads benefit disproportionately from per-second billing. A 10-minute debugging session costs roughly $0.48 on Koyeb versus $2.90 on a provider that rounds up to a full hour, an 83% saving.
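The table's savings fall out of a one-line comparison between exact-duration billing and round-up-to-the-hour billing; a minimal sketch using the $2.90/hour fully loaded H100 rate:

```python
# Compare per-second billing against a provider that rounds up to
# whole hours, at the fully loaded H100 rate from the table above.
import math

RATE_PER_HOUR = 2.90  # $/hour, H100 tier including CPU/RAM and storage

def per_second_cost(duration_hours: float) -> float:
    """Pay exactly for the time used."""
    return duration_hours * RATE_PER_HOUR

def hourly_billed_cost(duration_hours: float) -> float:
    """Pay for every started hour."""
    return math.ceil(duration_hours) * RATE_PER_HOUR

for hours in (2.0, 1.5, 0.75, 10 / 60):
    exact, rounded = per_second_cost(hours), hourly_billed_cost(hours)
    saving = 1 - exact / rounded
    print(f"{hours:5.2f} h: ${exact:.2f} vs ${rounded:.2f} ({saving:.0%} saved)")
```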
Tier-by-Tier Breakdown
Complete pricing across Koyeb's GPU tiers:
| GPU Type | Base Rate | CPU/RAM | Storage | Total Hourly | Best For |
|---|---|---|---|---|---|
| A40 45GB | $0.80 | $0.35 | $0.05 | $1.20 | Budget inference |
| A100 80GB | $1.60 | $0.35 | $0.05 | $2.00 | Standard inference |
| H100 80GB | $2.50 | $0.35 | $0.05 | $2.90 | Performance inference |
| H200 141GB | $3.00 | $0.35 | $0.05 | $3.40 | Very large model inference |
Auto-Scaling Pricing Impact
Koyeb's auto-scaling dramatically reduces costs for variable-load applications:
| Load Pattern | Peak Instances | Average Instances | Cost Reduction |
|---|---|---|---|
| Constant (flat) | 1 | 1 | 0% baseline |
| 2× peak variation | 2 | 1.3 | 35% vs constant |
| 5× peak variation | 5 | 1.5 | 70% vs constant |
| Bursty (10× peaks) | 10 | 1.2 | 88% vs constant |
Applications with significant traffic variation benefit most from Koyeb's pricing model. A chatbot with a 50 QPS baseline and 500 QPS peaks scales between one and five instances; at an average of roughly 1.2 instances, that is about a 76% saving versus provisioning five instances constantly.
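The cost-reduction column is simply one minus the ratio of average to peak instances; a short sketch reproducing the table rows and the chatbot example:

```python
# Auto-scaling saving: billing follows the average instance count
# instead of the peak you would otherwise provision for constantly.

def autoscaling_saving(average_instances: float, peak_instances: float) -> float:
    """Fraction saved versus running peak capacity around the clock."""
    return 1 - average_instances / peak_instances

print(autoscaling_saving(1.3, 2))   # 0.35 -> 35% for 2x variation
print(autoscaling_saving(1.5, 5))   # 0.70 -> 70% for 5x variation
print(autoscaling_saving(1.2, 10))  # 0.88 -> 88% for bursty 10x peaks
print(autoscaling_saving(1.2, 5))   # 0.76 -> the chatbot example
```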
Available Hardware
A40 GPU Tier
NVIDIA A40 GPUs are optimized for inference and real-time rendering. Their lower power consumption enables higher-density deployments in data centers, which keeps costs down.
At $0.80/hour per GPU, the A40 is compelling for inference workloads that can tolerate somewhat lower performance than the A100. Bandwidth-intensive applications may hit limits, though: the A40's 696GB/sec memory bandwidth is well below the A100's 1,935GB/sec.
A100 Balanced Tier
A100 instances represent the most popular Koyeb configuration. At $1.60/hour per GPU, pricing aligns competitively with specialist GPU providers.
Koyeb's containerized approach suits stateless inference workloads and batch processing. Stateful training scenarios benefit from lower-level infrastructure such as Lambda, which offers the H100 SXM at $3.78/hour.
H100 Performance Tier
H100 GPUs provide maximum performance for demanding workloads. As noted above, the $2.50/hour rate undercuts RunPod's H100 SXM at $2.69/hour while including Koyeb's serverless management capabilities.
H200 Ultra-Memory Tier
H200 GPUs with 141GB HBM3e memory enable running very large models on a single GPU. Koyeb's H200 at $3.00/hour is one of the most affordable H200 options on the market, making it attractive for teams that need the extreme VRAM capacity.
Total cost analysis requires considering operational overhead. Koyeb eliminates infrastructure management, cluster configuration, and networking setup required by traditional cloud GPU providers.
Cost Calculations and Economics
Baseline Inference Workload
Serving predictions from a single A100 instance continuously for one month:
- A100 GPU cost: $1.60/hour
- CPU/RAM overhead: $0.35/hour
- Storage (250GB persistent volume at $0.15/GB-month): $0.05/hour
- Total hourly cost: $2.00/hour
- Monthly cost: 2.00 × 24 × 30 = $1,440
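A short sketch reproducing this baseline arithmetic:

```python
# Baseline A100 calculation: component hourly rates summed, then
# projected over the 24 x 30 = 720-hour month used above.

GPU_RATE = 1.60        # $/hour, A100 80GB
CPU_RAM_RATE = 0.35    # $/hour, 4 vCPU + 16 GB
STORAGE_RATE = 0.05    # $/hour, ~250 GB persistent volume

hourly = GPU_RATE + CPU_RAM_RATE + STORAGE_RATE
monthly = hourly * 24 * 30
print(f"${hourly:.2f}/hour -> ${monthly:,.0f}/month")  # $2.00/hour -> $1,440/month
```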
Koyeb's per-second billing enables cost optimization. Unlike hourly providers, unused capacity doesn't generate charges. Scaling down during low-traffic periods saves proportional costs.
Compared with CoreWeave's 8×H100 cluster at $49.24/hour ($6.16 per GPU-hour), Koyeb's A100 at $1.60 provides a 74% cost reduction for workloads that fit a single GPU. However, CoreWeave's networking advantages justify the higher cost for coordinated multi-GPU work.
Batch Processing Project
Processing 10,000 images through a computer vision model:
- Average processing time: 0.5 seconds per image
- Total GPU time: 5,000 seconds or 1.39 hours
- H100 cost: 1.39 × $2.50 = $3.48
- CPU/RAM cost: 1.39 × $0.35 = $0.49
- Total project cost: $3.97
This tight coupling between work performed and costs paid makes Koyeb cost-efficient for batch workloads. On an hourly-billed provider, 1.39 hours rounds up to 2 full hourly charges, roughly $5.50-7.50 at comparable H100 rates, so Koyeb saves around 30-45% on short batch jobs.
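The same arithmetic generalizes to any batch job; a sketch that derives billable hours from item count and per-item latency, using the tier rates above:

```python
# Batch job cost: total GPU time = items x seconds-per-item,
# billed per second at the H100 rates from the tier table.

GPU_RATE = 2.50      # $/hour, H100
CPU_RAM_RATE = 0.35  # $/hour, 4 vCPU + 16 GB

def batch_cost(items: int, seconds_per_item: float) -> float:
    """Project cost for a per-second-billed batch run."""
    hours = items * seconds_per_item / 3600
    return hours * (GPU_RATE + CPU_RAM_RATE)

# 10,000 images at 0.5 s each = 1.39 GPU-hours -> ~$3.96,
# matching the worked example above up to rounding.
print(f"${batch_cost(10_000, 0.5):.2f}")
```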
Real-Time API Endpoint Economics
Deploying inference API with 10 QPS average load, 50 QPS peak:
- Single A100 throughput: 100 QPS @ 50ms latency
- Required instances: 1 A100 instance
- Monthly cost at constant load: $1,440 (from the baseline above)
Adding an API gateway and load balancing runs approximately $50-100 per month, bringing the total to $1,490-1,540.
Comparing to Replicate GPU pricing at $0.001 per second for an A40:
- Replicate: 1,000,000 queries per month × 0.05 seconds average = 50,000 billed seconds = $50
- Koyeb: $1,440 for the dedicated instance, regardless of query volume
The breakeven sits around 28.8 million monthly queries at 50ms per query. Below that volume, Replicate's per-request model wins; sustained high-throughput traffic favors dedicated Koyeb capacity.
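The breakeven follows from dividing the fixed monthly cost by the per-query cost; a sketch of that calculation:

```python
# Monthly query volume where a dedicated Koyeb instance beats
# Replicate-style per-request billing.

DEDICATED_MONTHLY = 1_440.0   # $/month, always-on A100 baseline above
PER_SECOND_RATE = 0.001       # $/second, the Replicate A40 figure above
SECONDS_PER_QUERY = 0.05      # average inference time per query

per_query = PER_SECOND_RATE * SECONDS_PER_QUERY  # $0.00005 per query
breakeven = DEDICATED_MONTHLY / per_query
print(f"{breakeven:,.0f} queries/month")         # 28,800,000
```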
Auto-Scaling Scenarios
Koyeb's automatic scaling enables cost optimization for variable-load endpoints:
Average daily profile: 100 QPS off-peak, rising to 1,000 QPS during the 08:00-18:00 UTC peak, at an assumed rate of $1.80 per instance-hour:
- Off-peak (8 hours at 0.5 average instances): 0.5 × 8 × $1.80 = $7.20/day
- Peak (10 hours at 5 instances): 5 × 10 × $1.80 = $90.00/day
- Low-load (6 hours at 0.05 average instances): 0.05 × 6 × $1.80 = $0.54/day
- Monthly cost: ($7.20 + $90.00 + $0.54) × 30 ≈ $2,932
Compared to over-provisioning for peak (5 instances running continuously): 5 × $43.20/day × 30 = $6,480.
Koyeb saves roughly 55% through automatic scaling, though practical configurations include safety margins that push costs somewhat higher.
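A sketch reproducing the daily-profile arithmetic (the $1.80 instance-hour rate is the assumption used above):

```python
# Variable-load cost: sum over each period of
# instances x hours x rate, then scale to a 30-day month.

RATE = 1.80  # $/instance-hour, assumed above

profile = [        # (hours per day, average instances)
    (8, 0.5),      # off-peak
    (10, 5.0),     # peak
    (6, 0.05),     # low-load
]

daily = sum(hours * instances * RATE for hours, instances in profile)
print(f"${daily:.2f}/day -> ${daily * 30:,.0f}/month")  # $97.74/day -> $2,932/month
```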
Serverless GPU Computing Trend
Managed Infrastructure Appeal
Koyeb represents the serverless GPU computing trend gaining adoption in 2026. Teams increasingly value eliminating infrastructure management overhead over marginal per-unit cost optimization.
Comparing bare infrastructure (RunPod's H100 at $2.69/hour) to a managed platform (Koyeb's H100 at $2.90/hour fully loaded):
- Bare infrastructure: manage scaling, monitoring, and failover yourself
- Koyeb: deploy containers and let the platform handle operational complexity
For teams with limited DevOps resources, Koyeb's automation premium (roughly 8% on the fully loaded rate) enables focus on model development.
Container-Native Deployment
Koyeb's full Docker support limits vendor lock-in: models trained anywhere deploy to Koyeb unchanged. This openness contrasts with proprietary hosted APIs such as Replicate or the OpenAI API.
Teams valuing flexibility and avoiding proprietary frameworks prefer container-based solutions for long-term viability.
Scaling Without Engineering
Koyeb's automatic scaling eliminates load-balancing engineering. Deploying Llama 2 7B inference requires:
- Write inference container
- Deploy to Koyeb
- Set auto-scaling parameters
- Done
Traditional infrastructure requires network load balancers, monitoring systems, and scaling orchestration. Koyeb handles this automatically.
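For a sense of scale, "write inference container" can be as small as the following standard-library sketch; the port, route handling, and stub model are illustrative placeholders, not Koyeb requirements:

```python
# Minimal HTTP inference server using only the Python standard library.
# Any container exposing an HTTP port works; the stub "model" here
# just echoes its input.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Replace this stub with a real model call (e.g. Llama 2 7B).
        result = {"echo": payload, "prediction": "stub"}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), InferenceHandler).serve_forever()
```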
FAQ
Q: Does Koyeb support custom CUDA kernels? A: Yes. Koyeb supports arbitrary Docker containers with CUDA toolkit. Custom kernel compilation occurs during container build.
Q: What's the maximum GPU count per deployment? A: Koyeb supports up to 8 GPUs per deployment instance. Larger requirements require multiple instances and manual load balancing.
Q: Can I reserve capacity in advance? A: Koyeb does not offer reserved instances. All GPUs run on on-demand per-second billing.
Q: How does Koyeb handle auto-scaling? A: Koyeb automatically scales instances based on request queue depth and CPU utilization. Scaling decisions occur every 10-30 seconds.
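As an illustration only (the thresholds and the max() combination below are assumptions, not Koyeb's published algorithm), a replica-count decision driven by those two signals might look like:

```python
# Illustrative scaling decision from queue depth and CPU utilization.
# Thresholds are hypothetical; Koyeb's actual policy is not public.
import math

def desired_replicas(queue_depth: int, cpu_utilization: float,
                     queue_per_replica: int = 20,
                     target_cpu: float = 0.7,
                     current: int = 1) -> int:
    by_queue = math.ceil(queue_depth / queue_per_replica)
    by_cpu = math.ceil(current * cpu_utilization / target_cpu)
    return max(1, by_queue, by_cpu)

# 85 queued requests and 90% CPU on 2 replicas -> scale to 5.
print(desired_replicas(queue_depth=85, cpu_utilization=0.9, current=2))
```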
Q: Is there data transfer cost between Koyeb instances? A: Internal Koyeb network traffic costs $0.01 per GB. Cross-region traffic incurs standard egress charges of $0.10-0.50 per GB.
Related Resources
- GPU Pricing Comparison
- Lambda GPU Pricing
- NVIDIA A100 Price Guide
- NVIDIA H100 Price Guide
- LLM API Pricing
Sources
- Koyeb official pricing documentation (as of March 2026)
- Container registry and GPU deployment specifications
- Serverless platform cost analysis
- DeployBase infrastructure benchmarking