H100 on Vultr: Pricing, Specs & How to Rent

Deploybase · February 10, 2026 · GPU Pricing

H100 Specs Overview

Vultr's H100 pricing targets teams running large language model inference, training, and fine-tuning workloads. NVIDIA's H100 is a Hopper-architecture data-center GPU with 80 GB of HBM3 memory, delivering up to 3,958 TFLOPS of FP8 tensor throughput (with sparsity). As of February 2026, the H100 remains the workhorse for production AI deployments across cloud providers.

Key technical details include:

  • Memory: 80 GB HBM3
  • Memory bandwidth: 3.35 TB/s (SXM variant)
  • Peak FP8 performance: 3,958 TFLOPS (with sparsity)
  • Tensor cores: 528
  • NVLink 4 connectivity for multi-GPU setups
  • PCIe Gen 5 support for single-GPU deployments
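
To confirm these figures on a machine you've rented, a short PyTorch check is enough. This is a minimal sketch assuming a CUDA-enabled PyTorch build is installed; the commented values reflect the SXM variant:

    # Sanity-check the GPU an instance actually received.
    import torch

    assert torch.cuda.is_available(), "No CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    print(f"Device:       {props.name}")                       # e.g. 'NVIDIA H100 80GB HBM3'
    print(f"Memory:       {props.total_memory / 1e9:.0f} GB")  # ~80 GB (usable is slightly less)
    print(f"SM count:     {props.multi_processor_count}")      # 132 on the SXM variant
    print(f"Compute cap.: {props.major}.{props.minor}")        # 9.0 (Hopper)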

Vultr H100 Pricing

Vultr offers H100 GPUs through dedicated bare-metal instances. Current pricing starts at approximately $2.30 per GPU-hour for bare-metal configurations ($18.40/hr for an 8x H100 bare-metal instance, or $2.99/GPU at $23.92/hr for the 8x virtual machine option). For comparison, RunPod charges $2.69 per hour for H100 SXM instances, while Lambda Labs lists the H100 PCIe at $2.86 per hour.

Vultr's pricing structure includes:

  • H100 8x bare metal: $18.40/hr ($2.30/GPU)
  • H100 8x virtual machine: $23.92/hr ($2.99/GPU)
  • Bulk commitment discounts for monthly/yearly plans
  • No data transfer fees for in-region connections
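
Given these rates, a back-of-the-envelope comparison is easy to script. The sketch below uses only the per-GPU-hour figures quoted above; the utilization assumption is illustrative and should be replaced with your own:

    # Rough monthly cost comparison at the quoted per-GPU-hour rates.
    RATES_PER_GPU_HOUR = {
        "Vultr (8x bare metal)":   2.30,
        "Vultr (8x VM)":           2.99,
        "RunPod (H100 SXM)":       2.69,
        "Lambda Labs (H100 PCIe)": 2.86,
    }

    gpus = 8
    hours = 730  # assumption: ~1 month of continuous use

    for provider, rate in sorted(RATES_PER_GPU_HOUR.items(), key=lambda kv: kv[1]):
        monthly = rate * gpus * hours
        print(f"{provider:26s} ${rate:.2f}/GPU-hr -> ${monthly:,.0f}/month for {gpus} GPUs")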

Performance & Comparison

The H100 outperforms previous-generation A100 GPUs by 3-5x on modern transformer workloads. As a rough planning figure, teams can expect 40-60 tokens per second for 7B-parameter models and 8-12 tokens per second for quantized 70B-parameter models in single-GPU deployments.
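
Those throughput ranges translate directly into capacity planning. A quick illustration, using only the figures above:

    # Illustrative daily token budget per GPU at the quoted throughputs.
    SECONDS_PER_DAY = 86_400

    for model, (low, high) in {"7B": (40, 60), "70B (quantized)": (8, 12)}.items():
        print(f"{model:16s} {low * SECONDS_PER_DAY:,} - "
              f"{high * SECONDS_PER_DAY:,} tokens/day per GPU")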

Vultr's bare-metal infrastructure provides consistent single-digit-millisecond latency between nodes in multi-node clusters. This makes Vultr's H100 rental option suitable for:

  • Distributed LLM training across 4-8 GPUs
  • Real-time inference at scale with sub-100ms p99 latency
  • Fine-tuning of 30B-70B parameter models
  • Batch processing for video transcoding or image generation

How to Rent H100 on Vultr

  1. Create a Vultr account and verify payment information
  2. Go to Compute > Cloud Compute
  3. Select "Bare Metal" instance type
  4. Choose the H100 GPU option from available configurations
  5. Select region (Vultr offers H100 in 4 global datacenters)
  6. Set root password or SSH key
  7. Choose billing cycle (hourly, monthly, or annual)
  8. Review and confirm order
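
The same steps can also be scripted against Vultr's v2 REST API for repeatable deployments. The sketch below is illustrative, not a drop-in script: the plan ID is a placeholder (list real bare-metal plans via GET /v2/plans-metal), and the region and OS IDs are assumptions to replace with values from the API:

    # Provision a bare-metal GPU instance via Vultr's v2 API.
    # Assumes VULTR_API_KEY is set; plan, region, and OS IDs are placeholders.
    import os
    import requests

    API = "https://api.vultr.com/v2"
    headers = {"Authorization": f"Bearer {os.environ['VULTR_API_KEY']}"}

    # Discover available bare-metal plans first.
    plans = requests.get(f"{API}/plans-metal", headers=headers).json()
    print(plans)

    body = {
        "region": "ewr",            # assumption: choose a region with H100 capacity
        "plan": "vbm-placeholder",  # hypothetical plan ID -- substitute a real one
        "os_id": 1743,              # assumption: an Ubuntu image ID from GET /v2/os
        "label": "h100-training-node",
    }
    resp = requests.post(f"{API}/bare-metals", headers=headers, json=body)
    resp.raise_for_status()
    print(resp.json())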

Vultr deploys instances within 2-3 minutes, and SSH access becomes available immediately after provisioning. For containerized workloads, Vultr provides images with the NVIDIA Container Runtime pre-installed, making GPU passthrough to Docker straightforward.
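
A minimal post-provisioning smoke test, assuming Docker and the NVIDIA runtime are present as described, might look like this (the CUDA base image tag is one example; any recent tag works):

    # Confirm the driver is loaded and containers can see the GPUs.
    import subprocess

    # Host-level check: driver loaded, GPUs enumerated.
    subprocess.run(["nvidia-smi"], check=True)

    # Container-level check: Docker passes the GPUs through.
    subprocess.run(
        ["docker", "run", "--rm", "--gpus", "all",
         "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
        check=True,
    )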

Instance management requires basic system administration knowledge. Teams new to bare-metal GPU rental should review Vultr's documentation on NVIDIA driver installation and CUDA toolkit configuration.
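
Once the driver and toolkit are configured, a short check confirms that the framework build matches the installed stack. A sketch, assuming PyTorch is the framework in use:

    # Verify driver and CUDA versions line up with the framework build.
    import subprocess
    import torch

    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print("GPU / driver:          ", out.stdout.strip())
    print("PyTorch built for CUDA:", torch.version.cuda)
    print("CUDA usable:           ", torch.cuda.is_available())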

FAQ

Is H100 better than A100 for inference? The H100 delivers 3-5x higher throughput on inference workloads, translating to lower latency per token and higher concurrent user capacity. For production deployments handling 50+ simultaneous requests, H100 is typically preferred.

Can I stack multiple H100s on Vultr? Yes. Vultr supports up to 8 H100 GPUs in a single bare-metal instance with NVLink 4 connectivity. This requires custom provisioning through their sales team.
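
On a multi-GPU instance, peer-to-peer access between devices is a quick proxy for NVLink connectivity. A minimal check with PyTorch:

    # Enumerate devices and test peer-to-peer access between pairs.
    import torch

    n = torch.cuda.device_count()
    print(f"{n} CUDA devices visible")
    for i in range(n):
        for j in range(i + 1, n):
            if torch.cuda.can_device_access_peer(i, j):
                print(f"GPU {i} <-> GPU {j}: peer access OK")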

Does Vultr H100 pricing include storage? The stated hourly rate covers compute only. NVMe storage, networking, and bandwidth are billed separately. A typical H100 instance with 500 GB of NVMe runs approximately $2.50-3.00 per GPU-hour all-in.

What's the minimum rental period? Hourly billing is available, but Vultr requires a $100 account credit minimum. For longer projects, monthly billing provides 15-20% cost savings.
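
Combining the figures from the last two answers gives a simple all-in estimate. The storage rate below is an assumption chosen to land in the quoted $2.50-3.00 range; check Vultr's current price list before relying on it:

    # All-in cost per GPU: compute plus separately billed NVMe storage,
    # with the quoted 15-20% monthly commitment discount applied.
    compute_per_gpu_hour = 2.30
    storage_per_hour = 0.20   # assumed rate for ~500 GB NVMe (placeholder)
    hours_per_month = 730

    hourly_total = compute_per_gpu_hour + storage_per_hour
    print(f"On-demand: ${hourly_total:.2f}/GPU-hr "
          f"(${hourly_total * hours_per_month:,.0f}/month)")
    for discount in (0.15, 0.20):
        print(f"With {discount:.0%} monthly discount: "
              f"${hourly_total * hours_per_month * (1 - discount):,.0f}/month per GPU")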
