Renting an NVIDIA H100 GPU in 2026 costs between $1.25 and $14.90 per GPU-hour depending on the provider, hardware variant, and billing model. The spread is real and meaningful. Choosing the wrong provider for a sustained training workload can double your monthly compute bill without any change in output.
- The market average across 42 providers is $3.61/hr on-demand. Specialized GPU cloud providers sit well below that average. GMI Cloud offers H100 PCIe at $2.00/hr and H100 SXM at $2.40/hr, which is 40 to 60 percent below hyperscaler on-demand rates.
- PCIe and SXM are meaningfully different products, not packaging variations. SXM delivers 3.35 TB/s memory bandwidth versus 2.0 TB/s on PCIe, and NVLink 4.0 at 900 GB/s versus PCIe Gen5 at 128 GB/s. For multi-GPU training, SXM is required. For inference and single-GPU fine-tuning, PCIe performs well at lower cost.
- Spot instances reduce costs by 60 to 90 percent but carry interruption risk. Reserved instances cut 20 to 40 percent off on-demand rates with 1 to 12 month commitments. On-demand is the right default for most teams starting out.
- Hidden costs are real. Hyperscaler egress fees run $0.08 to $0.12 per GB and can add 10 to 20 percent to monthly bills. Hypervisor overhead on virtual instances reduces usable GPU performance by 10 to 15 percent versus bare metal, which inflates the effective cost per unit of work.
- Real workload benchmarks: A LoRA fine-tune of Llama 3.1 70B on 4x H100s takes roughly 15 hours and costs $120 to $180 on GMI Cloud. A full fine-tune on 8x H100 SXM over 24 to 48 hours runs roughly $460 to $920. Training a 70B model from scratch on an 8x H100 cluster for 300 to 1,000 hours runs $4,800 to $16,000.
- Buying vs renting: A single H100 PCIe costs $25,000 to $30,000 to purchase. At $2.00/hr cloud rental, break-even requires 12,500 to 15,000 hours of continuous use, roughly 17 to 20 months running 24/7, before infrastructure, power, and maintenance costs are factored in.
Why H100 Pricing Varies So Much
The NVIDIA H100 is the most widely deployed GPU for production AI workloads. It ships in two form factors and across dozens of cloud providers, each with different infrastructure costs, business models, and billing approaches. The result is a market where the same 80GB GPU with identical specifications costs $2.00/hr on one platform and $12.29/hr on another.
Three factors explain most of the spread.
Provider type. Hyperscalers (AWS, Azure, GCP) run GPU compute as one service among hundreds, carry enterprise overhead, and price accordingly. Specialized GPU cloud providers build exclusively for AI workloads, operate leaner infrastructure, and pass the savings to customers. This structural difference accounts for most of the 2 to 3x gap between hyperscalers and specialized providers on equivalent hardware.
Billing model. On-demand instances carry no commitment and maximum flexibility. Spot instances are interruptible but dramatically cheaper. Reserved capacity requires upfront commitments in exchange for sustained discounts. The same physical GPU can cost $3.90/hr on-demand, $1.95/hr reserved, or $0.80/hr as a spot instance on the same platform.
Form factor. The H100 SXM and H100 PCIe are built on the same Hopper architecture but perform meaningfully differently and price accordingly. The SXM variant commands a $0.40 to $0.50 per hour premium over PCIe across most providers.
H100 PCIe vs SXM: Which One Do You Actually Need?
This decision matters more than most pricing guides acknowledge. Choosing the wrong form factor means either overpaying for capabilities your workload cannot use, or hitting a hard performance ceiling during training.
| Spec | H100 SXM | H100 PCIe |
|---|---|---|
| VRAM | 80 GB HBM3 | 80 GB HBM2e |
| Memory Bandwidth | 3.35 TB/s | 2.0 TB/s |
| GPU-to-GPU Interconnect | NVLink 4.0, 900 GB/s | PCIe Gen5, 128 GB/s |
| Power Draw | 700W | 350W |
| Best For | Multi-GPU distributed training | Inference, fine-tuning, single-GPU work |
Choose SXM for distributed training across 4 or more GPUs where GPU-to-GPU communication is the bottleneck. Large language model pretraining, multimodal model training, and any workload that scales across nodes with NVLink see 30 to 40 percent faster throughput on SXM versus PCIe. PCIe-based clusters hit a hard communication ceiling beyond 2 to 4 GPUs on these workloads.
Choose PCIe for inference serving, LoRA or QLoRA fine-tuning on single or loosely coupled GPUs, batch processing, and any workload that does not require high-bandwidth inter-GPU communication. You get the same 80GB VRAM and the same Transformer Engine at meaningfully lower cost. For production inference, the difference in throughput between SXM and PCIe is small enough that the 20 percent price premium rarely justifies itself.
On GMI Cloud, H100 PCIe starts at $2.00/hr and H100 SXM at $2.40/hr, both on bare metal with no hypervisor overhead.
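A quick way to sanity-check the SXM premium is to divide the hourly rate by relative throughput. The sketch below uses the GMI Cloud rates above; the throughput factors are assumptions drawn from the 30 to 40 percent distributed-training speedup range, not measured values.

```python
# Rough cost-per-unit-of-work comparison between H100 PCIe and SXM.
# Rates are the GMI Cloud figures above; throughput factors are assumptions.

def cost_per_work_unit(rate_per_hour: float, relative_throughput: float) -> float:
    """Hourly rate divided by throughput relative to a PCIe baseline of 1.0."""
    return rate_per_hour / relative_throughput

pcie_baseline = cost_per_work_unit(2.00, 1.00)
sxm_distributed = cost_per_work_unit(2.40, 1.35)  # assume ~35% faster multi-GPU training
sxm_inference = cost_per_work_unit(2.40, 1.05)    # assume ~5% faster single-GPU inference

print(f"PCIe baseline:             ${pcie_baseline:.2f} per unit of work")
print(f"SXM, distributed training: ${sxm_distributed:.2f}  (premium pays for itself)")
print(f"SXM, single-GPU inference: ${sxm_inference:.2f}  (premium rarely justified)")
```

If the effective cost per unit of work drops below the PCIe baseline, the SXM premium is paying for itself; if not, PCIe is the cheaper choice.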
Provider Pricing: What You Actually Pay Per Hour in 2026
| Provider | H100 PCIe On-Demand | H100 SXM On-Demand | Billing |
|---|---|---|---|
| GMI Cloud | $2.00/hr | $2.40/hr | Per-minute |
| RunPod (Community) | ~$1.99/hr | ~$2.39/hr | Per-minute |
| Vast.ai | From $1.87/hr | Variable | Per-minute |
| Lambda Labs | $2.49/hr | $2.89/hr | Per-minute |
| Hyperstack | $2.40/hr | $2.40/hr | Hourly |
| AWS (P5 instances) | Not separate | ~$3.90/hr | Per-second |
| Google Cloud (A3) | Not separate | ~$3.00/hr (spot) | Per-second |
| Azure (ND H100 v5) | Not separate | ~$5.40/hr | Per-minute |
A few important caveats. Hyperscaler pricing is per-instance, not per-GPU, and normalized rates depend on which instance configuration you use. Azure's ND H100 v5 instances normalize to $5.40/hr per GPU or higher depending on region. Google Cloud Spot can bring A3 (H100 SXM) instances down to approximately $2.25/hr, and Sustained Use Discounts automatically reduce on-demand costs by up to 30 percent for month-long workloads. AWS cut P5 instance pricing by 44 percent in June 2025, making its effective rate for sustained workloads more competitive when reserved.
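Because hyperscalers quote per-instance prices, comparing them against per-GPU rates means dividing by the instance's GPU count. A minimal sketch, assuming the common 8-GPU H100 configurations; the instance prices below are illustrative placeholders chosen to match the normalized per-GPU figures quoted above, not current list prices.

```python
# Normalize per-instance hyperscaler prices to per-GPU-hour before comparing
# against per-GPU pricing from specialized providers.
# Instance prices are illustrative placeholders, not current list prices.

def per_gpu_rate(instance_price_per_hour: float, gpus_per_instance: int) -> float:
    return instance_price_per_hour / gpus_per_instance

examples = {
    "8x H100 instance (AWS P5 class)":   per_gpu_rate(31.20, 8),  # ~$3.90/GPU-hr
    "8x H100 instance (Azure ND class)": per_gpu_rate(43.20, 8),  # ~$5.40/GPU-hr
    "GMI Cloud H100 PCIe":               per_gpu_rate(2.00, 1),   # already per GPU
}

for name, rate in examples.items():
    print(f"{name}: ${rate:.2f} per GPU-hour")
```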
For teams without specific ecosystem requirements, the gap between specialized providers and hyperscalers remains 40 to 60 percent on pure GPU-hour cost.
Billing Models: On-Demand, Spot, and Reserved
On-demand is the default for most teams. No commitment, pay per minute or per hour, spin up and shut down freely. GMI Cloud, RunPod, Lambda Labs, and Vast.ai all offer per-minute billing. Hyperscalers bill per-second (AWS, GCP) or per-minute (Azure). Per-hour billing, which some providers still use, is the most expensive model in practice because you pay for full hours regardless of actual usage.
Spot instances are interruptible but offer 60 to 90 percent discounts. The floor on spot H100 pricing across providers is approximately $1.25/hr, with GCP Spot for H100 SXM running around $2.25/hr. Spot is the right model for any training job that checkpoints every 15 to 30 minutes and can resume from the last checkpoint without meaningful cost. It is the wrong model for production inference, long stateful jobs without checkpointing, or any workload where interruption carries a cost beyond compute time.
Reserved instances require 1 to 12 month commitments and provide 20 to 40 percent discounts versus on-demand. AWS 1-year reserved P5 instances normalize to approximately $1.90 to $2.10/hr per H100. Hyperstack's H100 SXM reserved starts at $1.90/hr. For teams with predictable, sustained workloads, reserved pricing on a specialized provider is typically the most cost-efficient option that combines reliability with savings.
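For a checkpointed training job, the spot-versus-on-demand decision comes down to whether the discount outweighs the work redone after interruptions. A back-of-the-envelope sketch; the interruption rate and checkpoint interval are assumptions to replace with what you actually observe on your provider.

```python
# Expected cost of a checkpointed training job on spot vs on-demand.
# Interruption frequency and checkpoint interval are assumptions.

def spot_job_cost(compute_hours: float, spot_rate: float,
                  interruptions_per_day: float, checkpoint_interval_min: float) -> float:
    """Paid spot hours = useful hours plus redone work; on average half a
    checkpoint interval is lost per interruption."""
    days = compute_hours / 24
    lost_hours = days * interruptions_per_day * (checkpoint_interval_min / 60) / 2
    return (compute_hours + lost_hours) * spot_rate

job_hours = 120  # e.g. a multi-day single-GPU fine-tune
on_demand_cost = job_hours * 2.00                   # on-demand at $2.00/hr
spot_cost = spot_job_cost(job_hours, 1.25,          # spot floor quoted above
                          interruptions_per_day=2,   # assumption
                          checkpoint_interval_min=30)

print(f"On-demand: ${on_demand_cost:.0f}, spot: ${spot_cost:.0f}")
```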
Real Workload Cost Estimates
These are grounded estimates based on current H100 pricing. Actual costs vary with model architecture, batch size, quantization, and GPU utilization.
LoRA fine-tuning a 7B model (single H100, 2 to 4 hours): $4 to $8 on GMI Cloud at $2.00/hr. This is the most accessible entry point for domain-specific model customization.
LoRA fine-tuning Llama 3.1 70B (4x H100, 15 hours): $120 to $180 on GMI Cloud. At AWS rates ($3.90/hr), the same job runs $234. Engineering time typically dominates the total project cost at this scale.
Full fine-tuning a 70B model (8x H100 SXM, 24 to 48 hours): roughly $460 to $920 on GMI Cloud SXM at $2.40/hr. The same compute on Azure exceeds $1,000.
Training a 70B model from scratch (8x H100, 300 to 1,000 cluster-hours, or 2,400 to 8,000 GPU-hours): $4,800 to $16,000 on GMI Cloud. The range is wide because training costs scale with data volume, architecture choices, and optimization approach. Most teams fine-tune a pre-trained foundation model rather than training from scratch at this scale.
Production inference serving (1x H100 running 24/7): $1,460/month on GMI Cloud at $2.00/hr, $2,847/month on AWS at $3.90/hr. The $1,387/month difference compounds quickly across multiple serving instances.
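All of the estimates above reduce to the same arithmetic: GPU count times wall-clock hours times the per-GPU hourly rate. A minimal sketch that reproduces a few of the figures quoted:

```python
# Workload cost = GPU count x wall-clock hours x per-GPU hourly rate.

def job_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    return gpus * hours * rate_per_gpu_hour

# LoRA fine-tune of Llama 3.1 70B on 4x H100, ~15 hours
print(job_cost(4, 15, 2.00))   # 120.0  -> the $120 GMI Cloud figure
print(job_cost(4, 15, 3.90))   # 234.0  -> the $234 AWS figure

# Production inference, 1x H100 running a full month (~730 hours)
print(job_cost(1, 730, 2.00))  # 1460.0 -> the $1,460/month figure
```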
Hidden Costs That Change the Real Comparison
The hourly rate is the starting point, not the full picture. Four additional cost categories change the real comparison between providers.
Egress fees. Hyperscalers charge $0.08 to $0.12 per GB for data transferred out of their networks. For a team downloading large model checkpoints or serving models that generate substantial output, egress fees add 10 to 20 percent to monthly bills. GMI Cloud does not charge egress fees, which matters for teams moving data frequently between training and serving infrastructure.
Hypervisor overhead. Virtual machine instances on hyperscalers run on top of a hypervisor layer that absorbs 10 to 15 percent of usable GPU performance. A provider advertising $3.90/hr for an H100 on a virtual machine is effectively delivering 85 to 90 percent of rated hardware performance. GMI Cloud's bare metal instances deliver 100 percent of rated H100 performance, which lowers the effective cost per unit of work.
Storage. Persistent filesystem storage costs $0.15 to $0.20/GB/month across most providers. For teams storing large model checkpoints or datasets, storage becomes a meaningful line item.
Billing granularity. Providers that bill per hour rather than per minute add waste on every short job. A training run that finishes in 47 minutes costs a full hour on hourly billing. For teams running dozens of short jobs daily, per-minute billing produces measurable savings.
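These adjustments can be folded into a single effective hourly rate, which is the number that actually belongs in a provider comparison. A sketch under stated assumptions: the performance fraction, egress volume, and egress price below come from the ranges above and should be replaced with your own figures.

```python
# Fold hidden costs into an effective per-GPU-hour rate before comparing providers.
# Performance fraction, egress volume, and egress price are assumptions.

def effective_rate(list_rate: float, perf_fraction: float = 1.0,
                   egress_gb_per_month: float = 0.0, egress_per_gb: float = 0.0,
                   gpu_hours_per_month: float = 730.0) -> float:
    """List rate adjusted for hypervisor overhead plus amortized egress fees."""
    egress_per_hour = (egress_gb_per_month * egress_per_gb) / gpu_hours_per_month
    return list_rate / perf_fraction + egress_per_hour

# Virtualized H100 at $3.90/hr, ~85% usable performance, 2 TB egress/month at $0.09/GB
virtualized = effective_rate(3.90, perf_fraction=0.85,
                             egress_gb_per_month=2000, egress_per_gb=0.09)

# Bare metal H100 at $2.00/hr with no egress fees
bare_metal = effective_rate(2.00)

print(f"Virtualized effective rate: ${virtualized:.2f}/hr")  # ~$4.83
print(f"Bare metal effective rate:  ${bare_metal:.2f}/hr")   # $2.00
```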
Why GMI Cloud for H100 Rentals
GMI Cloud operates as an NVIDIA Reference Platform Partner with H100 infrastructure built exclusively for AI workloads. The platform combines competitive per-hour pricing with architecture decisions that reduce total cost of ownership beyond the headline rate.
H100 PCIe at $2.00/hr and SXM at $2.40/hr are available on bare metal, with no hypervisor overhead and no egress fees. Per-minute billing means you pay for what you use. 8-GPU H100 clusters with 3.2 Tbps InfiniBand networking are available for distributed training workloads that require SXM-class inter-GPU bandwidth.
For inference-first workloads, the GMI Cloud Inference Engine provides serverless scaling to zero, automatic request batching, and latency-aware scheduling. Teams do not need to provision or manage GPU instances for inference; the platform handles scaling based on actual request volume. This removes the idle capacity cost that makes fixed H100 instances expensive for variable-traffic production APIs.
Production benchmarks from teams running on GMI Cloud reflect both the pricing and infrastructure advantages: Higgsfield achieved 65 percent lower p95 inference latency and 45 percent lower compute cost versus their prior provider. Mirelo AI cut training costs by 40 percent and reduced training time by 20 percent.
How to Choose the Right H100 Provider
Start with your workload type. Distributed training across 4 or more GPUs requires SXM with NVLink. Single-GPU or inference workloads are well-served by PCIe at lower cost. Do not pay for SXM capabilities your workload does not use.
Then match billing to your usage pattern. Variable or unpredictable traffic belongs on per-minute on-demand or serverless. Sustained, predictable workloads benefit from reserved pricing. Jobs that checkpoint reliably can use spot instances.
Factor in total cost, not just hourly rate. At $2.00/hr versus $3.90/hr for identical hardware, GMI Cloud saves $1.90 per GPU-hour. Across a 30-day month with a single H100 running 12 hours daily, that difference is $684. Across an 8-GPU training cluster at full utilization, it is significantly more.
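The arithmetic behind that comparison is easy to check and to scale to your own cluster; the 24-hours-a-day case below simply represents full utilization over a 30-day month.

```python
# Monthly savings from a per-GPU-hour rate delta, scaled by utilization and cluster size.

def monthly_savings(rate_delta: float, hours_per_day: float,
                    days: int = 30, gpus: int = 1) -> float:
    return rate_delta * hours_per_day * days * gpus

print(monthly_savings(1.90, 12))            # 684.0   -> the single-GPU figure above
print(monthly_savings(1.90, 24, gpus=8))    # 10944.0 -> an 8-GPU cluster at full utilization
```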
Consider ecosystem requirements. If your stack depends on AWS-specific services (managed databases, VPC configurations, IAM policies), the ecosystem premium may be justified. If your primary use is GPU compute for training and inference, a specialized provider offers better economics with no meaningful tradeoff.
FAQs
What is the cheapest way to rent an H100 GPU in 2026? The lowest published H100 rates come from spot instances on GPU marketplaces. Vast.ai lists H100 PCIe from $1.87/hr and spot pricing across specialized providers can reach $1.25/hr. For managed on-demand access without interruption risk, GMI Cloud at $2.00/hr for H100 PCIe and RunPod Community Cloud at approximately $1.99/hr are the most competitive options. Spot instances are appropriate for checkpointed training jobs that can resume after interruption. Production inference and latency-sensitive workloads belong on reliable on-demand infrastructure.
How much does it cost to run an H100 for a full month? A single H100 running continuously for 730 hours (one month) costs $1,460 on GMI Cloud at $2.00/hr, $2,847 on AWS at $3.90/hr, or $3,942 on Azure at approximately $5.40/hr. Most teams do not run GPUs at 100 percent utilization continuously. For 200 to 400 hours of usage per month, GMI Cloud costs $400 to $800 versus $780 to $1,560 on AWS for comparable hardware. Per-minute billing and serverless scaling to zero are both significant factors in total monthly spend for intermittent workloads.
What is the difference between H100 PCIe and SXM pricing, and does it matter for my workload? H100 SXM typically costs $0.40 to $0.50/hr more than PCIe on the same platform. On GMI Cloud that is $2.00/hr for PCIe versus $2.40/hr for SXM. The SXM premium is justified for distributed training across 4 or more tightly coupled GPUs, where NVLink's 900 GB/s bandwidth (versus PCIe Gen5's 128 GB/s) directly determines training throughput. For inference serving, LoRA fine-tuning, and single-GPU workloads, PCIe delivers the same 80GB VRAM at 20 percent lower cost. Choosing PCIe for inference and SXM only when NVLink is required is the standard cost optimization approach.
How do hyperscaler H100 prices compare to specialized providers after accounting for all fees? On headline on-demand rates, AWS charges approximately $3.90/hr per H100 versus $2.00/hr on GMI Cloud. That 95 percent rate difference compounds with additional costs. Egress fees on AWS run $0.08 to $0.12/GB, adding 10 to 20 percent to bills for teams moving data frequently. Virtual machine hypervisor overhead reduces effective GPU performance by 10 to 15 percent, raising the real cost per unit of work by a comparable margin. A $3.90/hr virtual H100 on AWS delivering 85 percent of rated performance has an effective rate of approximately $4.59/hr. GMI Cloud's bare metal instances deliver full rated performance with no egress fees, making the total cost difference larger than the hourly rate comparison alone suggests.
When does buying H100 GPUs outright become cheaper than renting? At $2.00/hr cloud rental, a single H100 PCIe purchased for $25,000 to $30,000 requires 12,500 to 15,000 hours of continuous operation to break even on hardware cost alone. That is 17 to 20 months running 24 hours per day, 7 days per week, before accounting for infrastructure costs (power at roughly $60/month per GPU, cooling, networking), depreciation, and the opportunity cost of capital tied up in hardware. Purchasing only makes financial sense for teams running sustained, predictable, near-continuous workloads for multiple years with the in-house infrastructure expertise to operate and maintain the hardware. For most teams, cloud rental from GMI Cloud is more capital-efficient and provides access to hardware upgrades without additional investment.
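The break-even arithmetic in this answer can be checked directly. A sketch, assuming the purchase price, rental rate, and monthly power figure quoted above; other owner costs (cooling, networking, staff time) are left at zero here and would push break-even further out.

```python
# Buy-vs-rent break-even for a single H100. Purchase price, rental rate,
# and owner costs are assumptions; substitute your own quotes.

def break_even_months(purchase_price: float, rental_rate: float,
                      hours_per_month: float = 730.0,
                      owner_cost_per_month: float = 0.0) -> float:
    """Months of continuous rental spend needed to match the purchase price,
    net of the owner's ongoing monthly costs (power, cooling, maintenance)."""
    monthly_rental_spend = rental_rate * hours_per_month
    return purchase_price / (monthly_rental_spend - owner_cost_per_month)

print(break_even_months(25_000, 2.00))                           # ~17.1 months
print(break_even_months(30_000, 2.00))                           # ~20.5 months
print(break_even_months(30_000, 2.00, owner_cost_per_month=60))  # ~21.4 months, with power
```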
Build AI Without Limits
GMI Cloud helps you architect, deploy, optimize, and scale your AI strategies