The Same H100 Can Cost Under a Dollar or Nearly Fifteen, and the Spread Tells You More Than Any Single Quote

April 13, 2026

A team prices an H100 across providers and finds quotes ranging from roughly $0.52 to nearly $14.90 per GPU-hour for what looks like identical hardware. The reaction is usually disbelief, then a hunt for the cheapest number. Both reactions miss the point. The same physical GPU carries a wide price range because you are not buying the chip alone; you are buying a commitment term, an availability guarantee, a billing model, and a platform layer, all bundled into the rate. The H100 price spread is not noise to arbitrage away; it is a map of what each provider is actually selling around the same silicon. This article explains what drives the range, where a given quote sits and why, and how to read a price before you treat it as a deal.

Why One GPU Has Many Prices

The H100 is a fixed piece of hardware, but the rate attached to it encodes several independent decisions. Change any one and the number moves, sometimes by a large multiple.

Commitment term: spot and preemptible capacity can be a fraction of on-demand, in exchange for interruption risk. Multi-year reservations discount the rate but lock you in.
Availability guarantee: a quote with no SLA, on interruptible capacity, prices far below one backed by a 99.99% availability commitment.
Form factor and configuration: PCIe versus SXM5, and the surrounding vCPU, memory, and networking, change what the rate covers.
Platform overhead: a virtualized instance can lose a slice of advertised bandwidth to the hypervisor, so a low rate can carry a hidden cost per token.

The lowest numbers in the range almost always come from interruptible spot capacity with no guarantee. The highest come from fully managed, compliance-heavy, on-demand instances.

Reading the Range Instead of the Bottom

A useful way to read an H100 quote is to place it on a spectrum from "cheapest and least guaranteed" to "most expensive and most managed," then ask which end your workload needs. The quantifiable axis is the per-GPU-hour rate itself, read together with what guarantee it carries.

Price band	Typical source	What you trade	Fit
~$0.52-$1.50/hr	Spot, preemptible, no SLA	Interruption risk, no guarantee	Fault-tolerant batch, experiments
~$2.00-$3.00/hr	Specialized cloud, on-demand	Fewer managed extras	Production inference at a flat rate
~$3.00-$7.00/hr	Hyperscaler, managed	Higher rate for ecosystem and compliance	Enterprise integrated stacks
Up to ~$14.90/hr	Premium managed, scarce regions	Maximum price for full management	Specialized enterprise contracts

GMI Cloud's H100 SXM5 is listed at $2.00/GPU-hour with 80 GB of HBM3, 3.35 TB/s of memory bandwidth, and a 99.99% availability SLA. That puts it at the low end of the production band: it is guaranteed pricing, not the interruptible-spot floor.
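Placing a quote on the spectrum can be sketched as a small lookup. This is a minimal illustration, not a pricing tool: the band edges are copied from the table above, the 7.00 lower bound on the top band is an assumption (the table only says "up to"), and `PRICE_BANDS` and `classify_rate` are hypothetical names.

```python
from typing import Optional

# (low, high, label) in USD per GPU-hour, taken from the table above.
# The 7.00 lower bound of the last band is an assumption; the table
# only gives "Up to ~$14.90/hr".
PRICE_BANDS = [
    (0.52, 1.50, "spot / preemptible, no SLA"),
    (2.00, 3.00, "specialized cloud, on-demand"),
    (3.00, 7.00, "hyperscaler, managed"),
    (7.00, 14.90, "premium managed, scarce regions"),
]


def classify_rate(rate_usd_per_hour: float) -> Optional[str]:
    """Return the first band containing the rate, or None if it falls in a gap."""
    for low, high, label in PRICE_BANDS:
        if low <= rate_usd_per_hour <= high:
            return label
    # The table leaves gaps (for example between $1.50 and $2.00).
    return None


if __name__ == "__main__":
    for rate in (0.52, 1.75, 2.00, 14.90):
        print(rate, "->", classify_rate(rate))
```

A rate that returns `None` sits between the published bands, which is itself a cue to check what guarantee comes with it. The band only says where a number sits; the SLA and terms still have to be read from the quote.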

Why the Same Card Carries Different Form Factors at Different Prices

Part of the H100 spread comes from a detail buried in most quotes: the same model name covers more than one physical configuration. An H100 SXM5 and an H100 PCIe are not interchangeable, and they do not carry the same price.

SXM5 variants run at higher power, carry full NVLink for multi-GPU communication, and reach the top advertised memory bandwidth.
PCIe variants run at lower power with reduced interconnect, which lowers both the rate and the multi-GPU throughput.

A quote that says "H100" without specifying the form factor leaves out information that changes inference throughput, especially for multi-card serving. The same is true of the host configuration around the card: the vCPU count, host memory, and network tier bundled with the GPU all move the rate. When two quotes diverge and both say H100, the form factor and host bundle explain part of the gap before commitment terms even enter the picture. Read the full instance spec, not just the GPU name, before comparing two numbers.
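One way to make "read the full instance spec" concrete is to list what differs, or is missing, before comparing two rates. The sketch below is illustrative only: `InstanceQuote`, `comparison_gaps`, and every value in the example are hypothetical, not any provider's actual configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceQuote:
    provider: str
    rate_usd_per_hour: float
    form_factor: Optional[str] = None    # "SXM5" or "PCIe"; None if the quote omits it
    vcpus: Optional[int] = None
    host_memory_gb: Optional[int] = None
    network_tier: Optional[str] = None


# Spec fields that must match before two hourly rates are comparable.
SPEC_FIELDS = ("form_factor", "vcpus", "host_memory_gb", "network_tier")


def comparison_gaps(a: InstanceQuote, b: InstanceQuote) -> List[str]:
    """List spec differences and omissions that explain part of a rate gap."""
    gaps = []
    for name in SPEC_FIELDS:
        va, vb = getattr(a, name), getattr(b, name)
        if va is None or vb is None:
            gaps.append(f"{name}: unspecified in at least one quote")
        elif va != vb:
            gaps.append(f"{name}: {va} vs {vb}")
    return gaps


if __name__ == "__main__":
    # Made-up quotes for illustration.
    q1 = InstanceQuote("provider-a", 2.00, "SXM5", 26, 200, "tier-1")
    q2 = InstanceQuote("provider-b", 1.80, "PCIe", 26, None, "tier-1")
    for gap in comparison_gaps(q1, q2):
        print(gap)
```

An empty list means the two rates describe the same bundle and can be compared directly. Anything else is a difference to price in before commitment terms enter the picture.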

The Distinction Between a Cheap Rate and a Low Cost

One clarification prevents an expensive mistake: a low hourly rate and a low cost to serve your workload are not the same measurement. A $0.52 spot H100 with no availability guarantee can cost more in practice if interruptions force re-runs, idle retries, or missed SLAs. A virtualized instance with a low headline rate can deliver less than the full 3.35 TB/s of bandwidth, raising the real cost per token above that of a slightly higher bare-metal rate. The number on the quote is the input; the cost to reliably serve your traffic is the output, and the two diverge most at the cheap end of the range.
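The gap between rate and cost can be put into rough arithmetic. The sketch below assumes a simple model that is not from any provider: only a fraction of billed hours produces useful work (the rest goes to re-runs and retries), and throughput scales linearly with delivered bandwidth, which is a reasonable first approximation only for memory-bandwidth-bound inference. The function names and the example fractions are hypothetical.

```python
def effective_rate(
    rate_usd_per_hour: float,
    useful_fraction: float = 1.0,
    bandwidth_fraction: float = 1.0,
) -> float:
    """Cost per hour of full-speed useful work under the assumed model.

    useful_fraction: share of billed time that is not lost to interruptions.
    bandwidth_fraction: share of advertised bandwidth actually delivered.
    """
    for name, value in (("useful_fraction", useful_fraction),
                        ("bandwidth_fraction", bandwidth_fraction)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return rate_usd_per_hour / (useful_fraction * bandwidth_fraction)


def break_even_useful_fraction(cheap_rate: float, reference_rate: float) -> float:
    """Useful fraction below which the cheap rate costs more than the reference."""
    return cheap_rate / reference_rate


if __name__ == "__main__":
    # Illustrative inputs only; the article gives no interruption statistics.
    print(effective_rate(0.52, useful_fraction=0.25))
    print(break_even_useful_fraction(0.52, 2.00))
```

With these made-up inputs, a $0.52 spot rate at 25% useful time works out to about $2.08 per useful hour, slightly above a flat $2.00. The break-even is a useful fraction of 0.26, so the spot quote only wins while more than about a quarter of billed time does real work. Most fault-tolerant batch jobs clear that bar easily, which is why the comparison depends on the workload and not on the headline number.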

Where a Flat, Guaranteed Rate Fits in the Range

The reason to anchor on a guaranteed mid-range rate is that it removes the variables that make the spread confusing.

GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. Its published H100 rate of $2.00/GPU-hour is a flat, on-demand price backed by a 99.99% availability SLA, not an interruptible spot quote that can vanish mid-job. GMI Cloud's bare metal H100 instances run with no hypervisor and, per the provider, deliver the full advertised 3.35 TB/s of bandwidth, so the hourly rate translates more directly into cost per token instead of being diluted by virtualization overhead.

That positioning matters for reading the range: a flat guaranteed rate near the bottom of the production band is a different product from a spot quote that is nominally cheaper but carries no guarantee. You can confirm the current rate and configuration at gmicloud.ai/en/pricing and test workloads in the console at console.gmicloud.ai.

Matching the Price Band to the Workload

Where in the range you should buy depends on what your workload tolerates.

Best for fault-tolerant batch jobs and experiments: spot and preemptible capacity at the bottom of the range, where interruption is acceptable.
Best for steady production inference: a flat on-demand rate in the production band with an availability SLA.
Best for enterprise stacks needing deep compliance: managed hyperscaler instances, accepting the higher rate for ecosystem fit.
Not ideal for latency-sensitive production: spot capacity at the cheapest quotes, where interruptions break SLAs.
Not ideal for any workload: treating a low rate as a low cost by reading a quote without its guarantee and its delivered bandwidth.
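The matching rules above can be written down as a small decision function. This is a sketch under assumptions: the `Workload` fields and `recommended_band` are hypothetical names, and letting the compliance requirement take priority over interruption tolerance is a design choice made here, not something the list specifies.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tolerates_interruption: bool
    needs_deep_compliance: bool = False


def recommended_band(workload: Workload) -> str:
    """Map a workload's tolerances to one of the bands described above."""
    # Assumption: a compliance requirement overrides price considerations.
    if workload.needs_deep_compliance:
        return "managed hyperscaler instance"
    if workload.tolerates_interruption:
        return "spot / preemptible capacity"
    return "flat on-demand rate with an availability SLA"


if __name__ == "__main__":
    print(recommended_band(Workload(tolerates_interruption=True)))
    print(recommended_band(Workload(tolerates_interruption=False)))
```

Real decisions involve more inputs (latency targets, data residency, contract length), so treat this as a starting checklist and not a complete policy.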

Read the Quote Together With What It Guarantees

The H100 price range stops being confusing once you read each quote as a bundle rather than a number. Ask what term it assumes, what availability it guarantees, and how much advertised bandwidth survives the platform layer. A $0.52 quote and a $2.00 quote can both be correct for the same chip, serving entirely different needs. Price the cost to reliably serve your traffic, not the lowest line on the page, and the spread becomes a guide instead of a puzzle.

Colin Mo
