A100, H100, and H200 on Azure and GCP: Reading the Hyperscaler Price Gaps and Bundle Rules
April 13, 2026
A team prices the same three NVIDIA GPUs on Azure and GCP and finds the rates do not line up the way the spec sheets suggest. The cards are identical silicon, yet the per-hour figures and the machine families they come wrapped in differ by provider. On Azure and GCP, the price gap for A100, H100, and H200 is driven less by the GPU itself and more by how each hyperscaler bundles the card with CPU, memory, and networking into a fixed instance shape. This article compares the two clouds across these three GPUs, explains why the bundles move the price, and offers a pure-compute reference to read the gaps against.
Why Identical GPUs Carry Different Prices
Azure and GCP do not sell a bare GPU. They sell an instance: a GPU paired with a set amount of vCPU, system memory, local storage, and network bandwidth, in a predefined ratio. You pay for the whole bundle, not the accelerator alone.
That bundling is why the same H100 can cost different amounts on each cloud. The providers choose different CPU-to-GPU ratios, different memory allocations, and different networking tiers, and those choices land in the hourly rate. Comparing GPU prices across Azure and GCP without accounting for the bundle compares two different products that happen to share an accelerator.
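To compare like with like, you can describe each instance as a bundle and spread its hourly rate across the GPUs it contains. The sketch below is a minimal illustration under stated assumptions. `InstanceShape` and its fields are hypothetical names, and the example shapes and rates are placeholders, not published Azure or GCP prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceShape:
    """Hypothetical model of a hyperscaler GPU instance: GPUs plus the envelope sold with them."""

    name: str
    gpus: int           # accelerators in the instance
    vcpus: int          # bundled vCPUs
    memory_gb: int      # bundled system memory, GB
    hourly_rate: float  # price of the whole instance, USD per hour

    def per_gpu_hour(self) -> float:
        # Spread the full bundle price across the GPUs it contains.
        return self.hourly_rate / self.gpus

    def vcpus_per_gpu(self) -> float:
        return self.vcpus / self.gpus

    def memory_gb_per_gpu(self) -> float:
        return self.memory_gb / self.gpus


# Placeholder shapes and rates for illustration only, not real list prices.
shape_a = InstanceShape("hypothetical-cloud-a-8gpu", gpus=8, vcpus=96, memory_gb=1024, hourly_rate=80.0)
shape_b = InstanceShape("hypothetical-cloud-b-8gpu", gpus=8, vcpus=208, memory_gb=1536, hourly_rate=88.0)

for shape in (shape_a, shape_b):
    print(f"{shape.name}: ${shape.per_gpu_hour():.2f}/GPU-hour, "
          f"{shape.vcpus_per_gpu():.0f} vCPU/GPU, {shape.memory_gb_per_gpu():.0f} GB/GPU")
```

The per-GPU ratios put the CPU and memory allocations of each bundle next to the per-GPU price. Those allocations are the part that differs between two instances built on the same accelerator.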
The Three GPUs and What Separates Them
Before the cloud-specific bundles, the cards themselves sit on a clear capability ladder:
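- A100 is the prior-generation (Ampere) workhorse, sold in 40GB and 80GB memory configurations, still widely available, and the lowest-cost of the three on both clouds.
- H100 (SXM) carries 80GB of HBM3 and 3.35 TB/s of memory bandwidth. It is the balanced choice for serving 7B to 70B models. At the 70B end you need quantization or more than one GPU, because 70B parameters at 16-bit precision take about 140GB.
- H200 carries 141GB of HBM3e and 4.80 TB/s of memory bandwidth, the step up for long-context and large-batch work.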
The relative ordering holds on both Azure and GCP. What differs is the absolute price and the instance each card is locked into.
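A rough way to see why the memory ladder matters is to compare model weight size with card memory. The sketch below counts weights only. It reserves an assumed 20% headroom for KV cache and runtime overhead; the `headroom_fraction` default is an illustrative assumption, not a vendor figure. Real KV-cache needs grow with context length and batch size, and that growth is where H200's extra memory earns its price.

```python
# Memory per card as stated in this article; A100 is omitted because it ships in two sizes.
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    # 1e9 parameters x bytes each / 1e9 bytes per GB (decimal GB, a simplification).
    return params_billion * bytes_per_param


def fits_on_one_gpu(params_billion: float, bytes_per_param: float,
                    gpu: str, headroom_fraction: float = 0.2) -> bool:
    # Reserve an assumed share of memory for KV cache, activations, and runtime overhead.
    usable_gb = GPU_MEMORY_GB[gpu] * (1.0 - headroom_fraction)
    return weights_gb(params_billion, bytes_per_param) <= usable_gb


for gpu in GPU_MEMORY_GB:
    for params, bytes_pp, label in [(7, 2, "7B @ 16-bit"), (70, 2, "70B @ 16-bit"),
                                    (70, 1, "70B @ 8-bit"), (70, 0.5, "70B @ 4-bit")]:
        verdict = "fits" if fits_on_one_gpu(params, bytes_pp, gpu) else "does not fit"
        print(f"{gpu} {label}: {verdict}")
```

Under these assumptions, a 16-bit 70B model fits on neither card alone. An 8-bit 70B model fits on H200 but not H100, and a 4-bit 70B model fits on both.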
Azure and GCP Side by Side
The table frames the two hyperscalers across the three cards. The bundle column is the one to read closely, because it explains gaps the GPU spec alone does not. A neutral fixed-rate reference is included to anchor the pure-compute cost.
| GPU | Azure instance traits | GCP instance traits | Pure-compute reference (GMI Cloud) |
|---|---|---|---|
| A100 | ND A100 v4 bundle, fixed vCPU/memory ratio | A2 family, fixed vCPU/memory ratio | Not the focus here |
| H100 | ND H100 v5 bundle, high networking tier | A3 family, high networking tier | $2.00/GPU-hour, bare metal |
| H200 | ND H200 v5 bundle, where available | A3 Ultra family, where available | $2.60/GPU-hour, bare metal |
A few readings are worth making explicit:
- Both clouds bundle; neither sells the card alone. The price you compare is always the GPU plus a fixed envelope of CPU, memory, and network.
- Networking tier moves the H100 and H200 gap. The hyperscaler instances aimed at multi-GPU training carry premium interconnect that raises the rate whether or not your inference job uses it.
- A pure-compute rate exposes the bundle premium. A fixed $2.00/GPU-hour H100 with no mandatory CPU or networking upsell makes visible how much of the Azure or GCP figure is the wrapper.
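Once the instance rate is expressed per GPU, this reading is simple subtraction. A minimal sketch follows. `bundle_premium` and `PremiumReading` are hypothetical names, and the $48.00/hour 8-GPU instance is a placeholder rather than an Azure or GCP quote. The computed gap also absorbs differences in region, discounts, and provider margin. Read it as the portion of the rate the reference does not explain, not as an exact price for the wrapper.

```python
from typing import NamedTuple

# Bare-metal reference rates quoted in this article, USD per GPU-hour.
REFERENCE_PER_GPU_HOUR = {"H100": 2.00, "H200": 2.60}


class PremiumReading(NamedTuple):
    per_gpu_hour: float   # instance rate spread across its GPUs
    premium: float        # amount above the pure-compute reference (negative if below)
    premium_share: float  # premium as a fraction of the per-GPU instance rate


def bundle_premium(instance_hourly_rate: float, gpus_per_instance: int, gpu: str) -> PremiumReading:
    per_gpu = instance_hourly_rate / gpus_per_instance
    premium = per_gpu - REFERENCE_PER_GPU_HOUR[gpu]
    return PremiumReading(per_gpu, premium, premium / per_gpu)


# Placeholder instance rate for illustration; substitute a current quote.
reading = bundle_premium(instance_hourly_rate=48.0, gpus_per_instance=8, gpu="H100")
print(f"${reading.per_gpu_hour:.2f}/GPU-hour, ${reading.premium:.2f} above reference "
      f"({reading.premium_share:.0%} of the rate)")
```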
A Boundary Between GPU Price and Instance Price
The cost of a GPU and the cost of a hyperscaler instance are different numbers, and conflating them is the most common error in an Azure-versus-GCP comparison. The GPU price is what the accelerator would cost on its own. The instance price is what the provider charges for the GPU plus the fixed CPU, memory, and networking it ships with. On Azure and GCP you can only buy the second. When you see a gap between the two clouds, the question to ask is whether the difference is the silicon or the surrounding bundle, because for inference you may be paying for interconnect and CPU you do not need.
Where the Bundle Premium Comes From
Three bundle choices drive most of the Azure-versus-GCP gap on these cards:
- CPU-to-GPU ratio. A higher vCPU allocation per GPU raises the rate even for GPU-bound inference that barely touches the CPU.
- Networking tier. Instances built for distributed training carry premium fabric that single-node inference does not exploit.
- Memory and storage allocation. Larger fixed system-memory envelopes add cost regardless of whether your workload uses them.
For training that spans many GPUs, these inclusions are justified. For single-node or small-replica inference, they are often a premium for capacity the job never reaches.
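To make "capacity the job never reaches" concrete, you can split a per-GPU premium across bundle components and charge each component's idle fraction as unused spend. This is a sketch under heavy assumptions. The component split and the utilization profiles below are invented for illustration, not a published breakdown. `unused_bundle_spend` is a hypothetical helper.

```python
def unused_bundle_spend(premium_per_gpu_hour: float,
                        component_shares: dict[str, float],
                        utilization: dict[str, float]) -> float:
    """Estimate the part of a per-GPU bundle premium spent on capacity the job leaves idle.

    component_shares: assumed split of the premium across bundle components (must sum to 1).
    utilization: fraction of each bundled component the workload actually uses, 0 to 1.
    """
    if abs(sum(component_shares.values()) - 1.0) > 1e-9:
        raise ValueError("component shares must sum to 1")
    premium = max(premium_per_gpu_hour, 0.0)  # nothing to attribute if there is no premium
    idle_spend = 0.0
    for component, share in component_shares.items():
        used = min(max(utilization.get(component, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        idle_spend += premium * share * (1.0 - used)
    return idle_spend


# Hypothetical split and usage profiles, chosen only to illustrate the calculation.
shares = {"cpu": 0.3, "network": 0.5, "memory": 0.2}
inference = {"cpu": 0.2, "network": 0.0, "memory": 0.5}
training = {"cpu": 0.9, "network": 0.9, "memory": 0.9}

print(f"single-node inference: ${unused_bundle_spend(4.0, shares, inference):.2f}/GPU-hour idle")
print(f"multi-node training:   ${unused_bundle_spend(4.0, shares, training):.2f}/GPU-hour idle")
```

Under these made-up inputs, the same premium is mostly wasted on single-node inference and mostly used by multi-node training. That is the asymmetry the paragraph describes.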
A Pure-Compute Reference for the Comparison
To judge how much of an Azure or GCP rate is the GPU versus the wrapper, it helps to see a price for the accelerator without a mandatory bundle. GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. GMI Cloud's H100 at $2.00/GPU-hour and H200 at $2.60/GPU-hour are bare metal, with no hypervisor and no forced CPU or networking upsell, delivering 100% of the advertised 3.35 TB/s and 4.80 TB/s bandwidth respectively. Used as a reference point, those rates show how much of an Azure or GCP instance price is the bundle rather than the card.
GMI Cloud is best suited for AI teams whose inference does not need the heavy CPU and interconnect envelopes that hyperscaler training instances bundle in, and who want the GPU spec without the wrapper. Current H100 and H200 pricing is at gmicloud.ai/en/pricing and console.gmicloud.ai.
Match the Cloud Bundle to the Workload Shape
Reading the Azure and GCP gaps comes down to what your workload actually uses:
- Best for distributed training that needs premium interconnect: the hyperscaler instances built for it, where the bundle is justified.
- Best for GPU-bound single-node inference: a leaner bundle or a pure-compute rate, where you skip unused CPU and networking.
- Best for comparing the two clouds honestly: normalize on the full instance bundle, not the GPU name.
- Not ideal for budget inference: the heaviest training-tier instances, whose interconnect premium is wasted on serving.
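These matching rules reduce to a filter and a comparison. First drop any option that lacks what the workload needs, then compare the rest on full per-GPU bundle price. The sketch is hypothetical: `Quote`, `cheapest_fit`, and the quotes are illustrative. It models only the interconnect requirement; a real comparison would also check region, availability, and memory fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    name: str
    per_gpu_hour: float         # full bundle price divided by GPUs in the instance
    premium_interconnect: bool  # ships with multi-node training fabric


def cheapest_fit(quotes: list[Quote], needs_interconnect: bool) -> Quote:
    # Keep only options that meet the workload's networking need, then compare full bundle prices.
    candidates = [q for q in quotes if q.premium_interconnect or not needs_interconnect]
    if not candidates:
        raise ValueError("no quote meets the workload's interconnect requirement")
    return min(candidates, key=lambda q: q.per_gpu_hour)


# Placeholder quotes for illustration only.
quotes = [
    Quote("hypothetical-training-tier", per_gpu_hour=6.00, premium_interconnect=True),
    Quote("hypothetical-lean-bundle", per_gpu_hour=3.50, premium_interconnect=False),
]
print("distributed training ->", cheapest_fit(quotes, needs_interconnect=True).name)
print("single-node inference ->", cheapest_fit(quotes, needs_interconnect=False).name)
```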
Compare the Bundle, Not Just the Badge
The same A100, H100, or H200 carries different prices on Azure and GCP because you are never buying just the card. Before you call one cloud cheaper, line up the full instance shapes, account for the CPU, memory, and networking each forces into the price, and check whether your inference job uses what you are paying for. The GPU badge is the easy part of the comparison. The bundle around it is where the real Azure-versus-GCP gap lives, and where the savings hide.
Colin Mo
