Azure ND H100 v5 Lists One Rate, but Reserved Discount Thresholds and Egress Fees Decide What You Actually Pay

April 13, 2026

A team prices an H100 deployment on Azure, reads the ND H100 v5 hourly rate, and builds a budget around it. Three months later the invoice is higher than the rate predicted, because the lowest published number assumed a reserved commitment the team never made, and the data leaving the region carried its own per-GB charge. On Azure, the ND H100 v5 on-demand rate is the ceiling, not the cost: reserved-term thresholds pull it down and egress fees push the total back up. This article breaks down how Azure structures ND H100 v5 pricing, where the hidden fees live, and how a flat bare metal rate helps you see what the listed number leaves out.

How Azure Prices the ND H100 v5

The ND H100 v5 is Azure's H100-based instance family aimed at large-scale AI training and inference. Like other hyperscaler GPU instances, it is sold as a machine with the accelerator bundled alongside host CPU, memory, and high-bandwidth networking.

The pricing has more than one number behind it:

On-demand rate. The highest per-hour cost, paid for flexible capacity with no commitment.
Reserved rates. Lower per-hour costs that apply only once you commit to a one-year or three-year term.
Spot or low-priority rates. Cheapest when available, but interruptible, which rules them out for steady production serving.

The number a team quotes from a quick search is usually the on-demand rate. The number an enterprise actually pays often comes from a reserved tier, which only applies once you commit to the threshold that unlocks it.

The Reserved Discount Threshold Is a Commitment, Not a Coupon

Reserved pricing on Azure is not a discount you simply select. It is a contractual commitment to pay for a defined amount of capacity over one or three years, whether or not you use it.

That changes the math in two directions. If your utilization is high and steady, the reserved rate genuinely lowers your effective cost per GPU-hour. If your utilization is variable, you pay for reserved capacity that sits idle, and the effective cost can exceed what on-demand would have been.

The reserved rate is only the real rate for teams that can keep the committed capacity busy across the full term. For everyone else, it is a lower headline number attached to a risk.
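
The two-directional math can be made concrete with a small sketch. The rates below are hypothetical placeholders, not Azure list prices, and the function name effective_rate is illustrative. It assumes the reserved commitment bills every hour of the term, so the cost per hour of useful work is the reserved rate divided by utilization.

```python
def effective_rate(reserved_rate: float, utilization: float) -> float:
    """Cost per GPU-hour of useful work when every committed hour is billed.

    utilization is the fraction of committed hours the GPU is actually busy.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return reserved_rate / utilization


# Hypothetical figures for illustration only, not Azure list prices.
ON_DEMAND_RATE = 10.00  # $/GPU-hour, pay only for hours you run
RESERVED_RATE = 6.00    # $/GPU-hour, billed for every committed hour

for utilization in (0.9, 0.5):
    cost = effective_rate(RESERVED_RATE, utilization)
    verdict = "below" if cost < ON_DEMAND_RATE else "at or above"
    print(f"{utilization:.0%} busy: ${cost:.2f} per used GPU-hour ({verdict} on-demand)")
```

With these placeholder numbers, a reservation that is busy 90% of the time costs about $6.67 per used GPU-hour, while the same reservation at 50% costs $12.00, more than the on-demand placeholder it was supposed to beat.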

The Egress Fee the Rate Card Does Not Show

The second cost outside the instance rate is data egress. Moving inference outputs, model artifacts, or logs out of an Azure region carries per-GB charges that the GPU rate never mentions.

For inference workloads that return large payloads (generated media, long responses, or batched outputs), egress can become a meaningful line on the invoice. It scales with traffic, not with compute, so it grows exactly as your service succeeds. A budget built only on the ND H100 v5 hourly rate misses this entirely.
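
A rough estimate of that line item takes only request volume, average response size, and a per-GB price. The sketch below uses a hypothetical flat per-GB price and a hypothetical 2 MB payload; real egress pricing is usually tiered and may include a free allowance, which this deliberately ignores. The function name monthly_egress_cost is illustrative.

```python
def monthly_egress_cost(requests: int, avg_response_bytes: float, price_per_gb: float) -> float:
    """Estimate monthly egress spend for responses leaving the region."""
    if requests < 0 or avg_response_bytes < 0 or price_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    gigabytes = requests * avg_response_bytes / 1e9  # decimal GB; billing units may differ
    return gigabytes * price_per_gb


PRICE_PER_GB = 0.08       # hypothetical $/GB, not a quoted Azure rate
AVG_RESPONSE_BYTES = 2e6  # hypothetical 2 MB payload, e.g. a generated image

# GPU count is unchanged across these rows; only traffic grows.
for requests in (1_000_000, 5_000_000, 20_000_000):
    cost = monthly_egress_cost(requests, AVG_RESPONSE_BYTES, PRICE_PER_GB)
    print(f"{requests:>12,} requests/month -> ${cost:,.2f} egress")
```

The point of the loop is the shape rather than the figures: egress rises linearly with requests even when the GPU fleet, and therefore the hourly-rate part of the bill, stays the same.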

A Flat Bare Metal Rate to Compare Against

To see what the Azure structure adds, it helps to anchor against a provider that rents the H100 as a flat on-demand bare metal card with no commitment threshold and a transparent rate.

GMI Cloud publishes a single flat H100 rate that does not move with reservation terms, which isolates the silicon cost from the discount-and-egress structure.

Cost factor	Azure ND H100 v5	GMI Cloud bare metal H100
Published per-GPU rate	Tiered: on-demand vs reserved	$2.00/GPU-hour flat
Lowest rate condition	1-year or 3-year reserved commitment	No commitment required
Egress charges	Per-GB on data leaving region	Not bundled into the GPU rate structure
Memory bandwidth delivered	Subject to virtualization	100% of 3.35 TB/s, no hypervisor

Read the table as a way to separate the listed rate from the conditions attached to it. Azure's reserved tier is a real saving for high, steady utilization. The flat card removes the commitment variable, so the GPU rate you read is the GPU rate you pay.

GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. GMI Cloud's bare metal H100 instances at $2.00 per GPU-hour run with no hypervisor, delivering 100% of the advertised 3.35 TB/s memory bandwidth and a flat rate that does not depend on a reservation term.

On-Demand and Reserved Are Different Commitments, Not Different Discounts

This is the boundary to keep clear. An on-demand rate and a reserved rate are not two prices for the same product. They are two different commitments.

On-demand suits variable inference traffic and teams that cannot yet forecast a year of usage, because you pay only for what you run. Reserved suits sustained, predictable workloads where you can guarantee high utilization across the term and want the lower committed rate. Choosing reserved for variable traffic trades flexibility for a discount you may not fully capture.

The deciding factor is forecast confidence. If you cannot promise steady utilization for the commitment term, the on-demand or flat rate protects you from paying for idle reserved capacity.
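
One way to turn forecast confidence into a number is a break-even utilization: the busy fraction at which a fully billed reservation costs the same as paying per hour used. The sketch below is an assumption-laden illustration, not a pricing tool. Both rates are hypothetical, and the rule of testing the pessimistic end of the forecast is one conservative choice among several.

```python
def break_even_utilization(reserved_rate: float, pay_as_you_go_rate: float) -> float:
    """Busy fraction at which reserved and pay-as-you-go cost the same.

    A result above 1.0 means the reservation can never win.
    """
    if reserved_rate <= 0 or pay_as_you_go_rate <= 0:
        raise ValueError("rates must be positive")
    return reserved_rate / pay_as_you_go_rate


def reserved_is_safe(forecast_low: float, reserved_rate: float, pay_as_you_go_rate: float) -> bool:
    """True if even the pessimistic utilization forecast clears break-even."""
    return forecast_low >= break_even_utilization(reserved_rate, pay_as_you_go_rate)


# Hypothetical rates and forecast, for illustration only.
RESERVED_RATE = 6.00
ON_DEMAND_RATE = 10.00
forecast_low, forecast_high = 0.45, 0.85  # utilization range the team believes in

threshold = break_even_utilization(RESERVED_RATE, ON_DEMAND_RATE)
print(f"break-even utilization: {threshold:.0%}")
print("commit" if reserved_is_safe(forecast_low, RESERVED_RATE, ON_DEMAND_RATE) else "stay flexible")
```

A wide forecast range is itself a signal. In this example the optimistic end clears the threshold and the pessimistic end does not, which is the situation where a commitment carries the most risk.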

Which Azure Pricing Path Fits Your Workload

The right path depends on how predictable and how regional your workload is.

Best for steady, year-round inference at scale: reserved ND H100 v5, where the commitment threshold pays off.
Best for variable or early-stage traffic: on-demand or a flat bare metal rate, where you avoid idle reserved capacity.
Best for egress-light workloads inside one region: Azure, where data mostly stays put.
Not ideal for egress-heavy inference returning large payloads: any budget that ignores per-GB charges until the invoice arrives.

GMI Cloud is best suited for AI teams that cannot forecast a full reservation term and want a flat rate with no egress charge bundled into the GPU cost. It separates serverless inference from dedicated bare metal, so variable traffic scales to zero while steady jobs run on a flat-rate card. You can confirm the current H100 rate and terms at gmicloud.ai/en/pricing and console.gmicloud.ai before you size a commitment.

Build the Budget From the Total, Not the Headline

Any single ND H100 v5 rate is accurate for exactly one scenario: the commitment tier it assumes, with egress excluded. Your real cost is that rate adjusted for the reservation you can honestly commit to, plus the per-GB charges your traffic will generate. Before you commit, forecast your utilization across the full term, add expected egress, and compare that total against a flat no-commitment baseline. The lowest published Azure number belongs to the team that can keep reserved capacity busy and keep its data in-region, and the flat rate belongs to the team that cannot promise either.
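
The budgeting steps above can be sketched as a single comparison. Everything here is a hedged illustration: the reserved and on-demand rates and all egress prices are hypothetical placeholders, and only the $2.00 flat rate comes from the table earlier in this article. The Option type and monthly_total function are invented for this example, and the uncommitted options assume you shut instances down when idle.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average hours in a month


@dataclass
class Option:
    name: str
    rate: float           # $/GPU-hour
    committed: bool       # True: billed for every hour, used or not
    egress_per_gb: float  # $/GB leaving the provider


def monthly_total(opt: Option, gpus: int, utilization: float, egress_gb: float) -> float:
    """Compute plus egress for one month under the stated assumptions."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in [0, 1]")
    billed_fraction = 1.0 if opt.committed else utilization
    compute = HOURS_PER_MONTH * gpus * billed_fraction * opt.rate
    return compute + egress_gb * opt.egress_per_gb


options = [
    Option("reserved (hypothetical rate)", 6.00, True, 0.08),
    Option("on-demand (hypothetical rate)", 10.00, False, 0.08),
    # $2.00 is the flat rate quoted in this article; the egress price is a
    # placeholder to be replaced with the provider's actual terms.
    Option("flat bare metal", 2.00, False, 0.08),
]

for opt in options:
    total = monthly_total(opt, gpus=8, utilization=0.5, egress_gb=10_000)
    print(f"{opt.name:<32} ${total:>10,.2f}/month")
```

Replace every placeholder with a quoted figure before reading anything into the output. The structure is what matters: committed options bill all hours, uncommitted options bill used hours, and egress is added to each total separately.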

Colin Mo
