CoreWeave and Nebius Both Sell Dedicated H200 Clusters, but the Networking Fabric Decides What You Are Actually Buying

April 13, 2026

A team comparing two neoclouds for dedicated H200 capacity lines up the per-GPU-hour rates, picks the lower one, and later finds its multi-node job bottlenecked on the network it never priced. For dedicated cluster inference and training, the GPU rate is only half the product. The other half is the InfiniBand fabric connecting the nodes, which determines whether your workload scales across GPUs or stalls between them. On dedicated H200 clusters, the interconnect is not a footnote to the price; for any job that spans nodes, it is the spec that decides throughput. This article compares how dedicated cluster pricing is structured, why the networking layer matters as much as the card, and when a single-card rate or a rack-scale platform fits better.

Why Dedicated Cluster Pricing Has Two Halves

A dedicated H200 cluster gives you exclusive, reserved access to a set of GPUs rather than shared or per-call capacity. Neoclouds like CoreWeave and Nebius both build these on NVIDIA's reference platform, with HGX nodes of eight H200s linked internally by NVLink and connected to each other by InfiniBand. The pricing of these clusters reflects two distinct things.

The first is the per-GPU rate, the number most comparisons stop at. The second is the networking and topology, which is what separates a cluster that scales close to linearly across nodes from one that loses time to communication overhead. For a single-node job, the second half barely matters. For a job that spans many nodes, it can dominate effective throughput.

The Interconnect Is the Hidden Variable

When inference or training spans multiple GPUs, the GPUs spend part of their time exchanging activations and gradients. The faster the fabric, the less time the expensive GPUs sit waiting on data. A cluster with high-bandwidth, low-latency InfiniBand keeps GPUs busy; one with a thinner fabric leaves them idle while the network catches up. Two clusters can quote the same per-GPU-hour rate and deliver very different real throughput because of this. Pricing a cluster on the GPU rate alone is like pricing a car on engine size while ignoring the transmission.
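One rough way to reason about this is a no-overlap model: each step spends some time computing and some time moving data across the fabric, and the GPU counts as busy only during the compute part. The sketch below is illustrative rather than a benchmark. It assumes communication is never hidden behind compute (real frameworks overlap part of it), and the step time, payload size, and bandwidth figures are hypothetical placeholders, not measurements from any provider.

```python
def busy_fraction(compute_s: float, payload_gb: float, fabric_gb_per_s: float) -> float:
    """Fraction of each step the GPU spends computing.

    Assumes no compute/communication overlap, which is pessimistic:
    real frameworks hide part of the transfer behind compute.
    """
    if compute_s <= 0 or payload_gb < 0 or fabric_gb_per_s <= 0:
        raise ValueError("compute time and bandwidth must be positive")
    comm_s = payload_gb / fabric_gb_per_s  # time spent waiting on the network
    return compute_s / (compute_s + comm_s)


# Hypothetical step: 0.9 s of compute and 20 GB exchanged per step.
# Only the effective fabric bandwidth differs between the two clusters.
fast = busy_fraction(compute_s=0.9, payload_gb=20.0, fabric_gb_per_s=200.0)
slow = busy_fraction(compute_s=0.9, payload_gb=20.0, fabric_gb_per_s=40.0)
print(f"fast fabric: {fast:.0%} busy, slow fabric: {slow:.0%} busy")
```

Under these made-up inputs the same GPUs land near 90% busy on the faster fabric and near 64% on the slower one, which is the kind of gap a per-GPU-hour rate cannot show.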

A worked view makes the stakes concrete. Suppose two clusters both rent H200s at $2.60/GPU-hour, but the first keeps GPUs busy 90% of the time during a multi-node job while the second, throttled by a weaker fabric, keeps them busy only 65% of the time. The second cluster bills the same per hour, but you need roughly 38% more GPU-hours to finish the same work (0.90 / 0.65 ≈ 1.38), because more than a third of every hour is spent waiting on the network. The effective cost per unit of work, not the rate card, is what diverges. This is why a sharded 70B-plus serving job or a distributed training run is best compared on delivered throughput per dollar, not on the advertised per-GPU-hour number that looks identical across providers.

Carry that into dollars per finished job. If a distributed run needs 1,000 GPU-hours of useful compute, the 90%-busy cluster delivers it in about 1,111 billed GPU-hours, roughly $2,889 at $2.60 each. The 65%-busy cluster needs about 1,538 billed GPU-hours for the identical work, around $4,000, despite quoting the same per-GPU rate. That gap of more than $1,100 is pure fabric tax, invisible on the rate card and visible only on the final bill. The lesson holds at any scale: multiply the rate by the GPU-hours the fabric forces you to buy, not the GPU-hours the work actually requires, and compare clusters on that delivered number.
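The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the article's own figures ($2.60/GPU-hour, 1,000 useful GPU-hours, 90% and 65% busy). The function names are hypothetical, and `break_even_rate` is an added convenience that asks what the weaker cluster would have to charge to cost the same per unit of work.

```python
def billed_gpu_hours(useful_gpu_hours: float, busy: float) -> float:
    """GPU-hours you pay for to obtain `useful_gpu_hours` of real compute."""
    if not 0 < busy <= 1:
        raise ValueError("busy must be in (0, 1]")
    return useful_gpu_hours / busy


def job_cost(useful_gpu_hours: float, busy: float, rate: float) -> float:
    """Total bill in dollars for the job at a given per-GPU-hour rate."""
    return billed_gpu_hours(useful_gpu_hours, busy) * rate


def break_even_rate(rate: float, busy: float, other_busy: float) -> float:
    """Rate the other cluster must quote to match this one's cost per unit of work."""
    return rate * other_busy / busy


RATE = 2.60     # $/GPU-hour, from the example above
USEFUL = 1_000  # GPU-hours of useful compute

strong = job_cost(USEFUL, busy=0.90, rate=RATE)
weak = job_cost(USEFUL, busy=0.65, rate=RATE)
print(f"strong fabric: ${strong:,.0f}")
print(f"weak fabric:   ${weak:,.0f}")
print(f"fabric tax:    ${weak - strong:,.0f}")
print(f"break-even:    ${break_even_rate(RATE, 0.90, 0.65):.2f}/GPU-hour")
```

With these inputs, the 65%-busy cluster would have to quote about $1.88/GPU-hour to match the 90%-busy cluster at $2.60. The busy fractions are the inputs to measure on your own workload; the rate card will not supply them.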

What to Compare Across Dedicated H200 Clusters

The card is identical on both platforms. What you are choosing among is reservation structure, fabric, and the rate.

| Dimension | Dedicated H200 cluster (neocloud) | GMI Cloud dedicated H200 |
|---|---|---|
| VRAM per GPU | 141GB HBM3e | 141GB HBM3e |
| Memory bandwidth | 4.80 TB/s | 4.80 TB/s |
| Inter-node fabric | InfiniBand, varies by platform | RDMA-ready, NVIDIA Reference Architecture |
| Single-card rate | Often node-bundled | $2.60/GPU-hour |
| Rack-scale option | Platform-dependent | GB200 NVL72 at $8.00/GPU-hour |
| Platform availability | High | 99.99% SLA |

GMI Cloud's dedicated H200 clusters are RDMA-ready and validated against NVIDIA Reference Architecture, with bare metal access delivering 100% of the advertised 4.80 TB/s memory bandwidth and no hypervisor overhead. GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware.

For workloads that outgrow node-to-node InfiniBand, the relevant comparison shifts again. GMI Cloud's GB200 NVL72 at $8.00/GPU-hour pools 72 GPUs into a single 13.5TB memory domain over 130 TB/s NVLink, which is a different scale of interconnect than InfiniBand between separate HGX nodes.

A Dedicated Cluster and a Pooled Rack Are Not Interchangeable

It is easy to lump all "multi-GPU capacity" into one category, but two distinct architectures sit inside it. A dedicated cluster of HGX nodes connects eight-GPU H200 nodes over InfiniBand, which suits jobs that shard across nodes and can tolerate inter-node communication cost. A pooled rack like GB200 NVL72 puts 72 GPUs in one NVLink memory domain, which suits frontier models that need to treat many GPUs as one. Choosing between neoclouds on per-GPU rate when your real need is a pooled memory domain optimizes the wrong variable entirely.

The distinction is not academic once model size crosses a threshold. One HGX node of eight H200s holds 8 × 141GB, about 1.13TB. A model whose weights and KV cache exceed that has to span nodes, and on an InfiniBand-connected cluster every cross-node access pays the network tax. On a pooled NVLink domain, the 72 GPUs present 13.5TB of memory at 130 TB/s as a single pool, so the model treats them more like one very large card. Below that threshold, paying for a pooled rack wastes its defining capability; above it, an InfiniBand cluster forces you to engineer around a fabric that frontier-scale models keep hitting. Sizing the model against the memory domain comes before any price comparison.
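A first-pass version of that sizing check can be sketched as below. The capacity constants come from the figures quoted in this article. The 10% headroom default, the function names, and the example model sizes and KV cache figures are hypothetical, and the weight estimate ignores activations and runtime buffers, so treat the result as a starting point rather than a capacity plan.

```python
H200_GB = 141           # VRAM per H200, from the comparison above
GPUS_PER_HGX_NODE = 8   # one HGX node
NVL72_POOL_GB = 13_500  # 13.5TB pooled domain quoted for GB200 NVL72


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1e9 params x bytes per param)."""
    return params_billion * bytes_per_param


def memory_domain(total_gb: float, headroom: float = 0.9) -> str:
    """Smallest memory domain that holds `total_gb`.

    `headroom` is a hypothetical safety factor that leaves room for
    runtime buffers and fragmentation.
    """
    node_gb = H200_GB * GPUS_PER_HGX_NODE  # 1,128 GB per node
    if total_gb <= H200_GB * headroom:
        return "single H200"
    if total_gb <= node_gb * headroom:
        return "single HGX node (NVLink only)"
    if total_gb <= NVL72_POOL_GB * headroom:
        return "spans nodes: InfiniBand cluster or pooled NVL72 rack"
    return "exceeds one NVL72 domain"


# Hypothetical models at 2 bytes per parameter, plus an assumed KV cache budget.
print(memory_domain(weights_gb(70, 2) + 40))     # 180 GB total
print(memory_domain(weights_gb(1000, 2) + 200))  # 2,200 GB total
```

Only once the function returns a "spans nodes" result does the fabric comparison, and the rack-scale option, become the deciding factor.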

Where Each Option Fits

  • Best for multi-node training and large distributed serving: dedicated H200 clusters with strong InfiniBand fabric, where node-to-node bandwidth keeps GPUs busy.
  • Best for single-card or small inference jobs: a single dedicated H200 at $2.60/GPU-hour, without paying for fabric you never cross.
  • Best for frontier models needing a pooled memory domain: GB200 NVL72 at $8.00/GPU-hour, where 130 TB/s NVLink links 72 GPUs as one.
  • Poor fit for a rate-only comparison: any multi-node job where the interconnect, not the card price, sets real throughput.

GMI Cloud is best suited for teams that need dedicated H200 clusters validated against NVIDIA Reference Architecture and want a clear path up to pooled rack-scale capacity without changing providers.

Confirm the Fabric and the Rate Together

You can confirm the dedicated H200 rate of $2.60/GPU-hour and the GB200 NVL72 rate at gmicloud.ai/en/pricing, review cluster topology and RDMA configuration at docs.gmicloud.ai, and provision capacity at console.gmicloud.ai. When comparing neoclouds, ask for the interconnect spec in the same breath as the per-GPU rate, because one without the other does not predict throughput.

Price the Network and the GPU as One Decision

Dedicated H200 clusters are not commodities you can rank by per-GPU-hour rate alone, because the fabric between nodes decides how much of each GPU you actually use. Map your workload first: single node, multi-node, or pooled rack. Then compare clusters on the interconnect that workload depends on, with the GPU rate as one of two inputs, not the whole answer.

Colin Mo
