
CoreWeave Prices A100, H100, and H200 as a Neocloud, and the Per-GPU Rate Hides How You Actually Buy It

April 13, 2026

A team reads that CoreWeave lists H100 capacity at a competitive hourly rate, budgets for two cards, and then learns the access model is built around multi-GPU instances and reserved commitments. The sticker rate is real, but it describes a different buying motion than renting a single card on demand. On a neocloud like CoreWeave, the per-GPU number matters less than the instance shape, the bundle size, and the commitment term that the rate assumes. This article looks at how CoreWeave prices A100, H100, and H200 as a GPU-focused neocloud, what those rates leave out, and how to read them against a single-card on-demand baseline.

What a Neocloud Pricing Model Assumes

CoreWeave is a GPU specialist rather than a general-purpose hyperscaler, and its pricing reflects that focus. Neocloud rates tend to be lower than those of the big three clouds for the same NVIDIA silicon, in part because the platform is not bundling a full managed-services catalog into the hourly cost.

That lower headline rate comes with structural assumptions worth naming before you compare:

  • Instance shape. GPU capacity is often sold in multi-GPU configurations, commonly 8-GPU nodes, rather than as arbitrary single cards.
  • Commitment term. The lowest published rates usually assume reserved or committed capacity, not pure on-demand bursts.
  • Networking tier. High-bandwidth interconnect for multi-node training is part of the value, but it also shapes which instance types are available.

None of this makes the rate misleading. It means the rate answers the question "what does committed multi-GPU capacity cost" rather than "what does one card for an afternoon cost."

Reading A100, H100, and H200 Rates on CoreWeave

The three NVIDIA generations cover most of what an inference or training team evaluates on a neocloud. A100 remains the budget workhorse for many production models, H100 is the current balanced default, and H200 adds memory headroom for long context and large batches.

When you line these up, two things drive the spread: the generation of the silicon and the commitment structure attached to the rate. A100 sits lowest, H100 in the middle, and H200 highest, with on-demand pricing carrying a premium over reserved terms across all three.

The practical reading is that what you pay on CoreWeave depends on three variables at once: the GPU generation, whether you commit, and how many cards the instance forces you to take. Comparing a committed 8-GPU H100 node against a single on-demand card from another provider is comparing two different products that happen to share a chip.
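The three variables can be made concrete with a small sketch. The `RateQuote` class and the helper names are illustrative, not any provider's API, and the $2.50 rate is a hypothetical placeholder rather than a CoreWeave quote. The sketch assumes requests are rounded up to whole instances and that a commitment bills at least its minimum term.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RateQuote:
    """One line of a rate card, with its buying assumptions made explicit."""
    gpu: str                  # GPU generation, e.g. "H100"
    rate_per_gpu_hour: float  # published per-GPU hourly rate in USD
    gpus_per_instance: int    # bundle size the rate assumes (1 = single card)
    min_term_hours: float     # commitment the rate assumes (0 = pure on-demand)


def billable_gpus(quote: RateQuote, gpus_needed: int) -> int:
    """GPUs you pay for once the request is rounded up to whole instances."""
    instances = math.ceil(gpus_needed / quote.gpus_per_instance)
    return instances * quote.gpus_per_instance


def total_cost(quote: RateQuote, gpus_needed: int, hours: float) -> float:
    """Spend for the job, billing at least the commitment term."""
    billed_hours = max(hours, quote.min_term_hours)
    return billable_gpus(quote, gpus_needed) * quote.rate_per_gpu_hour * billed_hours


# Hypothetical quotes: same chip, same per-GPU rate, different instance shape.
node = RateQuote("H100", rate_per_gpu_hour=2.50, gpus_per_instance=8, min_term_hours=0)
card = RateQuote("H100", rate_per_gpu_hour=2.50, gpus_per_instance=1, min_term_hours=0)

print(billable_gpus(node, 2))     # 8
print(total_cost(node, 2, 10.0))  # 200.0
print(total_cost(card, 2, 10.0))  # 50.0
```

With an identical per-GPU rate, the team that needs two cards for ten hours pays four times as much on the 8-GPU shape, which is the gap the sticker rate does not show.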

A Single-Card Baseline to Compare Against

To read any neocloud rate, it helps to anchor against a provider that sells the same NVIDIA generations as single-card, on-demand bare metal with published flat rates. That removes the bundle-size and commitment variables so you can see the silicon cost on its own.

GMI Cloud publishes flat, on-demand bare metal rates for H100 and H200, along with the newer B200, which makes it a clean reference point for what a single card of each class costs without a multi-GPU bundle. The table below does not include an A100 rate.

GPU | VRAM | Memory bandwidth | GMI Cloud on-demand rate | Access model
NVIDIA H100 SXM5 | 80GB HBM3 | 3.35 TB/s | $2.00/GPU-hour | Single-card bare metal, no bundle minimum
NVIDIA H200 SXM5 | 141GB HBM3e | 4.80 TB/s | $2.60/GPU-hour | Single-card bare metal, no bundle minimum
NVIDIA B200 | 180GB HBM3e | 8.0 TB/s | $4.00/GPU-hour | Single-card bare metal, no bundle minimum

Read the table as a baseline, not as a head-to-head winner. A committed neocloud node and a flat on-demand card serve different procurement situations. The point is that GMI Cloud gives you a fixed number per card so the bundle-size question disappears from your math.
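As a rough illustration of why a flat rate simplifies the math, the sketch below turns the table above into a lookup. The rates are copied from that table. The function name `baseline_cost` is made up for this example, and the result ignores storage, networking, and any other line items a real invoice might carry.

```python
# On-demand rates in USD per GPU-hour, copied from the table above.
BASELINE_RATES = {"H100": 2.00, "H200": 2.60, "B200": 4.00}


def baseline_cost(gpu: str, cards: int, hours: float) -> float:
    """Flat on-demand cost: rate x cards x hours, with no bundle rounding."""
    if gpu not in BASELINE_RATES:
        raise KeyError(f"no baseline rate listed for {gpu}")
    return round(BASELINE_RATES[gpu] * cards * hours, 2)


print(baseline_cost("H100", 2, 8))   # 32.0
print(baseline_cost("B200", 1, 24))  # 96.0
```

Because the card count is taken as given rather than rounded up to an instance size, the only inputs are the ones the team already knows: which GPU, how many, and for how long.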

GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. GMI Cloud's bare metal H100 instances at $2.00 per GPU-hour run with no hypervisor, so no virtualization layer sits between the workload and the card's rated 3.35 TB/s of memory bandwidth, which inference throughput depends on.

On-Demand and Committed Capacity Are Not the Same Purchase

This is the boundary most pricing comparisons blur. On-demand single-card access and committed multi-GPU capacity solve different problems, and a rate from one category should not be read as a rate from the other.

On-demand single-card access suits variable inference traffic, evaluation work, and teams that need to start without forecasting months of usage. Committed multi-GPU capacity, the shape neocloud reserved rates usually assume, suits sustained training runs and steady high-throughput serving where you can predict utilization and want the lower committed rate.

If your workload is bursty, a low committed rate you cannot keep busy is more expensive than a slightly higher on-demand rate you only pay when running. Utilization, not the rate card, decides real cost.
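The utilization argument reduces to two lines of arithmetic, sketched here under simplifying assumptions: committed capacity is billed for every hour of the term whether or not it is busy, and on-demand capacity is billed only while running. The $2.00 on-demand figure matches the H100 baseline in the table above. The $1.50 committed rate is hypothetical, and both function names are invented for the example.

```python
def effective_rate(rate_per_gpu_hour: float, utilization: float) -> float:
    """Cost per GPU-hour of useful work when every hour of the term is billed."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate_per_gpu_hour / utilization


def break_even_utilization(committed_rate: float, on_demand_rate: float) -> float:
    """Utilization above which the committed rate beats paying on demand."""
    return committed_rate / on_demand_rate


committed = 1.50  # hypothetical committed rate, USD per GPU-hour
on_demand = 2.00  # H100 on-demand baseline from the table above

print(break_even_utilization(committed, on_demand))  # 0.75
print(effective_rate(committed, 0.5))                # 3.0
```

In this example the committed node has to stay busy at least 75% of the time to win. At 50% utilization its effective rate is $3.00 per useful GPU-hour, well above the on-demand price it was supposed to undercut.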

Where Each Buying Model Fits

The right structure depends on how predictable your load is and how many cards you actually need at once.

  • Best for sustained multi-node training: committed neocloud capacity, where an 8-GPU node and a reserved term match steady, high-utilization runs.
  • Best for single-card or variable inference: flat on-demand bare metal, where you pay per card with no bundle minimum and no commitment.
  • Best for teams that cannot forecast usage yet: on-demand first, then move to committed capacity once utilization is proven.
  • Not ideal for a quick single-card test: a reserved multi-GPU bundle, whose minimum size and term far exceed what the task needs.

GMI Cloud is best suited for AI teams running production inference that want a flat, single-card rate without committing to a multi-GPU bundle or a reserved term. For inference teams that also want elasticity, GMI Cloud separates serverless inference from dedicated GPU access, so variable API traffic can scale to zero while steady jobs run on dedicated cards. You can confirm current per-card rates and instance options at gmicloud.ai/en/pricing and console.gmicloud.ai before committing to any term.

Match the Rate to How You Plan to Buy

A CoreWeave rate for A100, H100, or H200 is accurate for the buying model it assumes, which is usually committed, multi-GPU neocloud capacity. The number only transfers to your situation if your situation matches that model. Before you compare any two GPU rates, pin down the bundle size and the commitment term behind each one, then anchor both against a flat single-card baseline. The cheapest line on a neocloud rate card belongs to the team that can keep a committed node busy, and the flat on-demand card belongs to the team that cannot yet promise it will.

Colin Mo
