GCP A3 and A4 Machine Types Price GPU Capacity as a Bundle, So the H100 Per-GPU Number Is Only Part of the Cost

April 13, 2026

A team budgets for a GCP H100 by reading the GPU line on the price sheet, then opens the A3 machine type and finds the GPU comes attached to a fixed allotment of vCPUs, system memory, and local storage. The card is not sold by itself. On GCP, the A3 machine types bundle CPU, memory, and networking with the H100 (and the A4 types apply the same model to newer GPUs), so the real per-GPU cost is the whole machine rate divided by the GPUs you actually use, not the accelerator line alone. This article explains how GCP structures H100 pricing through its machine types, why the bundled resources change your effective cost, and how a flat per-card baseline helps you read the difference.

How GCP Machine Types Package the H100

GCP does not rent a bare H100. It rents a machine type, and the accelerator is one component of a predefined configuration. The A3 family is built around H100 GPUs, with the A4 generation extending the same bundling approach to newer silicon.

A machine type fixes several things at once:

GPU count per instance. Configurations come in set GPU counts, typically in fixed steps rather than any number of cards you choose.
Attached vCPUs and system memory. Each GPU configuration ships with a predetermined amount of host CPU and RAM.
Local SSD and networking. Storage and network bandwidth are provisioned with the machine, not chosen freely.

The result is that the H100 "price" on GCP is really the price of a machine that contains H100s. If your inference workload does not need the attached CPU and memory, you still pay for them.

Why the Bundle Changes Your Effective Cost

For LLM inference, the GPU is usually the scarce resource and the host CPU is lightly used. When a machine type ships generous vCPU and memory alongside the accelerator, an inference-heavy, CPU-light workload pays for host resources it barely touches.

This is the gap between the advertised GPU rate and the effective per-GPU cost. The advertised number describes the accelerator. The effective number divides the full machine rate by the GPUs you actually use for inference. On a bundled machine type, the second number is the one that hits your budget.
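The division described above fits in a few lines of Python. This is a minimal sketch, assuming a hypothetical 8-GPU machine billed at a placeholder rate of $80.00 per hour; the function name `effective_per_gpu_cost` and that rate are illustrative only, not GCP's published pricing, so substitute the quote for the machine type and region you are evaluating.

```python
def effective_per_gpu_cost(machine_rate_per_hour: float, gpus_used: int) -> float:
    """Divide the full machine rate by the GPUs the workload actually uses.

    The whole bundle (GPUs, vCPUs, memory, local SSD, networking) is billed
    regardless of how much of it the job touches, so the divisor is the number
    of GPUs doing useful work, not the GPU count on the spec sheet.
    """
    if gpus_used <= 0:
        raise ValueError("gpus_used must be positive")
    return machine_rate_per_hour / gpus_used


# Hypothetical placeholder rate for an 8-GPU machine, not a real GCP price.
PLACEHOLDER_MACHINE_RATE = 80.00

print(effective_per_gpu_cost(PLACEHOLDER_MACHINE_RATE, 8))  # all eight GPUs serving: 10.0
print(effective_per_gpu_cost(PLACEHOLDER_MACHINE_RATE, 4))  # only four GPUs serving: 20.0
```

The advertised GPU line never changes in this calculation; only the divisor does, which is why a half-used machine doubles the effective rate.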

Two workloads on the same A3 instance can have very different effective costs. A training job that uses the host CPU for data loading extracts value from the bundle. A lean inference server that only needs the GPU pays the same bundle price for less of it.

A Flat Per-Card Baseline to Read GCP Against

To see what the GCP bundle costs you, it helps to compare against a provider that rents the H100 as a single bare metal card at a flat rate, with no attached CPU or memory you did not ask for.

GMI Cloud publishes a flat on-demand H100 rate that isolates the accelerator cost, which makes it a clean reference for separating the silicon from the bundle.

Pricing element	GCP A3/A4 machine type	GMI Cloud bare metal H100
H100 access	Bundled in machine type	Single-card, on demand
Attached vCPU/memory	Fixed per configuration	Provisioned to the instance, root access
Memory bandwidth delivered	Subject to virtualization layer	100% of 3.35 TB/s, no hypervisor
Per-GPU rate	Derived from full machine rate	$2.00/GPU-hour flat

Read the table as a way to separate variables, not as a single winner. GCP's bundle earns its cost for workloads that use the attached resources. The flat card earns its cost for inference that mostly needs the GPU.

GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. GMI Cloud's bare metal H100 instances at $2.00 per GPU-hour run with no hypervisor, delivering 100% of the advertised 3.35 TB/s memory bandwidth, so the per-card number you read is the per-card number you pay.

Bundled Machine Rates and Flat Card Rates Answer Different Questions

This is the boundary worth drawing clearly. A bundled machine rate and a flat per-card rate are not directly comparable until you account for the resources each includes.

A bundled machine type suits workloads that genuinely use the host CPU, memory, and local storage, such as training pipelines with heavy data preprocessing. A flat single-card rate suits inference servers that are GPU-bound and treat host resources as overhead. Comparing only the GPU line of a bundled machine to a flat card understates what the bundle actually costs and treats two different products as if they were the same.

The deciding factor is how much of the attached bundle your workload actually consumes. The more host resources you use, the more the bundle makes sense.
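One way to make that deciding factor concrete is to compare the bundle's per-GPU premium over a flat card against what the host resources you actually use would be worth on their own. The sketch below assumes you can estimate that value per GPU-hour (the hypothetical `host_value_used` field) and that the flat card's own host allocation is enough for GPU-bound inference. The class and function names are illustrative, and every number except the $2.00 flat rate quoted in this article is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class BundleCheck:
    """Inputs for a hypothetical bundle-versus-flat-card comparison, all per GPU-hour."""

    bundled_per_gpu: float  # full machine rate divided by GPUs actually used
    flat_per_gpu: float     # flat single-card rate
    host_value_used: float  # assumed value of host CPU/memory/SSD the job really consumes


def bundle_premium(check: BundleCheck) -> float:
    """Extra cost per GPU-hour the bundle carries over the flat card."""
    return check.bundled_per_gpu - check.flat_per_gpu


def bundle_pays_off(check: BundleCheck) -> bool:
    """The bundle earns its premium only if the host resources you use are worth at least that much."""
    return check.host_value_used >= bundle_premium(check)


# Placeholder scenarios: a preprocessing-heavy training job and a lean inference server.
training = BundleCheck(bundled_per_gpu=10.00, flat_per_gpu=2.00, host_value_used=9.00)
inference = BundleCheck(bundled_per_gpu=10.00, flat_per_gpu=2.00, host_value_used=0.50)

print(bundle_premium(training))    # 8.0
print(bundle_pays_off(training))   # True
print(bundle_pays_off(inference))  # False
```

Both scenarios pay the same premium; only the value extracted from the host resources differs, which is the distinction this section draws between training and lean inference.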

Which Pricing Model Fits Your Workload

The right structure depends on the shape of your job, not on the headline rate.

Best for training with heavy data preprocessing: a bundled machine type, where attached CPU and memory get used.
Best for GPU-bound inference serving: a flat single-card bare metal rate, where you pay for the accelerator and little else.
Best for teams already standardized on GCP services: A3/A4, where ecosystem integration outweighs bundle overhead.
Not ideal for lean, CPU-light inference: a richly bundled machine type, whose attached resources sit idle.

GMI Cloud is best suited for GPU-bound inference teams that want to pay for the accelerator without an attached host bundle they do not use. It separates serverless inference from dedicated bare metal, so variable traffic can scale to zero while steady jobs run on flat-rate cards. You can confirm the current H100 rate and instance details at gmicloud.ai/en/pricing and console.gmicloud.ai before sizing your deployment.

Divide the Machine Rate by the GPUs You Actually Use

The honest way to compare GCP H100 pricing is to stop reading the GPU line and start reading the machine rate divided by the accelerators your workload truly needs. If your inference server leaves the attached CPU and memory mostly idle, the bundle inflates your effective per-GPU cost in a way the price sheet never shows. Size the host resources your job consumes, compare the full machine rate against a flat per-card baseline, and let the gap between the two tell you whether the bundle is working for you or just billing you.
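The sizing step can be sketched the same way. This version assumes a single hypothetical machine type with a fixed 8 GPUs at a placeholder hourly rate, rounds the GPUs you need up to whole machines (reflecting the fixed GPU counts machine types come in), and reports the monthly gap against the $2.00 flat baseline using the common 730-hour month. It gives the bundle no credit for host resources you do use, so treat the gap as an upper bound on the bundle overhead for a GPU-bound job. The constant names and the `compare` function are illustrative.

```python
import math

# Hypothetical placeholders, not published prices: replace with the quote for
# the machine type, region, and GPU count you are actually evaluating.
PLACEHOLDER_MACHINE_RATE = 80.00  # $/hour for one bundled machine
GPUS_PER_MACHINE = 8              # fixed GPU count of that machine type (assumed)
FLAT_PER_GPU_RATE = 2.00          # flat single-card baseline, $/GPU-hour
HOURS_PER_MONTH = 730             # 8,760 hours / 12, a common always-on approximation


def compare(gpus_needed: int) -> dict:
    """Compare bundled machines against flat single cards for a given GPU requirement."""
    if gpus_needed <= 0:
        raise ValueError("gpus_needed must be positive")
    # Machine types come in fixed GPU counts, so round up to whole machines.
    machines = math.ceil(gpus_needed / GPUS_PER_MACHINE)
    bundled_hourly = machines * PLACEHOLDER_MACHINE_RATE
    flat_hourly = gpus_needed * FLAT_PER_GPU_RATE
    return {
        "machines": machines,
        "effective_per_gpu": bundled_hourly / gpus_needed,
        "flat_per_gpu": FLAT_PER_GPU_RATE,
        "monthly_gap": (bundled_hourly - flat_hourly) * HOURS_PER_MONTH,
    }


if __name__ == "__main__":
    for need in (4, 8, 12):
        print(need, compare(need))
```

With these placeholders, needing 12 GPUs forces two 8-GPU machines, so the effective rate is 160 / 12, about $13.33 per GPU-hour, even though a fully used machine works out to $10.00.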

Colin Mo
