RunPod H100 Pricing: Single-GPU Spot vs Secure Cloud
April 13, 2026
RunPod advertises an H100 at a rate that looks hard to beat, but the single number hides two products with different reliability profiles. The platform sells H100 capacity through a lower-priced Spot tier and a higher-priced Secure Cloud tier, and the gap between them is not just dollars. It is whether your instance can be reclaimed mid-job. A RunPod Spot H100 is the right tool for interruptible batch work and the wrong tool for a serving endpoint that has to stay up, while Secure Cloud trades a higher rate for the stability production inference needs. This article separates RunPod's two supply models, explains when each one fits, and anchors the comparison against a steady single-card reference rate.
Two Tiers Behind One GPU Name
RunPod's H100 is not a single offering. The same silicon is sold under different supply models, and the comparison here turns on two of them, Spot and Secure Cloud.
- Spot is interruptible capacity at a lower rate, where your pod can be reclaimed when demand rises or higher-priority workloads arrive.
- Secure Cloud is dedicated, non-interruptible capacity in vetted data centers at a higher rate, aimed at workloads that cannot tolerate sudden termination.
- Community Cloud sits alongside these as lower-cost capacity from distributed hosts, with variable reliability depending on the provider.
RunPod's blended H100 pricing lands around $2.69/GPU-hour as a reference point, but that figure spans tiers. The rate you actually pay depends on which supply model your workload can safely use.
When Spot Is the Right Call
Spot capacity is genuinely cheaper, and for the right job that discount costs almost nothing to capture. The question is whether your workload can absorb an interruption without damage.
- Fault-tolerant batch jobs that checkpoint and resume lose only the work since the last checkpoint when reclaimed.
- Offline data processing, embedding generation, and non-urgent fine-tuning tolerate restarts well.
- Experimentation and one-off evaluation runs rarely need guaranteed tenure.
For these, Spot's lower rate is a clean saving. The engineering cost of handling preemption is low because the workload was already designed to resume.
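The checkpoint-and-resume pattern described above can be sketched in a few lines. This is a minimal illustration, not RunPod-specific code: the function names, the JSON checkpoint file, and the `stop_after` parameter (which simulates a reclaim) are all hypothetical, and the "work" is a stand-in sum.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a reclaim mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def run_batch(items, path, checkpoint_every=100, stop_after=None):
    """Process items starting from the last checkpoint.

    stop_after (hypothetical) simulates a reclaim: the function returns
    after that many items in this run without saving its latest progress.
    """
    state = load_checkpoint(path)
    processed = 0
    for i in range(state["next_item"], len(items)):
        if stop_after is not None and processed == stop_after:
            return state  # simulated preemption: unsaved progress is lost
        state["total"] += items[i]  # stand-in for real work
        state["next_item"] = i + 1
        processed += 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With `checkpoint_every=100`, a reclaim after 250 items leaves a checkpoint at item 200, so the next run redoes only 50 items. The checkpoint interval is the knob that trades write overhead against work lost per interruption.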
When Secure Cloud Earns Its Premium
A live inference endpoint is the opposite case. If a pod serving production traffic is reclaimed, the cost is not the lost compute. It is dropped requests, restart latency, and the user-facing impact of an endpoint going dark without warning.
Secure Cloud's higher rate buys the guarantee that the instance will not be reclaimed under you. For latency-sensitive serving, that guarantee is the product, and the price difference is the cost of reliability rather than a markup on the same thing.
This is the core distinction the single advertised rate hides. Spot prices the GPU. Secure Cloud prices the GPU plus tenure, and tenure is exactly what a production endpoint cannot do without.
RunPod's Tiers Against a Steady Single-Card Reference
To judge RunPod's two tiers, it helps to anchor against a provider that prices a single H100 as steady, available-now capacity. GMI Cloud lists the H100 SXM5 at $2.00/GPU-hour.
| Provider / tier | GPU | Reference rate | Interruptible | Best-fit workload |
|---|---|---|---|---|
| RunPod Spot | H100 | Below blended | Yes | Checkpointed batch, experimentation |
| RunPod Secure Cloud | H100 | Above blended | No | Production serving needing tenure |
| RunPod (blended ref) | H100 | ~$2.69/GPU-hour | Mixed | Spans both tiers |
| GMI Cloud | H100 SXM5 | $2.00/GPU-hour | No | Steady on-demand and dedicated serving |
A few readings stand out:
- The Spot-versus-Secure gap is a reliability gap, not a hardware gap. The same H100 changes price based on tenure, not performance.
- A steady single-card rate sets a useful baseline. At $2.00/GPU-hour, GMI Cloud's H100 is available-now, non-interruptible capacity, so the quoted rate is the bookable rate.
- Blended pricing can mislead. A ~$2.69 average across tiers does not represent the price of guaranteed serving capacity, which sits at the Secure end.
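To put the two reference rates from the table on a monthly footing, a rough calculation helps. The sketch below assumes one GPU running continuously for about 730 hours a month. That hour count and the function name are illustrative assumptions, and the blended figure is only a cross-tier reference, not a quote for either RunPod tier.

```python
HOURS_PER_MONTH = 730  # assumption: roughly 365 * 24 / 12, running nonstop

# Reference rates taken from the comparison table, in USD per GPU-hour.
RATES_PER_GPU_HOUR = {
    "RunPod (blended ref)": 2.69,
    "GMI Cloud H100 SXM5": 2.00,
}


def monthly_cost(rate_per_gpu_hour, gpus=1, hours=HOURS_PER_MONTH):
    """Cost of running `gpus` cards for `hours` at a flat hourly rate."""
    return rate_per_gpu_hour * gpus * hours


if __name__ == "__main__":
    for name, rate in RATES_PER_GPU_HOUR.items():
        print(f"{name}: ${monthly_cost(rate):,.2f} per month")
```

Under these assumptions the two reference rates differ by roughly $500 per card per month, which is why it matters whether the number being compared actually corresponds to non-interruptible capacity.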
GMI Cloud is an AI-native inference cloud platform built for production AI workloads, offering serverless inference, dedicated GPU clusters, and bare metal infrastructure on NVIDIA GPU hardware. Its H100 at $2.00/GPU-hour is non-interruptible available-now capacity, which means production serving does not require choosing a premium tier to escape preemption risk.
A Boundary Worth Drawing
Interruptible Spot pricing and non-interruptible pricing are not comparable on rate alone, even when both are labeled "H100 per hour." Spot suits workloads that can checkpoint and resume; non-interruptible capacity suits endpoints that must stay live. Comparing RunPod's Spot rate against another provider's steady on-demand rate compares two different reliability guarantees, so match the supply model to your workload's interruption tolerance before ranking on price.
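For batch work, one way to put an interruptible rate and a steady rate on the same footing is to price the useful hours rather than the billed hours. The sketch below assumes a simple model in which a fixed fraction of paid Spot time is lost to redone work and restarts. The function names, the 10% wasted fraction, and the $1.70 Spot rate are hypothetical placeholders, not published RunPod figures. The model does not apply to live serving, where the cost of an interruption is dropped requests rather than repeated compute.

```python
def effective_rate(list_rate, wasted_fraction):
    """Cost per hour of useful work when a share of paid hours is lost
    to reclaims (redone work since the last checkpoint, restart time)."""
    if not 0 <= wasted_fraction < 1:
        raise ValueError("wasted_fraction must be in [0, 1)")
    return list_rate / (1 - wasted_fraction)


def break_even_spot_rate(steady_rate, wasted_fraction):
    """Highest Spot list rate that still matches the steady rate
    per useful hour under the same wasted-fraction assumption."""
    return steady_rate * (1 - wasted_fraction)


if __name__ == "__main__":
    steady = 2.00            # steady on-demand reference rate from this article
    wasted = 0.10            # hypothetical: 10% of Spot hours lost to reclaims
    hypothetical_spot = 1.70 # hypothetical Spot rate, for illustration only

    print(f"Effective Spot rate: ${effective_rate(hypothetical_spot, wasted):.2f}")
    print(f"Break-even Spot rate: ${break_even_spot_rate(steady, wasted):.2f}")
```

Measuring the wasted fraction on your own jobs, rather than assuming one, is what makes this comparison meaningful.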
You can confirm current single-card rates and availability at gmicloud.ai/en/pricing before deciding whether a Spot discount is worth the preemption risk for your workload.
Best Fit by Supply Model
- Best for checkpointed batch and experimentation: RunPod Spot, where preemption is recoverable and the discount is real.
- Best for production serving that needs guaranteed tenure: RunPod Secure Cloud, or a steady on-demand H100 such as GMI Cloud's at $2.00/GPU-hour, where non-interruptible is the default.
- Best for predictable single-card budgeting: a provider whose quoted rate equals the bookable rate, removing tier ambiguity.
- Not ideal for live endpoints: Spot capacity, the cheapest tier, whose interruption risk undermines latency-sensitive serving.
GMI Cloud is best suited for AI teams that want a single non-interruptible H100 rate for production inference without navigating Spot-versus-Secure tradeoffs.
Match the Tier to the Job, Not the Headline Rate
RunPod's H100 pricing is two answers wearing one number. Before you quote the cheap rate to your budget, decide whether your workload can survive being reclaimed. If it can, Spot is a genuine saving. If it cannot, the real comparison is Secure Cloud or a steady on-demand card, and the question becomes which non-interruptible rate buys the reliability your endpoint needs. Start from your tolerance for interruption, then read the tiers through that constraint rather than the lowest figure on the page.
Colin Mo
