
RunPod vs Lambda vs CoreWeave vs GMI Cloud: What GPU Rental Really Costs for LLM Inference

May 28, 2026

Sticker-price comparisons mislead. Two providers can list the same H100 at very different rates, then quietly close the gap (or widen it by 2-3x) once you pick a billing mode. Teams pick the cheapest hourly number, then watch the actual bill balloon from cold-start fees, idle premiums, and minimum commits.

That's how a "$2.49/hr H100" turns into $4.10/hr effective. The honest comparison is effective $/GPU-hour once you account for billing mode, utilization, and availability. This article walks through on-demand, serverless, and dedicated pricing across RunPod, Lambda, CoreWeave, and GMI Cloud, and shows how to back out the effective rate.
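A minimal sketch of backing out an effective rate, assuming for illustration that the entire gap between sticker and effective price comes from paying for hours that do no useful work. The `effective_rate` helper and the ~443 useful hours are hypothetical; cold-start fees or minimum commits would enter the same formula as extra billed hours or dollars.

```python
def effective_rate(sticker_per_hr: float, billed_hours: float, useful_hours: float) -> float:
    """Effective $/GPU-hour = total amount paid / GPU-hours that did useful work."""
    if useful_hours <= 0:
        raise ValueError("useful_hours must be positive")
    total_paid = sticker_per_hr * billed_hours
    return total_paid / useful_hours


# Hypothetical month: a $2.49/hr H100 billed for all 730 hours,
# but only ~443 of those hours (about 61%) serve real traffic.
rate = effective_rate(2.49, billed_hours=730, useful_hours=443)
print(f"effective: ${rate:.2f}/GPU-hour")
```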

The Direct Answer

For steady inference traffic above 30% utilization, on-demand or dedicated rentals beat serverless. For spiky workloads under 20% utilization, serverless wins despite its higher effective hourly rate. The provider matters less than matching the billing mode to your traffic shape. Verify current rates on each provider's pricing page before committing.

Scope: This piece covers H100 and H200 SXM rental pricing for LLM inference. It doesn't cover training-grade reserved clusters, spot pricing volatility, or egress charges (which can swing the math further).

Three Billing Modes, Three Different Cost Curves

Before comparing logos, get the modes straight. Each provider mixes them differently.

  • On-demand: Pay per GPU-hour while the instance runs. You manage start, stop, and idle time. Billing granularity varies (per-second, per-minute, per-hour).
  • Serverless: Pay per second of active inference, scaled to zero when idle. You don't manage instances, but cold starts and queue latency become real costs.
  • Dedicated / reserved: Commit for a term (week, month, year). Lower hourly rate. You eat idle time regardless of utilization.

The same H100 runs on all three modes. Effective cost depends on which one you pick.
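The three cost curves can be written as three small functions. This is a sketch under simplifying assumptions: on-demand bills every running hour, serverless bills only active seconds, and a dedicated commit bills the full term. The function names are illustrative, not any provider's API, and the sample rates are borrowed from the price snapshot in this article.

```python
HOURS_PER_MONTH = 730


def on_demand_cost(rate_per_hr: float, running_hours: float) -> float:
    # Billed for every hour the instance is up, busy or idle.
    return rate_per_hr * running_hours


def serverless_cost(rate_per_sec: float, active_seconds: float) -> float:
    # Billed only for seconds spent serving requests; idle time scales to zero.
    return rate_per_sec * active_seconds


def dedicated_cost(rate_per_hr: float, term_hours: float = HOURS_PER_MONTH) -> float:
    # Billed for the whole committed term regardless of utilization.
    return rate_per_hr * term_hours


active_hours = 110  # e.g. a bursty month where ~15% of hours do real work
print("on-demand, left running:", on_demand_cost(2.00, HOURS_PER_MONTH))
print("on-demand, stopped when idle:", on_demand_cost(2.00, active_hours))
print("serverless:", round(serverless_cost(0.00116, active_hours * 3600), 2))
print("dedicated:", dedicated_cost(4.25))
```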

Sticker Price Snapshot

Here are the publicly listed H100 and H200 rates across the four providers, as shown on their pricing pages at the time of writing.

| Provider | H100 SXM On-Demand | H100 Serverless | H200 SXM On-Demand |
|---|---|---|---|
| GMI Cloud | ~$2.00/hr | not the core offer | ~$2.60/hr |
| RunPod | ~$2.69/hr (community); ~$3.35/hr (secure) | ~$0.00116/sec (~$4.18/hr active) | varies |
| Lambda Labs | ~$2.49-$3.29/hr | not offered | ~$3.29/hr |
| CoreWeave | ~$4.25/hr (H100 PCIe public list); SXM higher | not listed as a primary tier | |

Rates above are pulled from each provider's public pricing pages and shift frequently. Always confirm before commitment.
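Normalizing per-second serverless prices to an hourly figure makes the columns comparable. A small sketch using the RunPod serverless rate from the table; the premium only applies while a worker is active, since serverless bills nothing when idle.

```python
SECONDS_PER_HOUR = 3600


def per_second_to_hourly(rate_per_sec: float) -> float:
    """Equivalent $/hr for a worker that stays active for a full hour."""
    return rate_per_sec * SECONDS_PER_HOUR


runpod_serverless_hr = per_second_to_hourly(0.00116)
premium = runpod_serverless_hr / 3.35  # vs. RunPod secure on-demand H100
print(f"~${runpod_serverless_hr:.2f}/hr active, {premium:.2f}x the secure on-demand rate")
```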

Effective Cost: Steady Workload Worked Example

Assume you run a 70B-class LLM at 65% GPU utilization, 24/7, on one H100 for a month (730 hours).

| Provider | Mode | Rate | Monthly Cost |
|---|---|---|---|
| GMI Cloud H100 SXM | On-demand | $2.00/hr | ~$1,460 |
| Lambda H100 (mid-tier) | On-demand | $2.89/hr | ~$2,110 |
| RunPod H100 SXM (secure) | On-demand | $3.35/hr | ~$2,445 |
| CoreWeave H100 PCIe | On-demand | $4.25/hr | ~$3,103 |

Three things to notice. First, the lowest sticker price doesn't always make for the cheapest provider in production (you'll see why in the Engineering Reality section). Second, the spread between the cheapest and most expensive option is ~2.1x for the same Hopper-generation GPU, and since CoreWeave's figure is its PCIe list price, the SXM-to-SXM gap is wider still. Third, serverless isn't even in this race: steady 24/7 traffic keeps a serverless worker active almost every hour, so at RunPod's ~$4.18/hr active rate you'd pay roughly $3,050 for the month.
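The steady-workload table reduces to rate × 730. A short sketch that recomputes it, the cheapest-to-priciest spread, and what serverless would cost under the assumption that steady traffic keeps one serverless worker active for essentially the whole month at the listed per-second rate.

```python
HOURS_PER_MONTH = 730
SERVERLESS_PER_SEC = 0.00116  # RunPod H100 serverless rate from the snapshot table

steady_rates = {
    "GMI Cloud H100 SXM": 2.00,
    "Lambda H100 (mid-tier)": 2.89,
    "RunPod H100 SXM (secure)": 3.35,
    "CoreWeave H100 PCIe": 4.25,
}

# On-demand left running: every hour of the month is billed at the listed rate.
monthly = {name: rate * HOURS_PER_MONTH for name, rate in steady_rates.items()}
for name, cost in sorted(monthly.items(), key=lambda kv: kv[1]):
    print(f"{name:<26} ${cost:,.0f}")

spread = max(monthly.values()) / min(monthly.values())
print(f"cheapest-to-priciest spread: {spread:.1f}x")

# Assumption: steady traffic keeps one serverless worker active for all 730 hours.
serverless_24x7 = SERVERLESS_PER_SEC * 3600 * HOURS_PER_MONTH
print(f"serverless, active all month: ${serverless_24x7:,.0f}")
```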

Effective Cost: Bursty Workload Worked Example

Now assume the opposite. Same model, but 15% utilization, or roughly 110 active GPU-hours out of 730. You only generate tokens during business hours, and weekends are quiet.

| Mode | Provider | Effective Rate | Monthly Cost (15% util) |
|---|---|---|---|
| On-demand, always-on | GMI Cloud H100 | $2.00/hr, always running | ~$1,460 |
| On-demand, scripted stop/start | GMI Cloud H100 | $2.00/hr × 110 active hours | ~$220 |
| Serverless | RunPod H100 | active-only, ~$4.18/hr × 110 hours | ~$460 |
| Dedicated monthly commit | CoreWeave | $4.25/hr, always running | ~$3,103 |

Two takeaways. First, scripted stop/start on cheap on-demand beats serverless if your team can automate provisioning; without that automation, serverless wins because the platform handles scaling for you. Second, dedicated commits are the worst choice for bursty workloads: the hourly rate is typically lower than on-demand, but you pay for every idle hour.
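A sketch of the bursty comparison as a function of utilization, using the table's rates. It also computes a pure list-rate break-even between active-only serverless and an always-on instance; this is an illustrative calculation that ignores cold-start mitigation and ops time, not a recommendation threshold.

```python
HOURS_PER_MONTH = 730


def bursty_month(utilization: float) -> dict[str, float]:
    """Monthly cost per option at a given fraction of active hours (table rates)."""
    active_hours = utilization * HOURS_PER_MONTH
    return {
        "always_on": 2.00 * HOURS_PER_MONTH,   # on-demand, never stopped
        "scripted": 2.00 * active_hours,       # on-demand, stopped when idle
        "serverless": 4.18 * active_hours,     # billed only while active
        "dedicated": 4.25 * HOURS_PER_MONTH,   # committed, idle hours included
    }


def break_even_utilization(serverless_per_hr: float, always_on_per_hr: float) -> float:
    """Utilization above which an always-on instance costs less than serverless."""
    return always_on_per_hr / serverless_per_hr


for option, cost in bursty_month(0.15).items():
    print(f"{option:<10} ${cost:,.0f}")
print(f"list-rate break-even: {break_even_utilization(4.18, 2.00):.0%}")
```

At 15% this lands near the table's rounded figures (the table uses 110 hours rather than 109.5). The list-rate break-even against the $2.00/hr always-on option comes out near 48%; any fixed cost added to the serverless side, such as a warm worker pool, pushes that break-even lower.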

Engineering Reality: Where the Math Breaks

The tables above assume the platform behaves. In production, it doesn't always.

Cold-start latency on serverless. RunPod serverless cold starts on an H100 can run 5-30 seconds depending on container size and model checkpoint location. If your p95 latency budget is 2 seconds, cold starts will break that SLO whenever traffic returns after a lull and workers have scaled to zero. Mitigation: keep a warm worker pool, which pushes effective cost toward dedicated pricing.
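A hypothetical model of what a warm pool does to the serverless bill. It assumes warm workers bill every hour of the month and absorb baseline traffic, while `extra_burst_hours` are additional worker-hours from scaling above the pool; the warm rate defaults to the active rate because any warm-worker discount is provider-specific and not covered here.

```python
from typing import Optional

HOURS_PER_MONTH = 730


def serverless_with_warm_pool(active_rate_per_hr: float, warm_workers: int,
                              extra_burst_hours: float,
                              warm_rate_per_hr: Optional[float] = None) -> float:
    """Monthly serverless cost with always-warm workers (hypothetical model)."""
    if warm_rate_per_hr is None:
        # Assumption: no warm-worker discount; check your provider's pricing.
        warm_rate_per_hr = active_rate_per_hr
    warm = warm_workers * warm_rate_per_hr * HOURS_PER_MONTH
    burst = active_rate_per_hr * extra_burst_hours
    return warm + burst


# Scale-to-zero (110 active hours) vs. one warm worker serving all traffic.
for workers, burst in [(0, 110), (1, 0)]:
    cost = serverless_with_warm_pool(4.18, workers, burst)
    print(f"{workers} warm worker(s), {burst} burst hours: ${cost:,.0f}/month")
```

Under these assumptions, keeping a single H100 warm costs about as much as the dedicated row in the bursty table, which is the sense in which a warm pool pushes serverless toward dedicated pricing.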

Billing granularity. GMI Cloud and Lambda bill per-minute on most instance types. RunPod bills per-second on serverless. CoreWeave's commits are hourly. If you spin up an instance for a 90-second benchmark, hourly billing charges the full hour: 3,600 billed seconds for 90 used, a 40x premium over per-second billing on small jobs.
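The granularity effect is just rounding usage up to the billing increment. A minimal sketch, assuming each job is billed independently and rounded up; the $2.00/hr rate is illustrative.

```python
import math

SECONDS_PER_HOUR = 3600


def billed_cost(job_seconds: float, rate_per_hr: float, increment_seconds: int) -> float:
    """Cost of one job when usage is rounded up to the billing increment."""
    increments = math.ceil(job_seconds / increment_seconds)
    return increments * increment_seconds / SECONDS_PER_HOUR * rate_per_hr


RATE = 2.00  # illustrative $/GPU-hour
JOB = 90     # a 90-second benchmark
costs = {name: billed_cost(JOB, RATE, inc)
         for name, inc in [("per-second", 1), ("per-minute", 60), ("hourly", 3600)]}
for name, cost in costs.items():
    print(f"{name:<10} ${cost:.4f}")
print(f"hourly vs per-second: {costs['hourly'] / costs['per-second']:.0f}x")
```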

Queue times under load. When H100 supply is tight (as it often is), on-demand requests can queue. Lambda has had publicly documented availability gaps. CoreWeave prioritizes reserved customers. GMI Cloud and RunPod allocation depends on region.

Premium GPU availability. H200 supply in 2026 is still thin across all providers. Listed prices don't help if the instance isn't available when you need it. Confirm regional availability before architecting around H200.

Network egress. None of the four bake egress into the GPU-hour rate. Heavy retrieval-augmented generation workloads can add 10-20% to the total bill. Read each provider's egress schedule.
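Forecasting the full bill means adding egress and storage on top of GPU-hours. A sketch with placeholder inputs: the 2,000 GB volume and $0.08/GB price are hypothetical values for illustration, not any of the four providers' rates, so substitute figures from each egress schedule.

```python
def forecast_monthly_bill(gpu_cost: float, egress_gb: float, egress_price_per_gb: float,
                          storage_cost: float = 0.0) -> tuple[float, float]:
    """Return (total bill, non-GPU extras as a fraction of the GPU bill)."""
    extras = egress_gb * egress_price_per_gb + storage_cost
    return gpu_cost + extras, extras / gpu_cost


# Placeholder inputs: replace egress volume and $/GB with your provider's schedule.
total, share = forecast_monthly_bill(gpu_cost=1460, egress_gb=2000, egress_price_per_gb=0.08)
print(f"total ${total:,.0f}, extras add {share:.0%} to the GPU bill")
```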

Decision Framework

| Your traffic shape | Best billing mode | Provider considerations |
|---|---|---|
| 24/7 steady, predictable load | On-demand or dedicated commit | Lowest sticker H100/H200 rate wins. GMI Cloud and RunPod community lead here. |
| Bursty, with engineering capacity to automate | On-demand + scripted stop/start | Per-minute or per-second billing matters. GMI Cloud, Lambda, RunPod. |
| Bursty, without ops bandwidth | Serverless | RunPod serverless is the most mature option. |
| Long-term capacity planning, 6-12 month commit | Reserved | CoreWeave and Lambda offer the deepest commit discounts. |
| Multi-model API needs, not GPU management | Managed inference API | Skip GPU rental entirely. See GMI Cloud Inference Engine or comparable. |
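The decision table, combined with the utilization thresholds from the FAQ, can be encoded as a rough helper. This is a hypothetical sketch: `TrafficProfile` and `recommend_billing_mode` are illustrative names, and the 20-30% band, which the article leaves open, is returned as a gray zone rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class TrafficProfile:
    utilization: float               # fraction of the month the GPU does useful work
    steady_24x7: bool                # predictable, around-the-clock load
    can_automate_start_stop: bool    # team can script provisioning
    proven_steady_months: int = 0    # months of measured steady demand
    wants_gpu_management: bool = True


def recommend_billing_mode(p: TrafficProfile) -> str:
    """Rough encoding of the decision table and FAQ thresholds (hypothetical helper)."""
    if not p.wants_gpu_management:
        return "managed inference API"
    if p.steady_24x7 and p.proven_steady_months >= 6:
        return "reserved commit"
    if p.steady_24x7 or p.utilization > 0.30:
        return "on-demand or dedicated commit"
    if p.can_automate_start_stop:
        return "on-demand + scripted stop/start"
    if p.utilization < 0.20:
        return "serverless"
    return "gray zone: measure cold-start budget and idle policy"


print(recommend_billing_mode(TrafficProfile(0.65, True, False)))
print(recommend_billing_mode(TrafficProfile(0.15, False, True)))
print(recommend_billing_mode(TrafficProfile(0.15, False, False)))
```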

Where GMI Cloud Fits in This Map

GMI Cloud (gmicloud.ai) sits in the lean on-demand H100/H200 lane with per-minute billing. Listed rates: $2.00/hr H100 SXM, $2.60/hr H200 SXM. Node config: 8 GPUs, NVLink 4.0 at 900 GB/s bidirectional aggregate per GPU on HGX, 3.2 Tbps InfiniBand between nodes. Pre-configured stacks include CUDA 12.x, TensorRT-LLM, vLLM, Triton.

Honest positioning: it doesn't currently lead in true scale-to-zero serverless, so RunPod is more mature for highly bursty workloads. CoreWeave's reserved contracts fit enterprise long-term commits. The H100/H200 sticker delta makes the case for steady or scriptable workloads.

Check gmicloud.ai/pricing for current rates.

Frequently Asked Questions

Is the cheapest H100 always the best value? No. Sticker rate ignores billing granularity, cold-start cost, and availability. A $2.00/hr H100 that's available with per-minute billing usually beats a $1.80/hr H100 with hourly billing and tight regional supply. Calculate effective $/GPU-hour after utilization and ops overhead.
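To see how a lower sticker can lose on billing granularity alone, here is a sketch comparing $2.00/hr billed per minute with $1.80/hr billed per hour across a handful of separate jobs. The job lengths are hypothetical, and availability is not modeled.

```python
import math


def jobs_cost(job_minutes: list[float], rate_per_hr: float, increment_min: int) -> float:
    """Total cost of separate jobs, each rounded up to the billing increment."""
    billed = sum(math.ceil(m / increment_min) * increment_min for m in job_minutes)
    return billed / 60 * rate_per_hr


jobs = [12, 25, 40, 70]  # hypothetical job lengths, in minutes
per_minute = jobs_cost(jobs, rate_per_hr=2.00, increment_min=1)
hourly = jobs_cost(jobs, rate_per_hr=1.80, increment_min=60)
print(f"$2.00/hr per-minute: ${per_minute:.2f}   $1.80/hr hourly: ${hourly:.2f}")
```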

When does serverless GPU beat on-demand? Serverless beats on-demand when utilization is under ~20% and your team can't automate instance start/stop. Above 30% steady utilization, on-demand at a lower sticker rate almost always wins. The crossover depends on cold-start budget and idle policy.

Should I commit to a reserved GPU contract? Only if you've proven steady 24/7 demand for 6+ months and have headroom to grow into the commit. Reserved discounts run 30-50% below on-demand on most providers, but unused capacity is a sunk cost. Start on-demand, measure, then commit.
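A sketch of the reserved-vs-on-demand trade-off for one GPU. It assumes on-demand is paid only for the hours you actually need (perfect stop/start) while a reserved contract is paid for the full term; the $2.00/hr rate, 40% discount, and 55% usage are hypothetical inputs.

```python
def reserved_vs_on_demand(on_demand_rate: float, discount: float,
                          term_hours: float, needed_hours: float) -> tuple[float, float]:
    """Return (reserved cost, on-demand cost) for one GPU over a term."""
    reserved = on_demand_rate * (1 - discount) * term_hours  # paid for every hour
    on_demand = on_demand_rate * needed_hours                 # paid only when needed
    return reserved, on_demand


def break_even_usage(discount: float) -> float:
    """Fraction of committed hours you must use before reserved stops losing."""
    return 1 - discount


for d in (0.30, 0.50):
    print(f"{d:.0%} discount -> need at least {break_even_usage(d):.0%} of the commit")

# Hypothetical: $2.00/hr on-demand, 40% discount, 6-month term, 55% of hours needed.
term = 6 * 730
reserved, on_demand = reserved_vs_on_demand(2.00, 0.40, term, 0.55 * term)
print(f"reserved ${reserved:,.0f} vs on-demand ${on_demand:,.0f}")
```

Because real stop/start is never perfect, on-demand costs somewhat more in practice, so treat the computed break-even as an upper bound on the usage you need before a commit pays off.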

Does GPU rental pricing include networking and storage? Usually not. Egress, persistent storage, and inter-region traffic are billed separately on all four providers covered here. For RAG or multi-modal workloads, network and storage can add 10-20% to the GPU bill. Read each provider's full schedule before forecasting.

Colin Mo
