Llama-4 Maverick 17B 128E Instruct FP8

An instruction-tuned MoE model with 400B parameters and 17B active across 128 experts. Optimized with FP8 precision, Maverick delivers scalable, efficient reasoning, coding, and dialogue for enterprise-grade deployments.

Try this Model

Model Library

Model Info

Provider

Model Type

LLM

Context Length

1M

Video Quality

Video Length

Capability

Text-Image-to-Text, Multimodal

Serverless

Available

Pricing

$0.25 / $0.8 per 1M input/output tokens

GMI Cloud Features

Serverless

Access your chosen AI model instantly through GMI Cloud’s flexible pay-as-you-go serverless platform. Integrate easily using our Python SDK, REST interface, or any OpenAI-compatible client.

Learn More

State-of-the-Art Model Serving

Experience unmatched inference speed and efficiency with GMI Cloud’s advanced serving architecture. Our platform dynamically scales resources in real time, maintaining peak performance under any workload while optimizing cost and capacity.

Learn More

Dedicated Deployments

Run your chosen AI model on dedicated GPUs reserved exclusively for you. GMI Cloud’s infrastructure provides consistent performance, high availability, and flexible auto-scaling to match your workloads.

Learn More

Try

Llama-4 Maverick 17B 128E Instruct FP8

now.

Try this model now.

Try this Model

Ready to build?

Explore powerful AI models and launch your project in just a few clicks.

Get Started

Get quick answers to common queries in our FAQs.



Llama-4 Maverick 17B 128E Instruct FP8

Provider

Meta

Model Type

LLM

Context Length

1M

Video Quality

Video Length

Capability

Text-Image-to-Text, Multimodal

Serverless

Available

Pricing

$0.25 / $0.8 per 1M input/output tokens

Serverless

State-of-the-Art Model Serving

Dedicated Deployments

Ready to build?

Sign up for our newsletter

Subscribe to our newsletter