Get quick answers to common queries in our FAQs.
It’s a versatile, efficient dense LLM that delivers strong reasoning and multi-turn capabilities through a hybrid architecture with optimized FP8 precision, balancing high output quality with serving efficiency.
The pricing listed on the page is $0.10 per 1M input tokens and $0.60 per 1M output tokens.
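At those rates, a request's cost is simply input tokens times the input rate plus output tokens times the output rate. A small sketch of the arithmetic (the example token counts are illustrative, not from the page):

```python
# Cost estimate at the listed serverless rates:
# $0.10 per 1M input tokens, $0.60 per 1M output tokens.
INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a chat turn with 2,000 prompt tokens and 500 generated tokens.
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.000500
```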
The listed capability is Text-to-Text for LLM tasks, making it well suited to chat and multi-turn reasoning where you need coherent text generation from text prompts.
The model is available as a Serverless endpoint on GMI Cloud’s pay-as-you-go platform, accessible via the Python SDK, the REST interface, or any OpenAI-compatible client.
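Because the endpoint is OpenAI-compatible, a request is a standard `/chat/completions` JSON body. A minimal sketch of building such a request; the base URL and model name below are placeholders, not values from the page, and should be replaced with the ones shown in your GMI Cloud dashboard:

```python
import json

# Placeholder values -- substitute your actual endpoint and model id.
API_BASE = "https://api.example.com/v1"
MODEL = "example-model"

def build_chat_request(messages: list[dict], temperature: float = 0.7) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
    }

payload = build_chat_request(
    [{"role": "user", "content": "Explain FP8 precision in one sentence."}]
)
# To send: POST {API_BASE}/chat/completions with an
# "Authorization: Bearer <your API key>" header and this JSON body,
# using requests, the openai SDK (with base_url=API_BASE), or curl.
print(json.dumps(payload, indent=2))
```

The same payload works unchanged with any OpenAI-compatible client, which is what makes migrating existing tooling to the serverless endpoint straightforward.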
Yes. GMI offers Dedicated Deployments on GPUs reserved exclusively for you, providing consistent performance, high availability, and flexible auto-scaling for your workloads.
GMI’s serving layer advertises state-of-the-art model serving that dynamically scales resources in real time, aiming to maintain performance as workloads fluctuate while balancing cost and capacity.