
LoRA LLM


Related terms

  • Deep Learning
  • Large Language Model (LLM)

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method designed specifically for Large Language Models (LLMs). Instead of updating all the model’s weights during training, LoRA freezes the original pre-trained weights and adds a small number of trainable parameters through low-rank matrices inserted into targeted layers (commonly attention and feedforward layers). This approach drastically reduces the number of trainable parameters, enabling:

  • Faster training times
  • Reduced hardware requirements
  • More adaptable multi-task models

In technical terms, LoRA expresses the weight update ΔW as the product of two much smaller matrices: for a pre-trained weight matrix W of size d × k, it learns B (d × r) and A (r × k), where the rank r is far smaller than d or k, and adds the product BA to the frozen weights during the forward pass (effectively computing W + BA). This preserves the expressiveness of the full model while training only a small fraction of its parameters.
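
To make this concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The class name, rank, and scaling factor below are illustrative choices rather than part of any specific library: the pre-trained weight is frozen, and only the two low-rank factors A and B are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze the pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        d_out, d_in = base.weight.shape
        # Low-rank factors: delta_W = B @ A, with rank r << min(d_in, d_out)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))          # zero init => no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update, applied in the forward pass
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Because the base layer stays frozen, the trainable parameter count drops from d_out × d_in to r × (d_in + d_out), which is where the training-time and memory savings come from.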

LoRA has become a standard method for customizing massive models like GPT, BERT, or LLaMA on domain-specific data without the need to retrain or store the full model for each task.
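
In practice, LoRA adapters are usually attached through a parameter-efficient fine-tuning library rather than written by hand. The sketch below assumes Hugging Face's transformers and peft packages; the model ID, target module names, and hyperparameters are illustrative and would be adjusted per model and task.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load a pre-trained causal LM (model ID is illustrative).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach low-rank adapters to the attention projections only.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names used in LLaMA-style models
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```

Only the small adapter weights need to be saved after training, so a single base model can serve many task-specific adapters.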
