Related terms

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method designed specifically for Large Language Models (LLMs). Instead of updating all the model’s weights during training, LoRA freezes the original pre-trained weights and adds a small number of trainable parameters through low-rank matrices inserted into targeted layers (commonly attention and feedforward layers). This approach drastically reduces the number of trainable parameters, enabling:

Faster training times
Reduced hardware requirements
More adaptable multi-task models

In technical terms, LoRA decomposes the weight update matrix into the product of two smaller matrices — one with a lower rank — and adds them to the existing weights only during the forward pass. This maintains the expressiveness of the full model while optimizing for efficiency.

LoRA has become a standard method for customizing massive models like GPT, BERT, or LLaMA on domain-specific data without the need to retrain or store the full model for each task.

LoRA LLM

Sign up for our newsletter

Subscribe to our newsletter