LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method designed specifically for Large Language Models (LLMs). Instead of updating all the model's weights during training, LoRA freezes the original pre-trained weights and adds a small number of trainable parameters through low-rank matrices inserted into targeted layers (commonly attention and feedforward layers). This approach drastically reduces the number of trainable parameters, enabling faster training, lower hardware requirements, and cheap adaptation of a single base model to many tasks.
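To make the parameter reduction concrete, here is a small back-of-the-envelope calculation. The layer dimensions and rank below are illustrative assumptions (a square 4096-dimensional projection and rank 8), not values from any specific model:

```python
# Hypothetical layer shapes, chosen for illustration only.
d_in, d_out = 4096, 4096   # a single square projection matrix
r = 8                      # LoRA rank

full_params = d_in * d_out        # trainable params if we fine-tuned the layer directly
lora_params = r * (d_in + d_out)  # trainable params for a rank-r adapter (A: r x d_in, B: d_out x r)

print(full_params)                # 16777216
print(lora_params)                # 65536
print(full_params // lora_params) # 256, i.e. 256x fewer trainable parameters
```

Under these assumptions a rank-8 adapter trains roughly 0.4% of the layer's parameters, which is where the hardware and speed savings come from.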
In technical terms, LoRA decomposes the weight update matrix into the product of two much smaller matrices whose shared inner dimension (the rank) is far lower than the original weight dimensions. During the forward pass, this low-rank product is added on top of the frozen pre-trained weights. This maintains the expressiveness of the full model while optimizing for efficiency.
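The decomposition can be sketched in a few lines of NumPy. This is a minimal toy illustration, not a real implementation: the dimensions are tiny, and the scaling factor alpha follows the common alpha/r convention. Note the conventional initialization, with B starting at zero, so the adapted layer initially behaves exactly like the base layer:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 16, 16, 2                 # toy dimensions; real layers are far larger
alpha = 4                           # LoRA scaling hyperparameter

W = rng.normal(size=(d, k))         # frozen pre-trained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01  # trainable low-rank factor, small random init
B = np.zeros((d, r))                # trainable low-rank factor, zero init

x = rng.normal(size=(1, k))

# Forward pass: the rank-r update B @ A is added on top of the frozen W.
delta_W = (alpha / r) * (B @ A)
y = x @ (W + delta_W).T

# Because B starts at zero, the adapted model is initially identical
# to the base model; training then moves only A and B.
assert np.allclose(y, x @ W.T)
```

Only A and B receive gradients; W stays untouched, which is why the adapter can be stored and shipped separately from the base model.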
LoRA has become a standard method for customizing massive models like GPT, BERT, or LLaMA on domain-specific data without the need to retrain or store the full model for each task.
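The per-task storage saving can be sketched as follows. The task names, adapter values, and the effective_weight helper here are hypothetical, invented purely to illustrate swapping small adapters over one shared base weight:

```python
import numpy as np

d, k, r = 8, 8, 2
alpha = 2
base_W = np.ones((d, k))   # shared frozen base weight, stored once

# Hypothetical per-task adapter store: only the small (A, B) pairs are
# saved per task, never a full copy of the model's weights.
adapters = {
    "summarization": (np.full((r, k), 0.1), np.full((d, r), 0.1)),
    "translation":   (np.full((r, k), 0.2), np.full((d, r), 0.2)),
}

def effective_weight(task):
    """Combine the shared base weight with one task's low-rank adapter."""
    A, B = adapters[task]
    return base_W + (alpha / r) * (B @ A)

# Each task reuses the same base weights plus its own tiny adapter.
W_sum = effective_weight("summarization")
W_tr = effective_weight("translation")
```

Serving many tasks then means keeping one base checkpoint plus a set of adapters that are orders of magnitude smaller.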