LoRA for LLMs
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method designed specifically for Large Language Models (LLMs). Instead of updating all the model's weights during training, LoRA freezes the original pre-trained weights and adds a small number of trainable parameters through low-rank matrices inserted into targeted layers (commonly attention and feedforward layers).
This approach enables faster training, reduced hardware requirements, and easy multi-task adaptation: many lightweight, task-specific adapters can be trained and swapped on top of a single shared base model.
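The savings come from simple arithmetic: a full fine-tune of one d×k weight matrix trains d·k parameters, while LoRA trains only the two low-rank factors, r·(d + k). A quick sketch (the dimensions d = k = 4096 and rank r = 8 are illustrative assumptions, not values from the text):

```python
# Parameter count for one weight matrix: full fine-tuning vs. LoRA.
d, k = 4096, 4096   # hypothetical hidden dimensions of one layer
r = 8               # hypothetical LoRA rank, r << min(d, k)

full_params = d * k            # every entry of W is trainable
lora_params = r * (d + k)      # only the factors B (d x r) and A (r x k)

print(full_params)                      # 16777216
print(lora_params)                      # 65536
print(lora_params / full_params)        # 0.00390625, i.e. ~0.4%
```

At this rank, the trainable-parameter count per layer drops by more than two orders of magnitude, which is what makes fine-tuning feasible on modest hardware.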
LoRA decomposes the weight update ΔW into the product of two much smaller matrices, B and A, whose product has rank at most r. During the forward pass, this low-rank update is added on top of the frozen base weights, preserving the model's expressiveness while training only a small fraction of the parameters.
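A minimal NumPy sketch of this forward pass; the dimensions, the zero initialization of B (a common convention so the adapter starts as a no-op), and the scaling factor alpha are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4            # illustrative dimensions; rank r << min(d, k)
x = rng.standard_normal(k)     # input activation

W = rng.standard_normal((d, k))          # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable, zero-initialized
alpha = 8                                # scaling hyperparameter

# Forward pass: base output plus the low-rank update (B @ A) @ x,
# where delta_W = B @ A has rank at most r.
y = W @ x + (alpha / r) * (B @ A @ x)

# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(y, W @ x)
```

During training, gradients flow only into A and B; W never changes, so the adapter can be merged into W afterward (W + (alpha/r)·B·A) or kept separate and swapped per task.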
LoRA has become a standard method for adapting massive pre-trained models such as GPT, BERT, or LLaMA to domain-specific data.