LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method designed specifically for Large Language Models (LLMs). Instead of updating all the model’s weights during training, LoRA freezes the original pre-trained weights and adds a small number of trainable parameters through low-rank matrices inserted into targeted layers (commonly attention and feedforward layers). This approach drastically reduces the number of trainable parameters, enabling:
In technical terms, LoRA decomposes the weight update matrix into the product of two smaller matrices — one with a lower rank — and adds them to the existing weights only during the forward pass. This maintains the expressiveness of the full model while optimizing for efficiency.
LoRA has become a standard method for customizing massive models like GPT, BERT, or LLaMA on domain-specific data without the need to retrain or store the full model for each task.
GPU クラウドの即時アクセスで、
人類の AI への挑戦を加速する。
2860 Zanker Rd. Suite 100 San Jose, CA 95134
GMI Cloud
278 Castro St, Mountain View, CA 94041
Taiwan Office
GMI Computing International Ltd., Taiwan Branch
6F, No. 618, Ruiguang Rd., Neihu District, Taipei City 114726, Taiwan
Singapore Office
GMI Computing International Pte. Ltd.
1 Raffles Place, #21-01, One Raffles Place, Singapore 048616

