Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels
Summary
Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by separating weight magnitude from direction. Its standard implementation in frameworks such as Hugging Face PEFT materializes the dense product BA in order to compute the row-wise norm of W + sBA. That step is memory-intensive: at d_in = 8192 and rank r = 384 in bf16, a single module needs approximately 512 MB of transient memory, which makes high-rank DoRA impractical on common single-GPU setups. The authors introduce two systems contributions: a factored norm that decomposes the squared norm into terms requiring only O(d_out r + r^2) intermediates, which eliminates the dense product, and fused Triton kernels that collapse the four-kernel DoRA composition into a single pass. The fused implementation achieves 1.5-2.0x faster inference and 1.5-1.9x faster gradient computation, with up to 7 GB lower peak VRAM, across 8-32B vision-language models on NVIDIA GPUs.
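To make the memory argument concrete, the sketch below compares the size of one materialized d_out x d_in buffer with the size of the factored intermediates. It is a back-of-the-envelope illustration, not the authors' accounting: d_out = 8192 is an assumption (the summary only gives d_in and r), the helper names are hypothetical, and the choice of two d_out x r buffers plus one r x r matrix is one layout consistent with the stated O(d_out r + r^2) bound. It does not try to reproduce the ~512 MB figure, which depends on how many dense temporaries the reference implementation keeps alive at once.

```python
BYTES_PER_BF16 = 2  # bf16 stores each element in 2 bytes


def dense_product_bytes(d_out: int, d_in: int) -> int:
    """Bytes for one materialized d_out x d_in buffer such as BA."""
    return d_out * d_in * BYTES_PER_BF16


def factored_intermediate_bytes(d_out: int, r: int) -> int:
    """Bytes for two d_out x r buffers plus one r x r Gram matrix (assumed layout)."""
    return (2 * d_out * r + r * r) * BYTES_PER_BF16


# d_in and r come from the summary; d_out = 8192 is an assumption.
d_out, d_in, r = 8192, 8192, 384
MIB = 1024 * 1024

print(dense_product_bytes(d_out, d_in) / MIB)       # 128.0
print(factored_intermediate_bytes(d_out, r) / MIB)  # 12.28125
```

Under these assumptions, each dense buffer is roughly ten times larger than all factored intermediates combined, and the gap widens as d_in grows because the factored sizes do not depend on d_in.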
Key takeaway
For AI engineers deploying high-rank DoRA models, factored norms and fused Triton kernels are the key optimizations. Together they reduce peak VRAM by up to 7 GB and speed up inference by 1.5-2.0x and gradient computation by 1.5-1.9x, making previously impractical high-rank DoRA configurations viable on single-GPU setups. Prioritize integrating both to improve performance and resource efficiency.
Key insights
Factored norms remove the dense BA product from the norm computation, and fused kernels cut memory traffic in the DoRA composition. Together they lower peak memory and speed up high-rank DoRA.
Principles
- Decouple weight magnitude from direction.
- Avoid dense product materialization.
- Fuse kernels to reduce memory traffic.
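The first two principles can be illustrated with a small NumPy sketch of a DoRA forward pass. It assumes the usual DoRA formulation, in which a learned magnitude vector m (one entry per output row) rescales the row-normalized direction W + sBA. The function names and shapes are hypothetical, and the row norm is passed in as an argument so the example stays focused on the composition step. This is a CPU illustration of the algebra, not the authors' Triton kernel.

```python
import numpy as np


def dora_forward_dense(x, W, A, B, m, s):
    """Reference path: build the full adapted weight, then apply it."""
    V = W + s * (B @ A)                     # dense (d_out, d_in) product
    scale = m / np.linalg.norm(V, axis=1)   # magnitude over row-wise norm
    return x @ (scale[:, None] * V).T


def dora_forward_composed(x, W, A, B, m, s, row_norm):
    """Same output without forming B @ A in the forward pass."""
    scale = m / row_norm                    # (d_out,), magnitude decoupled from direction
    base_out = x @ W.T                      # frozen base layer output, (n, d_out)
    lora_out = (x @ A.T) @ B.T              # low-rank path, never builds (d_out, d_in)
    return scale * (base_out + s * lora_out)


rng = np.random.default_rng(0)
n, d_in, d_out, r, s = 4, 48, 64, 8, 0.5
x = rng.standard_normal((n, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
m = rng.uniform(0.5, 1.5, size=d_out)

# For this check only, the norm is computed the dense way.
row_norm = np.linalg.norm(W + s * (B @ A), axis=1)
y_dense = dora_forward_dense(x, W, A, B, m, s)
y_composed = dora_forward_composed(x, W, A, B, m, s, row_norm)
print(np.allclose(y_dense, y_composed))
```

On a GPU, the final scale-and-add line would typically launch several elementwise kernels unless they are fused. That is the kind of work a single-pass kernel targets, although the exact kernel boundaries used by the authors are not specified in this summary.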
Method
Decompose the squared norm into base, cross, and Gram terms, then collapse the four-kernel DoRA composition into a single, numerically stable Triton kernel.
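The decomposition follows from expanding the squared norm of each row i: ||W_i + s(BA)_i||^2 = ||W_i||^2 + 2s <W_i, (BA)_i> + s^2 ||(BA)_i||^2. The cross term can be computed from W A^T and the last term from the Gram matrix A A^T, so BA is never formed. The NumPy sketch below checks this identity against the dense computation. The function names are hypothetical, the mapping of "base, cross, and Gram" onto these three terms is an interpretation of the summary, and the clamp before the square root is an added safeguard rather than a detail confirmed by the source.

```python
import numpy as np


def dense_row_norm(W, A, B, s):
    """Reference: materialize the d_out x d_in update, then take row norms."""
    return np.linalg.norm(W + s * (B @ A), axis=1)


def factored_row_norm(W, A, B, s):
    """Row norms of W + s*B@A using only (d_out, r) and (r, r) intermediates."""
    base = np.einsum("ij,ij->i", W, W)       # ||W_i||^2, shape (d_out,)
    U = W @ A.T                              # (d_out, r)
    cross = np.einsum("ij,ij->i", B, U)      # <W_i, (BA)_i>
    G = A @ A.T                              # Gram matrix, (r, r)
    gram = np.einsum("ij,ij->i", B @ G, B)   # ||(BA)_i||^2
    sq = base + 2.0 * s * cross + s * s * gram
    # Rounding can push a tiny squared norm slightly below zero; clamp first.
    return np.sqrt(np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
d_out, d_in, r, s = 64, 48, 8, 0.5
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

print(np.allclose(dense_row_norm(W, A, B, s), factored_row_norm(W, A, B, s)))
```

Note that W @ A.T still costs on the order of d_out * d_in * r multiply-adds, the same order as forming B @ A, so the saving in this sketch is in transient memory rather than arithmetic. Because W is frozen in LoRA-style training, the base term could also be cached across steps, which is a design option rather than something the summary confirms.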
In practice
- Use factored norms for DoRA.
- Implement fused Triton kernels.
- Apply to vision-language models.
Topics
- DoRA
- Low-Rank Adaptation
- Memory Optimization
- Fused Kernels
- Vision-Language Models
Best for: AI Scientist, Research Scientist, AI Engineer, AI Researcher, Machine Learning Engineer, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.