Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study introduces geometry-preserving orthonormal initialization for Low-Rank Adaptation (LoRA) in Reinforcement Learning with Verifiable Rewards (RLVR). It targets the underperformance and instability that existing LoRA variants such as PiSSA and MiLoRA show in RLVR: these variants excel in supervised fine-tuning (SFT), but their efficacy in RLVR had been unclear. Through theoretical analysis, the study shows that orthonormal initialization minimizes the performance gap between LoRA and full fine-tuning in RLVR. This result guided the design of two new LoRA variants, RLPO and RLMO, both built on geometry-preserving orthonormal initialization. Experiments on mathematical reasoning benchmarks confirm that the proposed initialization stabilizes RLVR training and consistently outperforms standard LoRA, in contrast to PiSSA and MiLoRA. The same analysis provides a unified explanation for why PiSSA and MiLoRA underperform in RLVR. Code and checkpoints are publicly available.
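For readers less familiar with the SFT-oriented variants named above, the following NumPy sketch shows the SVD-slice initialization usually associated with PiSSA (largest singular components) and MiLoRA (smallest singular components). It is an illustration under assumptions, not the paper's code: the helper name `svd_slice_init`, the `B @ A` factor naming, the even square-root split of the singular values, and the toy matrix sizes are all choices made here. Its only purpose is to show that the resulting factors are scaled by singular values and are therefore not orthonormal; it does not reproduce the paper's explanation of their RLVR behavior.

```python
import numpy as np

def svd_slice_init(W, rank, which="principal"):
    """Hypothetical helper: build LoRA factors from a slice of W's SVD.

    which="principal" takes the top-`rank` singular components (PiSSA-style),
    which="minor" takes the bottom-`rank` components (MiLoRA-style).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    if which == "principal":
        idx = slice(0, rank)
    else:
        idx = slice(len(S) - rank, len(S))
    root = np.sqrt(S[idx])
    B = U[:, idx] * root          # shape (d_out, rank)
    A = root[:, None] * Vt[idx]   # shape (rank, d_in)
    return A, B, S[idx]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # toy stand-in for a pretrained weight

A_p, B_p, S_p = svd_slice_init(W, rank=4, which="principal")
A_m, B_m, S_m = svd_slice_init(W, rank=4, which="minor")

# The Gram matrix of each factor is diag(singular values), not the identity,
# so the rows of A are orthogonal but not orthonormal.
gram_p = A_p @ A_p.T
gram_m = A_m @ A_m.T
```

In this toy setup `gram_p` and `gram_m` equal `np.diag(S_p)` and `np.diag(S_m)` up to floating-point error, which makes the difference from an orthonormal factor (Gram matrix equal to the identity) easy to inspect.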

Key takeaway

Machine learning engineers who fine-tune large language models with Low-Rank Adaptation (LoRA) under Reinforcement Learning with Verifiable Rewards (RLVR) should re-evaluate their initialization strategy. SFT-oriented variants such as PiSSA and MiLoRA can train unstably and underperform in RLVR, and standard LoRA is outperformed by the proposed method in the reported experiments. Consider geometry-preserving orthonormal initialization, as in the proposed RLPO and RLMO variants, which yielded more stable training and better results on mathematical reasoning benchmarks.

Key insights

Orthonormal initialization stabilizes Low-Rank Adaptation (LoRA) training in Reinforcement Learning with Verifiable Rewards (RLVR) and consistently outperforms standard LoRA on mathematical reasoning benchmarks.

Principles

Method

A theoretical analysis guides the development of geometry-preserving orthonormal initialization, leading to new LoRA variants, RLPO and RLMO, specifically for RLVR.
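As a rough illustration of what "orthonormal initialization" means for a LoRA factor, the sketch below draws a random Gaussian matrix and orthonormalizes it with a QR decomposition. This is a generic construction under assumptions, not the RLPO or RLMO procedure, whose details are not given in this summary. The function name `orthonormal_lora_init`, the zero-initialized `B` factor (borrowed from standard LoRA so the adapter starts as a no-op), and the dimensions are all hypothetical.

```python
import numpy as np

def orthonormal_lora_init(d_out, d_in, rank, seed=0):
    """Hypothetical helper: LoRA factors with an orthonormal A.

    Assumes rank <= d_in. A has orthonormal rows (A @ A.T == I_rank);
    B starts at zero, so the initial update B @ A is exactly zero.
    """
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((d_in, rank))
    q, _ = np.linalg.qr(gaussian)   # q: (d_in, rank), orthonormal columns
    A = q.T                         # (rank, d_in), orthonormal rows
    B = np.zeros((d_out, rank))     # assumption: zero init as in standard LoRA
    return A, B

A, B = orthonormal_lora_init(d_out=64, d_in=32, rank=4)

# Deviation of A's Gram matrix from the identity, and norm of the initial update.
gram_error = np.abs(A @ A.T - np.eye(4)).max()
delta_norm = np.linalg.norm(B @ A)
```

Here `gram_error` is at the level of floating-point round-off and `delta_norm` is zero, so the adapted layer initially matches the pretrained one while the trainable factor `A` acts as an isometry on its row space.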

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.