Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Bayesian-LoRA is a parameter-efficient fine-tuning method for Large Language Models (LLMs) that reformulates the deterministic LoRA update as a probabilistic low-rank representation, inspired by Sparse Gaussian Processes. It targets a known weakness of fine-tuned LLMs: they tend to be miscalibrated and overconfident, especially when fine-tuned on small datasets. The method introduces a low-dimensional inducing matrix $U$ and enriches its variational posterior with a normalizing flow, so that calibration is optimized end to end during training. It reports reductions of up to 84% in Expected Calibration Error (ECE) and 76% in negative log-likelihood (NLL) while keeping accuracy competitive on models of up to 30B parameters. The method adds approximately 0.42M parameters and incurs about 1.2x the training cost of standard LoRA, which keeps it practical for large-scale LLM adaptation.
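
The two calibration metrics quoted above can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's evaluation code: it uses equal-width confidence bins (10 by default), and the function names and toy predictions are invented for the example.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted mean gap between average confidence and accuracy."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

def negative_log_likelihood(true_class_probs, eps=1e-12):
    """Mean NLL, given the probability each prediction assigned to the true class."""
    total = sum(math.log(max(p, eps)) for p in true_class_probs)
    return -total / len(true_class_probs)

# Made-up toy data: both models are right 3 times out of 4,
# but one of them reports 99% confidence every time.
hits = [1, 1, 1, 0]
overconfident = [0.99, 0.99, 0.99, 0.99]
calibrated = [0.75, 0.75, 0.75, 0.75]

print(expected_calibration_error(overconfident, hits))  # gap of about 0.24
print(expected_calibration_error(calibrated, hits))     # no gap
```

Both toy models have the same accuracy, so accuracy alone cannot tell them apart; ECE penalizes the one whose stated confidence does not match how often it is right.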

Key takeaway

For AI engineers and research scientists deploying LLMs in high-stakes environments, Bayesian-LoRA is a way to reduce model overconfidence and improve calibration. It gives better uncertainty quantification and out-of-distribution robustness than standard LoRA or post-hoc calibration techniques, at a modest increase in training cost (about 1.2x) and parameter count (approximately 0.42M). Consider it when fine-tuned models must remain trustworthy under limited data or distribution shift.

Key insights

Bayesian-LoRA improves LLM calibration and uncertainty by integrating probabilistic low-rank adaptation with normalizing flows.
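
For readers less familiar with flow-based variational inference, the generic textbook form is sketched here. It is not taken from the paper, and the paper's closed-form ELBO may differ; the symbols $f_k$, $K$, $q_0$, $p(U)$, and $\mathcal{D}$ are notation assumed for this illustration. A simple base sample $U_0$ is pushed through $K$ invertible maps, and the density of the result follows from the change-of-variables formula:

$$
U_K = f_K \circ \cdots \circ f_1(U_0), \qquad U_0 \sim q_0(U_0)
$$

$$
\log q_K(U_K) = \log q_0(U_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial U_{k-1}} \right|
$$

The evidence lower bound then trades data fit against distance from the prior:

$$
\mathcal{L} = \mathbb{E}_{q_K(U)}\left[ \log p(\mathcal{D} \mid U) \right] - \mathrm{KL}\left( q_K(U) \,\|\, p(U) \right)
$$

The flow makes $q_K$ more expressive than the Gaussian $q_0$ while keeping $\log q_K$ computable, which is what lets the posterior over $U$ be trained end to end.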

Method

Bayesian-LoRA replaces deterministic LoRA updates with a stochastic formulation using a low-dimensional inducing matrix $U$ and a flow-augmented variational posterior, optimizing a closed-form ELBO for calibration-aware training.
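
A minimal sketch of what a stochastic low-rank update could look like is given below. It is a toy illustration, not the paper's implementation: the parameterization $\Delta W = B U A$, the diagonal Gaussian base posterior, the single planar flow step, and every function name are assumptions made for the example, and plain Python lists stand in for tensors.

```python
import math
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def sample_inducing(mean, log_std, rng):
    """Reparameterized draw u0 = mean + std * eps from a diagonal Gaussian,
    returned together with its log-density log q0(u0)."""
    u0, log_q0 = [], 0.0
    for m, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)
        u0.append(m + math.exp(ls) * eps)
        log_q0 += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
    return u0, log_q0

def planar_flow(z, u, w, b):
    """One planar flow step f(z) = z + u * tanh(w.z + b), with log|det J|."""
    act = math.tanh(sum(wi * zi for wi, zi in zip(w, z)) + b)
    out = [zi + ui * act for zi, ui in zip(z, u)]
    det = 1.0 + (1.0 - act * act) * sum(ui * wi for ui, wi in zip(u, w))
    return out, math.log(abs(det))

def sample_delta_w(B, A, mean, log_std, flow, rng):
    """Draw one update B @ U @ A (assumed form) and the log-density of the flowed U."""
    r = len(A)
    u0, log_q = sample_inducing(mean, log_std, rng)
    u1, log_det = planar_flow(u0, *flow)
    U = [u1[i * r:(i + 1) * r] for i in range(r)]  # reshape flat vector to r x r
    return matmul(matmul(B, U), A), log_q - log_det

# Hypothetical toy sizes: a 4 x 3 weight update with rank 2.
rng = random.Random(0)
d_out, d_in, r = 4, 3, 2
B = [[rng.gauss(0.0, 0.1) for _ in range(r)] for _ in range(d_out)]
A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(r)]
mean, log_std = [0.0] * (r * r), [-1.0] * (r * r)
flow = ([0.1] * (r * r), [0.1] * (r * r), 0.0)  # u, w, b; u.w > -1 keeps the map invertible
delta_w, log_q = sample_delta_w(B, A, mean, log_std, flow, rng)
print(len(delta_w), len(delta_w[0]))  # 4 3
```

Only the $r \times r$ matrix $U$ is random here, so the stochastic part stays small regardless of the size of the weight matrix. In a real training loop the returned log-density would feed the KL term of the ELBO, and each forward pass would draw a fresh sample.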

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.