LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training
Summary
Low-rank Exploration with Adaptive Forking (LEAF) is a novel retrospective tree-based Reinforcement Learning (RL) method designed for speech-aware large language model post-training. It addresses the limitation of GRPO-style methods, which suffer from coarse credit assignment by broadcasting a single terminal-reward advantage to every token. LEAF recovers useful structure within rollout batches by recognizing shared prefixes in speech-conditioned completions. The method samples complete responses, identifies high-surprisal boundaries, groups responses by these shared prefixes, and assigns span-level advantages using descendant rewards. Empirically, LEAF demonstrates improved performance over GRPO across speech question answering and speech translation benchmarks, utilizing the same rollout and low-rank adaptation budget. Notably, smaller LEAF-trained models surpass existing top-performing, full-parameter baselines.
Key takeaway
Machine Learning Engineers optimizing speech-aware large language models should consider LEAF for post-training when GRPO-style credit assignment is too coarse. The method improves performance on speech question answering and speech translation, and smaller LEAF-trained models surpass current top-performing full-parameter baselines. Evaluate LEAF against GRPO at the same rollout and low-rank adaptation budget, which is the setting in which the gains are reported.
Key insights
LEAF is a retrospective tree-based RL method for speech-aware LLM post-training that refines credit assignment.
Principles
- Coarse credit assignment hinders GRPO-style methods.
- Shared prefixes in speech-conditioned completions offer useful structure.
- Span-level advantage assignment gives finer credit than broadcasting one terminal-reward advantage to every token.
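To make the coarse credit assignment concrete, here is a minimal sketch of a GRPO-style advantage in one common formulation: each response's terminal reward is normalized against the group mean and standard deviation, and the resulting scalar is copied to every token of that response. The function name `grpo_token_advantages`, the `eps` constant, and the toy rewards are illustrative assumptions, not taken from the original article.

```python
from statistics import mean, pstdev

def grpo_token_advantages(rewards, lengths, eps=1e-6):
    """Broadcast one group-normalized advantage per response to all its tokens.

    rewards: terminal reward of each sampled response in the group.
    lengths: number of tokens in each response.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    advantages = []
    for reward, n_tokens in zip(rewards, lengths):
        a = (reward - mu) / (sigma + eps)
        # Every token receives the same value, regardless of where the
        # response actually went right or wrong.
        advantages.append([a] * n_tokens)
    return advantages

# Toy group (assumed values): one correct and one incorrect response.
rewards = [1.0, 0.0]
lengths = [3, 2]
token_advantages = grpo_token_advantages(rewards, lengths)
```

Within each response the advantage is constant, so a token in a prefix shared by a good and a bad response is pushed up in one and down in the other. That is the structure a span-level scheme tries to recover.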
Method
LEAF samples complete responses, identifies high-surprisal boundaries, groups responses by shared prefixes, and assigns span-level advantages using descendant rewards.
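The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the summary does not give the boundary rule or the advantage formula, so the fixed surprisal `threshold`, the helper names, and the choice of "mean descendant reward at span end minus mean descendant reward at span start" as the span advantage are all hypothetical.

```python
from statistics import mean

def span_boundaries(surprisals, threshold):
    """Span start indices: position 0 plus every high-surprisal token (assumed rule)."""
    starts = [0]
    for t, s in enumerate(surprisals):
        if t > 0 and s > threshold:
            starts.append(t)
    return starts

def prefix_value(prefix, responses, rewards):
    """Mean terminal reward of all sampled responses that begin with `prefix`."""
    n = len(prefix)
    descendants = [r for toks, r in zip(responses, rewards) if toks[:n] == prefix]
    return mean(descendants)

def span_advantages(responses, surprisals, rewards, threshold=2.0):
    """Assign one advantage per span, built retrospectively from complete rollouts."""
    out = []
    for toks, surp in zip(responses, surprisals):
        starts = span_boundaries(surp, threshold)
        ends = starts[1:] + [len(toks)]
        adv = [0.0] * len(toks)
        for s, e in zip(starts, ends):
            # Assumed span advantage: change in mean descendant reward
            # caused by extending the shared prefix from s to e.
            a = (prefix_value(toks[:e], responses, rewards)
                 - prefix_value(toks[:s], responses, rewards))
            adv[s:e] = [a] * (e - s)
        out.append(adv)
    return out

# Toy batch (assumed values): two responses share the prefix "the cat".
responses = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "ran"]]
surprisals = [[3.0, 0.5, 2.5], [3.0, 0.5, 2.6], [3.5, 0.4, 0.3]]
rewards = [1.0, 0.0, 0.0]
leaf_advantages = span_advantages(responses, surprisals, rewards)
```

In this toy batch the shared prefix "the cat" receives the same small positive advantage in both responses that contain it, while the diverging final tokens receive opposite-signed advantages. No extra rollouts are sampled; the tree is recovered from responses that already exist. Under this particular assumed formula, the span advantages of a response sum to its reward minus the batch mean.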
In practice
- Improve speech question answering performance.
- Improve results on speech translation benchmarks.
- Surpass top-performing full-parameter baselines with smaller LEAF-trained models.
Topics
- Large Language Models
- Reinforcement Learning
- Speech Processing
- LLM Post-Training
- Credit Assignment
- Speech Question Answering
- Speech Translation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.