LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Low-rank Exploration with Adaptive Forking (LEAF) is a retrospective tree-based reinforcement learning (RL) method for post-training speech-aware large language models. It targets a limitation of GRPO-style methods: they broadcast a single terminal-reward advantage to every token of a response, so credit assignment is coarse. LEAF recovers structure that already exists within a rollout batch by recognizing shared prefixes among speech-conditioned completions. The method samples complete responses, identifies high-surprisal boundaries, groups responses by their shared prefixes, and assigns span-level advantages from descendant rewards. Empirically, LEAF improves on GRPO across speech question answering and speech translation benchmarks under the same rollout and low-rank adaptation budget. Notably, smaller LEAF-trained models surpass existing top-performing full-parameter baselines.
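To make the coarse credit assignment concrete, here is a minimal sketch of a GRPO-style advantage computation. It assumes the common formulation in which each response's terminal reward is normalized by the group mean and standard deviation; the function name, the use of the population standard deviation, and the `eps` constant are illustrative choices, not details taken from the article.

```python
from statistics import mean, pstdev

def grpo_token_advantages(rewards, lengths, eps=1e-8):
    """Return one advantage per token for each response in a rollout group.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    copied unchanged to every token of the response.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    per_response = [(r - mu) / (sigma + eps) for r in rewards]
    # Broadcasting step: every token in a response gets the same scalar.
    return [[a] * n for a, n in zip(per_response, lengths)]

# Four sampled responses with terminal rewards and token counts.
advs = grpo_token_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 1])
```

Every token of the first response receives the same positive value, including tokens it shares with a failed response. That is the behavior LEAF is described as refining.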

Key takeaway

Machine learning engineers post-training speech-aware large language models should consider LEAF as a way to overcome coarse credit assignment. The method improves performance on speech question answering and speech translation, and smaller LEAF-trained models outperform current top-tier full-parameter systems. Evaluate LEAF's tree-based RL approach against GRPO at the same rollout and low-rank adaptation budget.

Key insights

LEAF is a retrospective tree-based RL method for speech-aware LLM post-training that refines credit assignment.

Method

LEAF samples complete responses, identifies high-surprisal boundaries, groups responses by shared prefixes, and assigns span-level advantages using descendant rewards.
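The four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the article does not give the boundary rule or the advantage formula, so the sketch assumes a fixed surprisal threshold for boundaries and defines a span's advantage as the mean reward of responses sharing the prefix at the span's end minus the mean reward of those sharing the prefix at its start. All function names and the `threshold` parameter are hypothetical.

```python
from statistics import mean

def span_boundaries(surprisal, threshold):
    """Cut a response where a token's surprisal exceeds the threshold (assumed rule)."""
    cuts = [i for i, s in enumerate(surprisal) if i > 0 and s > threshold]
    return [0] + cuts + [len(surprisal)]

def prefix_value(prefix, responses, rewards):
    """Mean reward of all sampled responses that start with this prefix (its descendants)."""
    n = len(prefix)
    desc = [r for toks, r in zip(responses, rewards) if tuple(toks[:n]) == prefix]
    return mean(desc)

def span_advantages(responses, surprisals, rewards, threshold=2.0):
    """Return, per response, a list of (start, end, advantage) spans."""
    out = []
    for toks, surp in zip(responses, surprisals):
        bounds = span_boundaries(surp, threshold)
        spans = []
        for start, end in zip(bounds, bounds[1:]):
            parent = prefix_value(tuple(toks[:start]), responses, rewards)
            child = prefix_value(tuple(toks[:end]), responses, rewards)
            spans.append((start, end, child - parent))
        out.append(spans)
    return out

# Toy rollout batch: complete responses, per-token surprisal, terminal rewards.
responses = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
    ["a", "cat", "sat"],
]
surprisals = [
    [0.5, 0.4, 3.0],
    [0.5, 0.4, 3.1],
    [0.5, 2.5, 0.3],
    [2.8, 0.4, 0.2],
]
rewards = [1.0, 0.0, 0.0, 1.0]

result = span_advantages(responses, surprisals, rewards)
```

In this toy batch the shared prefix "the cat" gets zero advantage in the first response, because one response continuing from it succeeds and the other fails, while the diverging final token carries the full signal. No extra rollouts are generated; the tree is read off the completed samples after the fact, which is one way to interpret "retrospective".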

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.