Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Source: Apple Machine Learning Research · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Speech and Natural Language Processing · Depth: Advanced, quick

Summary

Constructive Circuit Amplification (CCA) is a method for improving mathematical reasoning in Large Language Models (LLMs) by updating only targeted sub-networks. It identifies pivotal tokens in reasoning traces and the model components responsible for the desired task, then updates only those components. Applied to mathematical reasoning, CCA improves accuracy by up to 11.4% across multiple models while modifying as little as 1.59% of model components. The intervention has minimal impact on other abilities, as measured by MMLU, TriviaQA, and TruthfulQA, indicating that specific capabilities can be reliably improved through sparse, selective updates.

Key takeaway

Research scientists fine-tuning LLMs for specialized tasks should consider methods like Constructive Circuit Amplification. By updating only a small, targeted fraction of the model, the approach can deliver significant gains in a specific domain, such as mathematical reasoning, while preserving broader capabilities and reducing the risk of unintended regressions.

Key insights

Targeted sub-network updates can significantly improve specific LLM capabilities with minimal collateral impact.

Principles

Method

Constructive Circuit Amplification identifies pivotal tokens and task-specific model components, then updates only those selected components to enhance performance.
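The "select, then update only the selection" idea can be illustrated with a minimal sketch. This is not the paper's procedure: the component names, the per-component relevance scores, the toy gradients, and the plain gradient step below are all hypothetical stand-ins chosen for illustration. The sketch only shows how a sparse update leaves every unselected component untouched.

```python
from typing import Dict, List


def select_components(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the highest-scoring components, keeping at least one.

    `scores` is a hypothetical per-component relevance score; how such
    scores are computed is outside the scope of this sketch.
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k]


def sparse_update(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    selected: List[str],
    lr: float,
) -> Dict[str, List[float]]:
    """Apply a gradient step to the selected components only."""
    chosen = set(selected)
    updated = {}
    for name, weights in params.items():
        if name in chosen:
            # Assumption: a plain gradient-descent step stands in for
            # whatever update rule the actual method uses.
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            # Unselected components are copied unchanged.
            updated[name] = list(weights)
    return updated


# Toy model: four named components with two weights each (all hypothetical).
params = {
    "head_0": [0.5, -0.2],
    "head_1": [0.1, 0.3],
    "mlp_0": [1.0, 1.0],
    "mlp_1": [-0.4, 0.8],
}
grads = {name: [1.0, 1.0] for name in params}
scores = {"head_0": 0.05, "head_1": 0.90, "mlp_0": 0.10, "mlp_1": 0.02}

selected = select_components(scores, fraction=0.25)
new_params = sparse_update(params, grads, selected, lr=0.1)

changed = [name for name in params if new_params[name] != params[name]]
print("selected:", selected)
print("changed:", changed)
```

In this toy setup only the single top-scoring component changes. In a real model the same masking idea would typically be applied by restricting which parameters receive gradient updates during fine-tuning.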

Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.