Select to Think: Unlocking SLM Potential with Local Sufficiency
Summary
Small language models (SLMs) are computationally efficient but typically reason less well than large language models (LLMs). Existing ways to bridge this gap, such as invoking an LLM to generate tokens at points where reasoning diverges, add significant latency and cost. Standard distillation is often limited by SLM capacity: a small model struggles to replicate an LLM's complex generative distribution. The authors identify "local sufficiency": the LLM's preferred token is frequently among the SLM's top-K next-token predictions, even when it is not the SLM's top-1 choice. This motivates SELECT TO THINK (S2T), which narrows the LLM's role from generating tokens to selecting among SLM proposals, reducing the supervision signal to discrete candidate rankings. S2T-LOCAL distills this selection logic into the SLM, so it can re-rank its own candidates without an LLM at inference time. The top-8 candidates of a 1.5B SLM contained a 32B LLM's choice 95% of the time, and S2T-LOCAL improved greedy decoding by 24.1% on average, matching 8-path self-consistency with single-trajectory efficiency.
Key takeaway
For AI engineers deploying SLMs on reasoning tasks, S2T-LOCAL offers a performance boost without the latency and cost of external LLM calls. The reported gain is a 24.1% average improvement over greedy decoding, which matches 8-path self-consistency while generating only a single trajectory. Consider S2T-LOCAL when you need stronger SLM reasoning but cannot afford multi-path sampling or an LLM in the inference loop.
Key insights
SLMs can achieve LLM-like reasoning by re-ranking their own top-K predictions, guided by distilled LLM selection logic.
Principles
- LLM choices often reside within SLM top-K predictions.
- Distilling selection logic is more effective than distilling the LLM's full generative distribution.
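The first principle can be checked empirically by counting how often the LLM's chosen token falls inside the SLM's top-K. The sketch below is a minimal illustration, not the authors' evaluation code: the function names and the toy score lists are hypothetical, and per-position SLM scores and LLM token choices are assumed to be already available as plain Python lists.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_hit_rate(
    slm_scores: Sequence[Sequence[float]],  # one score vector per position (assumed input)
    llm_choices: Sequence[int],             # LLM's preferred token id per position (assumed input)
    k: int,
) -> float:
    """Fraction of positions where the LLM's token is among the SLM's top-k."""
    hits = sum(
        1
        for scores, choice in zip(slm_scores, llm_choices)
        if choice in top_k_indices(scores, k)
    )
    return hits / len(llm_choices)


# Toy data: three positions over a three-token vocabulary.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_choices = [2, 0, 0]
for k in (1, 2, 3):
    print(k, top_k_hit_rate(toy_scores, toy_choices, k))
```

A hit rate that climbs quickly with K, as in the 95% top-8 figure reported in the summary, is what makes selection among a few candidates a plausible substitute for full generation.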
Method
SELECT TO THINK (S2T) reframes LLM supervision from open-ended generation to discrete candidate ranking, distilling this selection logic into the SLM for autonomous re-ranking.
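One way to picture the reframed supervision signal is as a K-way classification target: the label is the position of the LLM's choice within the SLM's candidate list. The sketch below is an illustration under stated assumptions. The summary does not specify the training objective, so the softmax cross-entropy here is a stand-in, and `selection_target` and `selection_loss` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence


def selection_target(candidates: Sequence[int], llm_choice: int) -> Optional[int]:
    """Index of the LLM's choice within the SLM's candidates, or None if absent."""
    if llm_choice in candidates:
        return list(candidates).index(llm_choice)
    return None  # local sufficiency fails at this position; no label is produced


def selection_loss(selector_scores: Sequence[float], target: int) -> float:
    """Softmax cross-entropy over K candidate scores (assumed objective)."""
    m = max(selector_scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in selector_scores))
    return log_z - selector_scores[target]


candidates: List[int] = [5, 9, 2]   # hypothetical SLM top-3 token ids
target = selection_target(candidates, llm_choice=9)
print(target)                                      # position of token 9 in the list
print(selection_loss([0.2, 1.5, -0.3], target))    # lower when the target scores highest
```

Compared with matching a distribution over the full vocabulary, the target here ranges over only K positions, which is the simplification the method description points to.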
In practice
- Use S2T-LOCAL for SLM reasoning tasks.
- Explore top-K candidate re-ranking for efficiency.
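A single decoding step with top-K re-ranking might look like the sketch below. It is a minimal illustration, not the S2T-LOCAL implementation: `selector` is a hypothetical callable standing in for whatever scores the candidates (an external LLM in S2T, the SLM's own distilled selection logic in S2T-LOCAL), and the scores are toy values.

```python
from typing import Callable, List, Sequence


def propose_top_k(slm_scores: Sequence[float], k: int) -> List[int]:
    """Token ids of the SLM's k highest-scoring next tokens, best first."""
    order = sorted(range(len(slm_scores)), key=lambda i: slm_scores[i], reverse=True)
    return order[:k]


def select_next_token(
    slm_scores: Sequence[float],
    selector: Callable[[List[int]], Sequence[float]],  # hypothetical: one preference per candidate
    k: int = 8,
) -> int:
    """Pick the candidate the selector prefers; ties fall back to the SLM's own order."""
    candidates = propose_top_k(slm_scores, k)
    preferences = selector(candidates)
    best = max(range(len(candidates)), key=lambda j: preferences[j])
    return candidates[best]


def prefers_token_2(cands: List[int]) -> List[float]:
    """Toy selector that favors token id 2 whenever it is offered."""
    return [1.0 if c == 2 else 0.0 for c in cands]


scores = [0.1, 0.6, 0.3, 0.0]
print(select_next_token(scores, prefers_token_2, k=2))  # token 2 is in the top-2, so it can win
print(select_next_token(scores, prefers_token_2, k=1))  # only the greedy token is offered
```

Because the selector only ever sees the SLM's own proposals, the step still produces one trajectory, and a selector with no preference reduces to greedy decoding.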
Topics
- Small Language Models
- Large Language Models
- Local Sufficiency
- SELECT TO THINK (S2T)
- Model Distillation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.