Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs
Summary
Reasoning Exposure Prompting (REP) is introduced as a lightweight in-context elicitation method designed to reveal hidden internal reasoning traces from large language models (LLMs). Many deployed LLM systems conceal these valuable traces, which are crucial for tasks like distilling capabilities from stronger teacher models to weaker student models, often exposing only summaries or final answers. REP addresses this by employing shadow-model-generated demonstrations, formatted in auxiliary code-like structures, to make user-visible reasoning traces from a "victim model." Experiments across common reasoning datasets, various victim models, and different student model distillation scenarios demonstrate that REP substantially increases the similarity between the exposed traces and the REP-conditioned internal traces, critically preserving useful reasoning signals.
Key takeaway
For Machine Learning Engineers focused on LLM distillation or understanding model behavior, accessing internal reasoning traces is crucial, even when deployed systems hide them. You should consider implementing Reasoning Exposure Prompting (REP) to surface these hidden traces. By using shadow-model-generated demonstrations wrapped in auxiliary code-like formats, you can substantially increase the similarity between exposed and internal traces, preserving valuable reasoning signals for improving student model learning and capability transfer.
Key insights
Reasoning Exposure Prompting (REP) reveals hidden LLM internal traces, preserving valuable reasoning signals for model distillation.
Principles
- Internal reasoning traces are valuable learning signals.
- Interface-level trace hiding may prevent useful supervision.
- Elicitation methods can expose hidden model behaviors.
Method
Reasoning Exposure Prompting (REP) uses shadow-model-generated demonstrations in code-like formats for in-context elicitation of user-visible reasoning traces.
In practice
- Use REP to expose LLM reasoning traces.
- Apply REP for distilling teacher model capabilities.
- Employ code-like formats for trace elicitation.
Topics
- Large Language Models
- Reasoning Traces
- Model Distillation
- Reasoning Exposure Prompting
- In-context Learning
- Capability Transfer
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.