Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

2026-05-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Reasoning Exposure Prompting (REP) is a lightweight in-context elicitation method designed to reveal the hidden internal reasoning traces of large language models (LLMs). Many deployed LLM systems conceal these traces and expose only summaries or final answers, even though the traces are valuable for tasks such as distilling capabilities from stronger teacher models into weaker student models. REP uses demonstrations generated by a shadow model and formatted in auxiliary code-like structures to elicit user-visible reasoning traces from a "victim model." Experiments across common reasoning datasets, several victim models, and different student-model distillation scenarios show that REP substantially increases the similarity between the exposed traces and the REP-conditioned internal traces while preserving useful reasoning signals.

Key takeaway

For machine learning engineers working on LLM distillation or model-behavior analysis, internal reasoning traces matter even when deployed systems hide them. Consider Reasoning Exposure Prompting (REP) as a way to surface these traces: in the reported experiments, shadow-model-generated demonstrations wrapped in auxiliary code-like formats substantially increase the similarity between exposed and internal traces, preserving reasoning signals that support student-model learning and capability transfer.

Key insights

Reasoning Exposure Prompting (REP) reveals hidden LLM internal traces, preserving valuable reasoning signals for model distillation.

Principles

Internal reasoning traces are valuable learning signals.
Interface-level trace hiding may prevent useful supervision.
Elicitation methods can expose hidden model behaviors.

Method

Reasoning Exposure Prompting (REP) uses shadow-model-generated demonstrations in code-like formats for in-context elicitation of user-visible reasoning traces.
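
The prompt assembly described above might look roughly like the following. This is a minimal sketch, not the authors' implementation: the Demo structure, the field names, the choice of JSON as the "code-like" format, and the open-ended final record are all assumptions made for illustration, since the summary does not specify them.

```python
import json
from dataclasses import dataclass


@dataclass
class Demo:
    """One shadow-model demonstration (hypothetical structure)."""
    question: str
    reasoning: str
    answer: str


def format_demo(demo: Demo) -> str:
    # Wrap the demonstration in a code-like record.
    # JSON is an assumed format; the source only says "code-like".
    record = {
        "question": demo.question,
        "reasoning_trace": demo.reasoning,
        "answer": demo.answer,
    }
    return json.dumps(record, indent=2)


def build_rep_prompt(demos: list[Demo], target_question: str) -> str:
    # Completed records come first, as in ordinary few-shot prompting.
    blocks = [format_demo(d) for d in demos]
    # The last record is left open at the reasoning field so that the
    # continuation would be a reasoning trace (an assumed design choice).
    open_record = (
        '{\n  "question": ' + json.dumps(target_question)
        + ',\n  "reasoning_trace": "'
    )
    return "\n\n".join(blocks + [open_record])
```

Here the demonstrations would come from a separate shadow model whose reasoning is visible; how many demonstrations to use and how they are selected is left open by the summary.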

In practice

Use REP to expose LLM reasoning traces.
Apply REP for distilling teacher model capabilities.
Employ code-like formats for trace elicitation.
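
To check whether exposed traces preserve the reasoning signal, one option is to score each exposed trace against a reference internal trace, which is only possible in an experimental setting where both are available. The sketch below uses a word-level sequence-matching ratio from Python's standard library. That metric is an assumption chosen for simplicity; the summary does not say which similarity measure the original work uses, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def trace_similarity(exposed: str, internal: str) -> float:
    # Compare traces as lowercase word sequences.
    # SequenceMatcher.ratio() returns a value in [0, 1].
    a = exposed.lower().split()
    b = internal.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(pairs: list[tuple[str, str]]) -> float:
    # Average similarity over (exposed, internal) trace pairs.
    if not pairs:
        return 0.0
    return sum(trace_similarity(e, i) for e, i in pairs) / len(pairs)
```

Comparing the mean score with and without the elicitation prompt gives a rough before/after signal; an embedding-based or task-specific metric may be more appropriate for real evaluations.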

Topics

Large Language Models
Reasoning Traces
Model Distillation
Reasoning Exposure Prompting
In-context Learning
Capability Transfer

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.