Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A study on Qwen3-14B challenges the common interpretation of linear probes in large language models, finding that high probe accuracy for reasoning types reflects task format rather than distinct internal computational structures. Researchers probed the model's hidden states on the LogiQA 2.0 (deductive), ARC-Challenge (inductive), and αNLI (abductive) benchmarks. Linear probes initially reached 100% cross-validated accuracy at layer 32, and the representations showed distinct manifold geometry, with intrinsic dimensionalities of 20.6, 28.5, and 33.6. However, a four-stage confound analysis, which included residualizing format features such as source identity, option count, and response length, reduced probe accuracy to chance level. Trace-anchor similarity also indicated only 42.5% agreement with the intended reasoning modes, suggesting a uniform reasoning strategy across tasks. Causal steering experiments with random controls (n=20) yielded p = 0.286, providing no evidence of a functional link between the observed geometry and reasoning mode selection.
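The summary does not say how the steering p-value was computed. One common convention for comparing an observed effect against n random controls is the one-sided empirical p-value (k + 1) / (n + 1), where k is the number of controls whose effect is at least as large as the observed one. The sketch below applies that assumed convention to made-up effect sizes. With n = 20 and k = 5 it gives 6/21 ≈ 0.286, which matches the reported figure, though that match is suggestive rather than confirmation of the authors' method.

```python
def empirical_p_value(observed_effect, control_effects):
    """One-sided empirical p-value with the +1 correction.

    Assumed convention, not confirmed by the source:
    p = (k + 1) / (n + 1), where k counts controls whose effect is
    at least as large as the observed effect.
    """
    k = sum(1 for effect in control_effects if effect >= observed_effect)
    return (k + 1) / (len(control_effects) + 1)


# Hypothetical effect sizes, invented purely for illustration.
observed = 0.10
controls = [0.12, 0.15, 0.11, 0.13, 0.10] + [0.01] * 15  # n = 20, k = 5

p = empirical_p_value(observed, controls)
print(round(p, 3))
```

A p-value of this size means that several random directions moved the model at least as much as the probe-derived direction did, which is why the result gives no support for a functional role.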

Key takeaway

If you are an AI scientist or machine learning engineer interpreting LLM internal states, re-evaluate linear probing results critically. High probe accuracy for reasoning modes may reflect task format differences rather than genuine computational distinctions. To avoid misreading model capabilities, build format deconfounding (such as residual analysis) and random-direction controls into your interpretability pipelines, so that what you detect is functional structure rather than a superficial artifact. This supports more accurate model development and evaluation.
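A random-direction control can be sketched as follows: draw directions uniformly at random and rescale each to the norm of the steering vector, so that any difference in effect cannot be attributed to perturbation size. This is a minimal illustration, not the study's code. The function name, the 64-dimensional placeholder vector, and the seeds are assumptions; only the count of 20 controls comes from the summary.

```python
import numpy as np


def random_control_directions(steering_vec, n_controls=20, seed=0):
    """Sample random directions with the same L2 norm as steering_vec."""
    rng = np.random.default_rng(seed)
    raw = rng.normal(size=(n_controls, steering_vec.shape[0]))
    unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    return unit * np.linalg.norm(steering_vec)


# Placeholder steering vector (hypothetical); a real one would come from
# the probe weights or class-mean differences at the chosen layer.
steering_vec = np.random.default_rng(1).normal(size=64)
controls = random_control_directions(steering_vec)
print(controls.shape)
```

Each control would then be added to the hidden state in the same way as the real steering vector, and the resulting behavioral effects compared.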

Key insights

High linear-probe accuracy in LLMs can reflect task format confounds rather than distinct reasoning mode representations, which challenges common interpretability claims.

Method

The pipeline runs in sequence: multi-source dataset construction, hidden-state extraction, layer-wise linear probing, format confound analysis (the four-stage analysis that includes residualization), and causal steering with random-direction controls.
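The residualization step can be illustrated on synthetic data. The sketch below is not the authors' implementation. It assumes a least-squares linear probe, a one-hot source-identity feature as the only format feature, and randomly generated "hidden states" whose class structure comes entirely from that feature. All names and sizes are hypothetical. The format regression is fitted on the training fold only, to avoid leaking test information into the residuals.

```python
import numpy as np


def add_intercept(M):
    return np.column_stack([np.ones(len(M)), M])


def fit_linear(X, Y):
    """Least-squares weights (with intercept) mapping X to Y."""
    W, *_ = np.linalg.lstsq(add_intercept(X), Y, rcond=None)
    return W


def apply_linear(X, W):
    return add_intercept(X) @ W


def cv_probe_accuracy(H, y, F=None, n_folds=5, seed=0):
    """Cross-validated accuracy of a least-squares linear probe on H.

    If F is given, the part of H that is linearly predictable from the
    format features F is subtracted before probing.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    n_classes = int(y.max()) + 1
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        H_tr, H_te = H[train], H[test]
        if F is not None:
            B = fit_linear(F[train], H_tr)          # format -> hidden state
            H_tr = H_tr - apply_linear(F[train], B)
            H_te = H_te - apply_linear(F[test], B)
        W = fit_linear(H_tr, np.eye(n_classes)[y[train]])  # probe on one-hot labels
        pred = apply_linear(H_te, W).argmax(axis=1)
        correct += int((pred == y[test]).sum())
    return correct / len(y)


# Synthetic stand-in data (hypothetical): the label coincides with the source.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 100)            # three "reasoning modes"
F = np.eye(3)[y]                            # one-hot source identity
H = F @ rng.normal(scale=3.0, size=(3, 16)) + rng.normal(size=(len(y), 16))

acc_raw = cv_probe_accuracy(H, y)
acc_resid = cv_probe_accuracy(H, y, F=F)
print(acc_raw, acc_resid)
```

In this toy setup the raw probe separates the classes almost perfectly, while the residualized probe should fall to roughly chance for three classes. It is the same qualitative pattern the study reports, produced here by construction rather than by a real model.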

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.