H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Researchers developed H-probes, a set of linear probes that extract hierarchical structure, specifically tree depth and pairwise tree distance, from the latent representations of large language models (LLMs). In experiments on synthetic binary tree traversal tasks, H-probes robustly identified low-dimensional subspaces that encode this structure. These subspaces are causally important for high task performance, generalize both in-domain and out-of-domain (to deeper trees), and remain stable across training sets and model scales (1.5B, 7B, and 14B Qwen reasoning models). Analogous, though weaker, hierarchical structure also appeared in real-world settings such as mathematical reasoning traces (GSM8K) and HiBench tasks. The findings suggest that LLMs represent hierarchy not only at the syntactic and conceptual levels but also at deeper levels of abstraction, including the reasoning process itself.

Key takeaway

For research scientists working on LLM interpretability, this work shows that hierarchical structure is geometrically encoded in low-dimensional latent subspaces and that these subspaces are causally linked to task performance. Consider using a probing framework such as H-probes to locate and analyze this structure, especially in models that perform complex, multi-step tasks, both to understand their internal computation and to support alignment and control.

Key insights

LLMs geometrically represent hierarchical structures like tree depth and pairwise distance in low-dimensional latent subspaces.

Method

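H-probes first reduce the latent space with PCA, then train two linear probes on the reduced representation: a distance probe, which reads tree distance as Euclidean distance in a projected subspace, and a depth probe, which fits a single linear direction by ridge regression. Causal ablation experiments then test whether the identified subspaces matter for task performance.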
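
The probe-fitting steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the "hidden states" are synthetic (a random linear image of a toy binary tree plus noise), the names `path`, `W`, `k`, and `lam` are hypothetical, and the distance probe is fit in closed form on squared Euclidean distance, which may differ from the loss and optimizer used in the paper. The causal ablation step is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complete binary tree in heap order (node 0 is the root). Assumption for illustration.
n_nodes = 15
parent = [-1] + [(v - 1) // 2 for v in range(1, n_nodes)]
depth = np.zeros(n_nodes)
path = np.zeros((n_nodes, n_nodes))  # path[v, e] = 1 if the edge above node e lies on the root-to-v path
for v in range(1, n_nodes):
    depth[v] = depth[parent[v]] + 1
    path[v] = path[parent[v]]
    path[v, v] = 1.0
# Tree distance = number of edges in the symmetric difference of the two root paths.
tree_dist = np.abs(path[:, None, :] - path[None, :, :]).sum(axis=-1)

# Synthetic stand-in for LLM hidden states: random linear image of the path code plus noise.
hidden_dim, samples_per_node, k = 64, 40, 14
node_ids = np.repeat(np.arange(n_nodes), samples_per_node)
rng.shuffle(node_ids)
W = rng.normal(size=(n_nodes, hidden_dim))
H = path[node_ids] @ W + 0.1 * rng.normal(size=(node_ids.size, hidden_dim))
split = node_ids.size // 2
train, test = slice(0, split), slice(split, None)

# Step 1: PCA fitted on the training half only, keeping k components.
mu = H[train].mean(axis=0)
_, _, Vt = np.linalg.svd(H[train] - mu, full_matrices=False)
Z = (H - mu) @ Vt[:k].T

# Step 2: depth probe = closed-form ridge regression onto a single direction w.
lam = 1.0
y = depth[node_ids]
y_mean = y[train].mean()
w = np.linalg.solve(Z[train].T @ Z[train] + lam * np.eye(k),
                    Z[train].T @ (y[train] - y_mean))
depth_pred = Z[test] @ w + y_mean
depth_r2 = 1 - np.sum((y[test] - depth_pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

# Step 3: distance probe. Find B so that ||B (z_i - z_j)||^2 approximates tree distance.
# The squared norm is linear in M = B^T B, so M is fit by least squares and then
# projected onto the positive semidefinite cone to recover B.
def pair_features(Z, idx_a, idx_b):
    d = Z[idx_a] - Z[idx_b]
    return (d[:, :, None] * d[:, None, :]).reshape(len(d), -1)

def sample_pairs(lo, hi, n):
    return rng.integers(lo, hi, size=n), rng.integers(lo, hi, size=n)

a, b = sample_pairs(0, split, 4000)
M_flat, *_ = np.linalg.lstsq(pair_features(Z, a, b),
                             tree_dist[node_ids[a], node_ids[b]], rcond=None)
M = M_flat.reshape(k, k)
M = (M + M.T) / 2
evals, evecs = np.linalg.eigh(M)
B = (evecs * np.sqrt(np.clip(evals, 0, None))).T  # B.T @ B is the PSD part of M

# Evaluate the distance probe on held-out pairs.
ta, tb = sample_pairs(split, node_ids.size, 4000)
dist_pred = np.sum(((Z[ta] - Z[tb]) @ B.T) ** 2, axis=1)
dist_corr = np.corrcoef(dist_pred, tree_dist[node_ids[ta], node_ids[tb]])[0, 1]

print(f"held-out depth R^2: {depth_r2:.3f}")
print(f"held-out distance correlation: {dist_corr:.3f}")
```

The toy data is constructed so that both quantities are exactly linearly recoverable, which makes it a check on the probe code rather than evidence about any model. With real activations, `H` would be replaced by hidden states collected at a chosen layer and `k` would be selected by validation.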

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.