Inference Time Causal Probing in LLMs

2026-05-08 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Hidden-state Driven Margin Intervention (HDMI) is a probe-free, gradient-based technique for causal probing in large language models (LLMs). It steers hidden states directly, using the model's own output rather than the auxiliary probe classifiers that existing methods rely on. HDMI applies a margin objective that increases the probability of a target continuation while decreasing that of a source continuation. A lookahead variant, LA-HDMI, is introduced for text editing: it backpropagates through softmax embeddings to modify the current hidden state, raising the likelihood of user-specified tokens in future generations while maintaining fluency. The reliability of these interventions is evaluated with two metrics, completeness and selectivity, and their harmonic mean serves as the overall score. HDMI is reported to consistently outperform prior methods on the LGD agreement corpus and the CausalGym benchmark, in experiments with Meta-Llama-3-8B-Instruct and Pythia-70M.

Key takeaway

For AI engineers and research scientists working on fine-grained control of LLM behavior, HDMI offers a probe-free alternative to traditional causal probing. It manipulates hidden states directly to reach a desired output, which may simplify the development of more steerable and reliable generative models. HDMI fits targeted interventions on specific model properties; LA-HDMI fits text editing where fluency has to be preserved.

Key insights

HDMI offers a probe-free, gradient-based method for causal probing in LLMs, directly steering hidden states via native model output.

Principles

Interventions should directly use model output.
Reliability requires both completeness (the intervention fully changes the targeted property) and selectivity (it leaves non-targeted properties intact).
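
The overall reliability score mentioned in the summary is the harmonic mean of completeness and selectivity. A minimal sketch of that arithmetic follows; the function name `reliability` and the assumption that both scores lie in [0, 1] are illustrative and not taken from the original article.

```python
def reliability(completeness: float, selectivity: float) -> float:
    """Harmonic mean of two scores (assumed here to lie in [0, 1]).

    The harmonic mean is low whenever either score is low, so an
    intervention cannot score well by being complete but unselective,
    or selective but incomplete.
    """
    if completeness < 0 or selectivity < 0:
        raise ValueError("scores must be non-negative")
    total = completeness + selectivity
    if total == 0:
        return 0.0  # both scores are zero; avoid dividing by zero
    return 2 * completeness * selectivity / total


if __name__ == "__main__":
    # A complete but poorly selective intervention is penalized.
    print(reliability(0.9, 0.5))
    # A balanced intervention keeps its score.
    print(reliability(0.8, 0.8))
```

Using the harmonic rather than the arithmetic mean means a score of 1.0 on one metric cannot compensate for a score of 0.0 on the other.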

Method

HDMI applies a margin objective to steer hidden states, increasing the probability of the target continuation while decreasing that of the source continuation. LA-HDMI extends this to text editing by backpropagating through softmax embeddings to the current hidden state.
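
The idea of a margin objective on a hidden state can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single linear unembedding layer, single-token source and target continuations, and a hinge-style loss max(0, margin - (z_target - z_source)), all of which are simplifications chosen for illustration. Under a shared softmax, the difference of log-probabilities equals the difference of logits, so the gradient with respect to the hidden state is available in closed form here; in a real LLM it would come from backpropagation through the remaining layers. The names `margin_intervention`, `unembed`, `margin`, and `step` are hypothetical.

```python
from typing import List


def logits(hidden: List[float], unembed: List[List[float]]) -> List[float]:
    """Logits of a toy linear output head: one dot product per vocabulary row."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def margin_intervention(
    hidden: List[float],
    unembed: List[List[float]],
    target: int,
    source: int,
    margin: float = 1.0,
    step: float = 0.1,
    max_steps: int = 100,
) -> List[float]:
    """Nudge a hidden state until the target logit beats the source logit by `margin`.

    Assumed loss: max(0, margin - (z_target - z_source)). While the hinge is
    active, the descent direction with respect to the hidden state is
    unembed[target] - unembed[source] for this linear head.
    """
    h = list(hidden)
    direction = [wt - ws for wt, ws in zip(unembed[target], unembed[source])]
    for _ in range(max_steps):
        z = logits(h, unembed)
        if z[target] - z[source] >= margin:
            break  # margin satisfied; stop editing the hidden state
        h = [hi + step * di for hi, di in zip(h, direction)]
    return h


if __name__ == "__main__":
    # Toy vocabulary of three tokens over a 2-dimensional hidden state.
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    h0 = [2.0, 0.0]  # initially favors token 0 (the source)
    h1 = margin_intervention(h0, W, target=1, source=0)
    print(logits(h0, W), logits(h1, W))
```

Because the loop stops as soon as the margin is met, the hidden state is moved only as far as needed, which is one plausible way to limit side effects on other tokens; how the original method balances this is not specified in this summary.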

In practice

Apply HDMI for targeted LLM behavior modification.
Use LA-HDMI for fluent, user-specified text editing.

Topics

Causal Probing
Hidden-state Driven Margin Intervention
LLM Internal Representations
Gradient-based Interventions
Text Editing

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.