Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning

2026-05-28 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computational Linguistics · Depth: Expert, quick

Summary

A new method enables causal intervention on continuous variables in language model representations, extending previous work that focused on discrete features such as grammatical number. The technique first localizes a low-dimensional direction that encodes a graded target variable in activation vectors, then edits vectors along this direction toward counterfactual values of that variable. The researchers applied it to verb bias, a continuous psycholinguistic measure of which syntactic structure a given verb tends to be followed by. The study shows that verb bias is causally encoded in steering vectors extracted from large language models: counterfactual edits to verb bias systematically shift downstream structural preferences. Steering vectors also carry error signals that could drive the error-driven updates of in-context learning, but these signals are not causally used in downstream production. The findings confirm that causal interventions apply to continuous variables, although fully linking them to the mechanism of in-context learning remains difficult.
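A minimal worked form of the edit step, assuming (a simplification the source does not confirm) that the localized direction is a single vector w read out linearly; the paper's own localization may differ or use more than one dimension. Here h is an activation or steering vector in R^d, the readout of the graded variable is w^T h + b, and y* is the counterfactual target value:

```latex
\begin{aligned}
\hat{y}(h) &= w^\top h + b, \qquad h' = h + \alpha\, w \\
\hat{y}(h') = y^\ast &\;\Longrightarrow\; w^\top h + \alpha \lVert w \rVert^2 + b = y^\ast \\
&\;\Longrightarrow\; \alpha = \frac{y^\ast - \hat{y}(h)}{\lVert w \rVert^2},
\qquad h' = h + \frac{y^\ast - \hat{y}(h)}{\lVert w \rVert^2}\, w
\end{aligned}
```

Because any component of an edit orthogonal to w leaves the readout unchanged, this shift along w is the smallest change (in Euclidean norm) that sets the readout to y*.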

Key takeaway

For AI scientists working on language model interpretability, this research shows that continuous internal representations, not only discrete features, can be causally intervened on. Consider applying the method to other graded linguistic or semantic features to test how they influence model behavior. Although linking these interventions to in-context learning remains difficult, the ability to set a feature like verb bias to chosen counterfactual values offers a new way to probe model mechanisms and, potentially, finer control over model outputs.

Key insights

Causal interventions can effectively manipulate continuous variables like verb bias in language model steering vectors.

Principles

Verb bias is causally represented in LLM steering vectors.
Counterfactual edits shift downstream structural preferences.
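One way to quantify "edits shift downstream structural preferences" is sketched below. It assumes, hypothetically, that preference is scored as the log-probability difference between two competing structures after the same prompt, measured once per edited steering vector, and that a systematic shift appears as a positive rank correlation and slope between the counterfactual target values and those scores. The function names are illustrative, and this is not necessarily the paper's evaluation metric.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions (as in Spearman's rho)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Tied values occupy consecutive sorted positions, so averaging gives the mid-rank.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def preference_score(logp_a, logp_b) -> np.ndarray:
    """Hypothetical structural preference: log P(structure A) - log P(structure B)."""
    return np.asarray(logp_a, dtype=float) - np.asarray(logp_b, dtype=float)


def edit_effect(targets, logp_a, logp_b) -> dict:
    """Summarize how the preference tracks the counterfactual values written into the vectors."""
    targets = np.asarray(targets, dtype=float)
    pref = preference_score(logp_a, logp_b)
    # Spearman's rho = Pearson correlation of the (tie-averaged) ranks.
    rho = np.corrcoef(average_ranks(targets), average_ranks(pref))[0, 1]
    # Least-squares slope: change in preference per unit change in the target value.
    slope, _intercept = np.polyfit(targets, pref, deg=1)
    return {"spearman": float(rho), "slope": float(slope)}
```

A rho near 1 with a positive slope would indicate the kind of systematic shift the study reports; a rho near 0 would suggest the edited variable is not used downstream.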

Method

Localize a low-dimensional direction for a graded target variable from activation vectors, then edit vectors towards counterfactual target values using this direction.
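A minimal NumPy sketch of the two steps, under stated assumptions: the direction is localized by ridge regression of the graded variable (for example, per-verb bias scores) on activation vectors, giving a rank-1 linear readout, and the edit moves a vector along that direction by just enough to make the readout equal the counterfactual value. `localize_direction` and `counterfactual_edit` are hypothetical names, and a linear probe is only one possible localization method; the source does not say this is the paper's procedure.

```python
import numpy as np


def localize_direction(acts: np.ndarray, targets: np.ndarray,
                       ridge: float = 1e-3) -> tuple[np.ndarray, float]:
    """Fit targets ~ acts @ w + b by ridge regression; w is the assumed rank-1 direction."""
    d = acts.shape[1]
    mean_x, mean_y = acts.mean(axis=0), targets.mean()
    xc, yc = acts - mean_x, targets - mean_y
    # Closed-form ridge solution on centered data: (X^T X + lambda I) w = X^T y.
    w = np.linalg.solve(xc.T @ xc + ridge * np.eye(d), xc.T @ yc)
    b = mean_y - mean_x @ w
    return w, float(b)


def counterfactual_edit(h: np.ndarray, w: np.ndarray, b: float, target) -> np.ndarray:
    """Shift h (one vector or a batch of rows) along w so the readout h @ w + b equals target."""
    current = h @ w + b
    # Minimum-norm shift along w; the trailing axis lets a batch broadcast against w.
    scale = np.asarray((target - current) / (w @ w))[..., None]
    return h + scale * w
```

For a steering vector `h`, calling `counterfactual_edit(h, w, b, target)` writes a chosen verb-bias value into it; whether the model's downstream structure choices then change is the causal question the edit is meant to test.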

In practice

Manipulate continuous linguistic features in LLMs.
Investigate causal links between internal states and output.

Topics

Causal Intervention
Continuous Variables
Language Models
Steering Vectors
Verb Bias
In-Context Learning

Best for: NLP Engineer, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.