Power Steering: Behavior Steering via Layer-to-Layer Jacobian Singular Vectors
Summary
A new method called "Power Steering" efficiently identifies steering vectors in large language models (LLMs) by analyzing the layer-to-layer Jacobian, the matrix that describes how activations in one layer affect a later layer. It uses power iteration to find the Jacobian's top singular vectors in approximately 15 forward passes per layer pair. This efficiency makes it practical to map an entire model's sensitivity, as demonstrated on Qwen3-8B and Qwen3-1.7B-Base, revealing directions that influence model behavior. Power Steering performs comparably to more costly non-linear optimization techniques like MELBO at eliciting specific behaviors, such as anti-refusal or chain-of-thought reasoning, particularly on prompts with decision forks or when unlocking latent capabilities. The method also shows that steering vectors often align with the model's natural representations for specific tasks.
Key takeaway
For AI Scientists and Research Scientists exploring LLM interpretability and control, Power Steering offers a computationally efficient alternative to existing methods like MELBO. You should consider using this Jacobian-based approach to map entire models and discover latent behaviors or amplify decision-fork responses. This method's low cost per layer pair makes it practical for comprehensive model analysis, potentially revealing new insights into how LLMs represent and process information, and how to steer them effectively.
Key insights
Power Steering efficiently finds LLM steering vectors by extracting the top singular vectors of layer-to-layer Jacobians, which makes it practical to search a whole model for directions that control behavior.
Principles
- Jacobian singular vectors reveal network sensitivity.
- A linear (Jacobian-based) approximation can match costlier non-linear optimization such as MELBO.
- Steering amplifies existing model computations.
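The first two principles can be stated more concretely. As a notational sketch (the symbols $h_s$, $h_t$, $\delta$, $J$, $\sigma_1$, and $v_1$ are introduced here for illustration and are not taken from the original article), write $h_s$ for the activations at a source layer and $h_t$ for the activations at a later target layer:

\[
J = \frac{\partial h_t}{\partial h_s}, \qquad
h_t(h_s + \delta) \approx h_t(h_s) + J\,\delta, \qquad
v_1 = \arg\max_{\|v\| = 1} \|J v\|, \qquad
\sigma_1 = \|J v_1\|.
\]

Under this linear approximation, the top right singular vector $v_1$ is the unit-norm perturbation at the source layer that changes the target layer the most, which is what makes it a natural candidate for a steering direction.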
Method
Power Steering computes top singular vectors of the layer-to-layer Jacobian using block power iteration, requiring ~15 forward passes per layer pair. This allows for mapping an entire model's behavioral sensitivity.
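To make the idea concrete, here is a minimal sketch of power iteration for the top right singular vector of a Jacobian. It is not the article's implementation. It assumes a toy function in place of a real layer-to-layer map, tracks a single vector rather than a block, and uses finite differences instead of automatic differentiation, so its pass count does not correspond to the ~15 forward passes quoted above. All names (`jvp`, `vjp`, `top_singular_vector`, `toy_layers`) are hypothetical.

```python
import math
import random


def jvp(f, x, v, eps=1e-5):
    """Approximate J(x) @ v with central differences (two evaluations of f)."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(f(x_plus), f(x_minus))]


def vjp(f, x, u, eps=1e-5):
    """Approximate J(x)^T @ u by building J one column at a time.

    Toy-scale only: a real model would use one backward pass instead.
    """
    n = len(x)
    out = []
    for i in range(n):
        basis = [0.0] * n
        basis[i] = 1.0
        column = jvp(f, x, basis, eps)  # i-th column of J
        out.append(sum(c * ui for c, ui in zip(column, u)))
    return out


def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def top_singular_vector(f, x, iters=50, seed=0):
    """Power iteration on J^T J: repeatedly apply v <- J^T (J v), then normalize."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in x])
    for _ in range(iters):
        u = jvp(f, x, v)              # push the direction forward: J v
        v = normalize(vjp(f, x, u))   # pull it back: J^T (J v)
    sigma = math.sqrt(sum(a * a for a in jvp(f, x, v)))  # sigma_1 = ||J v||
    return v, sigma


def toy_layers(x):
    """Hypothetical stand-in for the map from one layer's activations to a later layer's."""
    return [math.tanh(2.0 * x[0] + 0.5 * x[1]), math.tanh(0.1 * x[0] - 0.3 * x[1])]


if __name__ == "__main__":
    direction, gain = top_singular_vector(toy_layers, [0.1, -0.2])
    print("most sensitive input direction:", direction)
    print("amplification along it:", gain)
```

The returned direction is the unit perturbation of the input that the toy map amplifies most at the chosen point. A block variant would track several vectors at once and re-orthogonalize them after each step to recover more than one singular vector.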
In practice
- Map model sensitivity for specific prompts.
- Induce anti-refusal behavior in LLMs.
- Elicit latent chain-of-thought reasoning.
Topics
- LLM Behavior Steering
- Jacobian Singular Vectors
- Power Iteration
- Latent Behavior Elicitation
- Model Sensitivity Mapping
Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.