Power Steering: Behavior Steering via Layer-to-Layer Jacobian Singular Vectors

Source: AI Alignment Forum · Field: Technology & Digital: Artificial Intelligence & Machine Learning, AI Alignment · Depth: Advanced, extended

Summary

Power Steering is a method for finding steering vectors in large language models (LLMs) from the layer-to-layer Jacobian, the matrix describing how a small change to activations at one layer changes activations at a later layer. It uses power iteration to estimate the Jacobian's top singular vectors in approximately 15 forward passes per layer pair. That low cost makes it feasible to map sensitivity across an entire model, as demonstrated on Qwen3-8B and Qwen3-1.7B-Base, and the resulting directions influence model behavior. Power Steering performs comparably to costlier non-linear optimization techniques such as MELBO at eliciting specific behaviors, such as anti-refusal or chain-of-thought reasoning, particularly on prompts with decision forks or when unlocking latent capabilities. The steering vectors it finds also often align with the model's natural representations for specific tasks.

Key takeaway

For AI scientists and research scientists working on LLM interpretability and control, Power Steering is a computationally efficient alternative to existing methods such as MELBO. Consider this Jacobian-based approach when you want to map an entire model, discover latent behaviors, or amplify decision-fork responses. Its low cost per layer pair makes comprehensive model analysis practical, and it may reveal how LLMs represent and process information and how to steer them effectively.

Key insights

Power Steering efficiently finds LLM steering vectors from the top singular vectors of layer-to-layer Jacobians, which makes steering across many layer pairs and behaviors practical.

Principles

Method

Power Steering computes the top singular vectors of the layer-to-layer Jacobian using block power iteration, requiring ~15 forward passes per layer pair. At that cost, an entire model's behavioral sensitivity can be mapped.
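
To make the idea of block power iteration concrete, here is a minimal NumPy sketch on a toy problem. It is an illustration under stated assumptions, not the original post's implementation: `toy_layer_map` is a hypothetical stand-in for the computation between a source and a target layer, the Jacobian is built explicitly by finite differences (feasible only at toy sizes), and all function names are invented for this example. A real model would use Jacobian-vector and vector-Jacobian products instead of forming the matrix, and the sketch does not attempt to reproduce the ~15-forward-pass budget.

```python
import numpy as np

def toy_layer_map(h, W1, W2):
    # Hypothetical stand-in for the layers between a source and target layer.
    return W2 @ np.tanh(W1 @ h)

def finite_difference_jacobian(f, h, eps=1e-5):
    # Central differences: two evaluations of f per input dimension.
    # Only practical for toy sizes; shown here to keep the sketch self-contained.
    out_dim = f(h).size
    J = np.empty((out_dim, h.size))
    for i in range(h.size):
        step = np.zeros_like(h)
        step[i] = eps
        J[:, i] = (f(h + step) - f(h - step)) / (2 * eps)
    return J

def block_power_iteration(J, k=2, n_iters=200, seed=0):
    # Estimate the top-k right singular vectors of J by iterating
    # V <- orth(J^T J V) on a block of k orthonormal columns.
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((J.shape[1], k)))
    for _ in range(n_iters):
        U = J @ V                      # push the block through the Jacobian
        V, _ = np.linalg.qr(J.T @ U)   # pull back and re-orthonormalize
    # Rayleigh-Ritz step: rotate the block so its columns line up with
    # individual singular vectors, ordered by decreasing singular value.
    _, S, Wt = np.linalg.svd(J @ V, full_matrices=False)
    return V @ Wt.T, S

# Toy usage: random weights and a random "activation" vector.
rng = np.random.default_rng(0)
d = 8
W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)
h = rng.standard_normal(d)

J = finite_difference_jacobian(lambda x: toy_layer_map(x, W1, W2), h)
V, S = block_power_iteration(J, k=2)
print("estimated top singular values:", S)
print("candidate steering directions (columns):", V.shape)
```

In this sketch, each column of `V` is a unit-norm direction in the source layer's activation space along which the toy map's output changes most. Such directions would be the candidate steering vectors in the setting the summary describes.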

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.