DSO: Direct Steering Optimization for Bias Mitigation

2026-04-29 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Direct Steering Optimization (DSO) is a method for mitigating bias in generative models, specifically Vision-Language Models (VLMs) and Large Language Models (LLMs), while keeping the performance trade-off controllable at inference time. These models often make biased decisions driven by demographic attributes in their inputs; for example, a VLM may fail to identify a woman as a doctor. Existing activation steering techniques are useful for inducing safer LLM behavior but fall short of producing equiprobable outcomes across demographic groups. DSO addresses this by using reinforcement learning to find linear transformations for steering activations. The authors report a state-of-the-art balance between fairness and model capabilities, with fine-grained practitioner control over that trade-off at inference time. The work argues that directly optimizing a steering strategy for the target behavior works better than relying on heuristics.

Key takeaway

AI engineers deploying generative models where demographic bias is a concern should consider Direct Steering Optimization (DSO). It mitigates bias in VLMs and LLMs and exposes the fairness-performance trade-off as an inference-time control, so practitioners can decide how much capability to give up for more equitable outcomes instead of accepting a fixed compromise.

Key insights

DSO uses reinforcement learning to optimize activation steering for controllable bias mitigation in generative models.

Principles

Bias mitigation here means equiprobable outcomes across demographic groups.
Direct optimization improves steering effectiveness.

Method

DSO applies reinforcement learning to find linear transformations for steering activations, directly optimizing for bias mitigation while preserving control over model performance.
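
As a rough illustration of what "a linear transformation for steering activations" can look like, the sketch below applies an affine update to a single activation vector. The form h' = h + strength * (A h + b), the name `steer`, and the `strength` knob are assumptions made for this example, not the parametrization used in apple/ml-dso; in DSO the transformation would be found by reinforcement learning, whereas here `A` and `b` are hand-picked toy values.

```python
def steer(h, A, b, strength):
    """Return h + strength * (A h + b) for a plain-list activation vector h.

    A is a d x d matrix (list of rows) and b is a length-d offset.
    strength = 0.0 leaves the activation unchanged.
    """
    if len(A) != len(h) or len(b) != len(h) or any(len(row) != len(h) for row in A):
        raise ValueError("A must be d x d and b must have length d")
    # Affine steering direction: delta = A h + b
    delta = [sum(a_ij * h_j for a_ij, h_j in zip(row, h)) + b_i
             for row, b_i in zip(A, b)]
    return [h_i + strength * d_i for h_i, d_i in zip(h, delta)]


# Toy values chosen by hand for illustration only.
h = [1.0, 2.0]
A = [[0.0, 0.0],
     [0.0, -1.0]]
b = [0.5, 0.0]

print(steer(h, A, b, 0.0))  # [1.0, 2.0]
print(steer(h, A, b, 1.0))  # [1.5, 0.0]
```

Scaling the update by a single scalar is one simple way to get a dial between the unsteered model and the fully steered one; whether DSO exposes its control in exactly this way is not stated in the summary.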

In practice

Apply DSO for VLM and LLM bias reduction.
Control fairness-capability trade-off at inference.
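
One way to use an inference-time control in practice is to measure bias and capability at a few steering strengths on your own evaluation set, then pick the setting that meets a bias budget with the least capability loss. The helper `pick_strength` and the tuple layout are hypothetical, and the numbers are placeholders, not measurements from the paper or from apple/ml-dso.

```python
def pick_strength(sweep, max_bias):
    """Pick the strength with the best capability among rows within the bias budget.

    sweep is a list of (strength, bias, capability) tuples measured by the user.
    Returns None when no row satisfies bias <= max_bias.
    """
    within_budget = [row for row in sweep if row[1] <= max_bias]
    if not within_budget:
        return None
    return max(within_budget, key=lambda row: row[2])[0]


# Placeholder measurements for illustration only.
sweep = [
    (0.0, 0.40, 0.80),
    (0.5, 0.20, 0.79),
    (1.0, 0.05, 0.76),
]

print(pick_strength(sweep, max_bias=0.25))  # 0.5
print(pick_strength(sweep, max_bias=0.01))  # None
```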

Topics

Direct Steering Optimization
Bias Mitigation
Activation Steering
Reinforcement Learning
Vision-Language Models

Code references

apple/ml-dso

Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.