Response Time Enhances Alignment with Heterogeneous Preferences
Summary
A study by Baihe Huang and Michael I. Jordan, published on May 7, 2026, proposes using user response times to improve the alignment of large language models (LLMs) with human preferences. Current alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), typically aggregate feedback from anonymous labelers into a single reward model, implicitly assuming that all labelers share the same preferences. When preferences are in fact heterogeneous, this assumption distorts the learned policy and makes the true population-average preference unidentifiable from choices alone. The researchers propose augmenting preference datasets with response times, which are "essentially free to record" and require "zero user tracking or identification." By modeling each decision with a Drift-Diffusion Model (DDM), they derive a consistent estimator of the population-average preference that corrects the distortions of choice-only labels. Experiments on synthetic and real-world datasets show that the method consistently outperforms standard baselines, which often plateau at a bias floor, and recovers the population-average preference more accurately.
Key takeaway
For research scientists developing LLM alignment strategies, integrating user response times into preference datasets is critical. The DDM-based approach offers a provably consistent estimator of the population-average preference, overcoming the inherent bias of choice-only methods when labelers are heterogeneous. Consider instrumenting data collection pipelines to capture response times: this "free signal" significantly improves the accuracy of learned preference models without requiring user identification, thereby enhancing the social benefit of LLMs.
Key insights
Response time data can restore identifiability of population-average preferences in LLM alignment, even with heterogeneous, anonymous labelers.
Principles
- Heterogeneous preferences distort standard choice-only alignment.
- Response time correlates with utility intensity in decision-making.
- Utilitarian aggregation requires measuring preference intensity.
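The first two principles can be illustrated with a small numerical sketch. It is not the paper's code: it assumes a symmetric DDM with unit noise and an unbiased starting point, for which the choice probability and mean decision time have standard closed forms. The function names, the two-group population, and the boundary value are all hypothetical choices made for illustration.

```python
import numpy as np

def choice_prob(drift, boundary):
    """P(upper option is chosen) for a symmetric, unit-noise DDM started at 0."""
    return 1.0 / (1.0 + np.exp(-2.0 * boundary * drift))

def mean_response_time(drift, boundary):
    """Expected decision time (boundary / drift) * tanh(boundary * drift).

    Assumes drift != 0 (the limit at drift = 0 is boundary**2).
    """
    return (boundary / drift) * np.tanh(boundary * drift)

boundary = 1.0                      # assumed common decision boundary
drifts = np.array([-1.0, 3.0])      # two equally sized labeler groups (hypothetical)
true_mean_drift = drifts.mean()     # the population-average preference

# Choice-only pipeline: pool all labels, then invert the logistic link as if
# every labeler shared one drift.
pooled_p = choice_prob(drifts, boundary).mean()
choice_only_drift = np.log(pooled_p / (1.0 - pooled_p)) / (2.0 * boundary)

# Response time shrinks as preference intensity |drift| grows.
rt_weak = mean_response_time(0.5, boundary)
rt_strong = mean_response_time(2.0, boundary)

print(f"true mean drift:        {true_mean_drift:.3f}")
print(f"choice-only estimate:   {choice_only_drift:.3f}")
print(f"mean RT, weak vs strong preference: {rt_weak:.3f} vs {rt_strong:.3f}")
```

In this toy population the choice-only estimate comes out at about 0.12 against a true mean drift of 1.0. More labels would not close the gap, because the logistic link is nonlinear in the drift and the pooled choice frequency therefore does not correspond to the average drift.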
Method
The method models decisions using a Drift-Diffusion Model (DDM) to jointly estimate a common, unknown decision boundary and the population-average drift, then applies a Richardson-extrapolated estimator for consistency.
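For orientation, the standard closed-form identities for a single decision-maker show why response time carries information that choices alone do not. The notation is an assumption made here, not taken from the paper: drift $v$, symmetric boundaries at $\pm a$, unit noise, unbiased start, no non-decision time, choice $c \in \{+1, -1\}$, and decision time $T$.

```latex
\begin{align}
  P(c = +1) &= \frac{1}{1 + e^{-2av}}, \\
  \mathbb{E}[c] &= 2\,P(c = +1) - 1 = \tanh(av), \\
  \mathbb{E}[T] &= \frac{a}{v}\,\tanh(av), \\
  \frac{\mathbb{E}[c]}{\mathbb{E}[T]} &= \frac{v}{a}.
\end{align}
```

Choices alone reveal $v$ only through the saturating function $\tanh(av)$, whereas the ratio of mean choice to mean response time is linear in $v$. These identities hold for one labeler with a fixed drift. How the paper combines them across anonymous, heterogeneous labelers, and where the Richardson extrapolation enters, is not reproduced here.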
In practice
- Record response times in future LLM preference datasets.
- Implement DDM-based estimators for more accurate preference learning.
- Use Richardson extrapolation for robust boundary estimation.
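As a sketch of what logged (choice, response time) pairs look like and what they can reveal, the simulation below generates decisions from a single symmetric, unit-noise DDM and compares mean choice divided by mean response time against drift divided by boundary, which the standard DDM identities say should match. This is an illustration under stated assumptions, not the paper's estimator: `simulate_ddm` and its parameters are hypothetical, the population is a single labeler with a fixed drift, and no Richardson extrapolation is performed.

```python
import numpy as np

def simulate_ddm(drift, boundary, n_trials, dt=1e-3, max_steps=100_000, seed=0):
    """Euler-Maruyama simulation of a symmetric, unit-noise DDM started at 0.

    Returns (choices, response_times). Choices are +1 or -1 depending on which
    boundary was crossed. Trials still undecided after max_steps are dropped.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)                    # accumulated evidence per trial
    t = np.zeros(n_trials)                    # elapsed time per trial
    active = np.ones(n_trials, dtype=bool)    # trials that have not decided yet
    for _ in range(max_steps):
        n_active = int(active.sum())
        if n_active == 0:
            break
        noise = np.sqrt(dt) * rng.standard_normal(n_active)
        x[active] += drift * dt + noise
        t[active] += dt
        active &= np.abs(x) < boundary        # stop trials that hit a boundary
    done = ~active
    return np.sign(x[done]), t[done]

drift, boundary = 1.0, 1.0                    # hypothetical values
choices, rts = simulate_ddm(drift, boundary, n_trials=20_000)

ratio = choices.mean() / rts.mean()
print(f"mean(choice) / mean(RT) = {ratio:.3f}")
print(f"drift / boundary        = {drift / boundary:.3f}")
```

The two printed values should agree up to Monte Carlo noise and a small time-discretization bias. A real pipeline would replace the simulator with timestamps recorded when a comparison is shown and when the label is submitted.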
Topics
- Large Language Model Alignment
- Heterogeneous Preferences
- Drift-Diffusion Model
- Response Time Data
- Population-Average Preference
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.