Response Time Enhances Alignment with Heterogeneous Preferences

2026-05-11 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A study by Baihe Huang and Michael I. Jordan, published on May 7, 2026, proposes using response times to improve the alignment of large language models (LLMs) with human preferences. Current alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), typically pool feedback from anonymous labelers into a single reward model, implicitly assuming that all labelers share the same preferences. When preferences are in fact heterogeneous, this assumption distorts the learned policy, and the true population-average preference cannot be identified from choices alone. The researchers propose augmenting preference datasets with response times, which are "essentially free to record" and require "zero user tracking or identification." By modeling each decision with a Drift-Diffusion Model (DDM), they derive a consistent estimator of the population-average preference under heterogeneity, one that corrects the distortions introduced by choice-only labels. Experiments on synthetic and real-world datasets show that the method consistently outperforms standard baselines, which often plateau at a bias floor, and recovers the population-average preference more accurately.

Key takeaway

For research scientists developing LLM alignment strategies, response times are a signal worth adding to preference datasets. The DDM-based approach offers a provably consistent estimator of the population-average preference, overcoming the inherent bias of choice-only methods when labelers are heterogeneous. You should consider instrumenting data collection pipelines to capture response times: this "free signal" significantly improves the accuracy of learned preference models without requiring user identification, so that aligned models better reflect the preferences of the whole population.

Key insights

Response time data can restore identifiability of population-average preferences in LLM alignment, even with heterogeneous, anonymous labelers.

Principles

Heterogeneous preferences distort standard choice-only alignment.
Response time reflects utility intensity in decision-making: stronger preferences tend to produce faster choices.
Utilitarian aggregation requires measuring preference intensity.
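
A small closed-form example may make the first and third principles concrete. It is an illustrative sketch, not code from the paper. It assumes a textbook symmetric DDM with unit diffusion noise and an unbiased starting point, for which the probability of choosing option A is logistic in 2 × boundary × drift. The two-group population, the boundary value, and all function names are hypothetical.

```python
import math

def p_choose_a(drift, boundary):
    """P(choose A) in a symmetric DDM with unit noise and unbiased start (assumed model)."""
    return 1.0 / (1.0 + math.exp(-2.0 * boundary * drift))

def choice_only_drift(p_a, boundary):
    """Drift implied by a pooled choice rate if all labelers were identical."""
    return math.log(p_a / (1.0 - p_a)) / (2.0 * boundary)

# Hypothetical population as (share, drift) pairs: 80% mildly prefer A,
# 20% strongly prefer B. Positive drift means a preference for A.
population = [(0.8, 0.5), (0.2, -4.0)]
boundary = 1.0  # assumed common decision boundary

# Utilitarian target: the population-average drift (negative, so B is preferred).
true_mean_drift = sum(share * drift for share, drift in population)

# What anonymous, pooled choice labels actually reveal.
pooled_p_a = sum(share * p_choose_a(drift, boundary) for share, drift in population)
implied_drift = choice_only_drift(pooled_p_a, boundary)

print(f"population-average drift: {true_mean_drift:+.3f}")
print(f"pooled P(choose A):       {pooled_p_a:.3f}")
print(f"choice-only drift:        {implied_drift:+.3f}")
```

In this constructed case the pooled labels favor A even though the average preference favors B, because a choice records only the direction of each labeler's preference, not its intensity. More labels do not remove the gap, which is one way to read the bias floor mentioned in the summary.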

Method

The method models decisions using a Drift-Diffusion Model (DDM) to jointly estimate a common, unknown decision boundary and the population-average drift, then applies a Richardson-extrapolated estimator for consistency.
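
To show why response times help pin down both quantities, the sketch below works through the simplest case: a single labeler with a fixed drift. It is not the paper's heterogeneous estimator, whose details are not given in this summary. It assumes the same textbook symmetric DDM (unit noise, unbiased start, boundaries at plus and minus `boundary`), for which the mean signed choice is tanh(boundary × drift) and the mean decision time is (boundary / drift) × tanh(boundary × drift). The function names are hypothetical.

```python
import math

def ddm_moments(drift, boundary):
    """Mean signed choice (+1 for A, -1 for B) and mean decision time.

    Closed forms for a symmetric DDM with unit noise; drift must be nonzero.
    """
    mean_choice = math.tanh(boundary * drift)
    mean_time = (boundary / drift) * math.tanh(boundary * drift)
    return mean_choice, mean_time

def invert_moments(mean_choice, mean_time):
    """Recover (drift, boundary) from the two moments of one homogeneous labeler."""
    ratio = mean_choice / mean_time      # equals drift / boundary
    product = math.atanh(mean_choice)    # equals drift * boundary
    boundary = math.sqrt(product / ratio)
    drift = ratio * boundary
    return drift, boundary

# Round trip with hypothetical values: the moments determine both parameters.
true_drift, true_boundary = -0.7, 1.3
mean_choice, mean_time = ddm_moments(true_drift, true_boundary)
est_drift, est_boundary = invert_moments(mean_choice, mean_time)
print(f"drift:    true {true_drift:+.3f}, recovered {est_drift:+.3f}")
print(f"boundary: true {true_boundary:.3f}, recovered {est_boundary:.3f}")

# Stronger preferences give shorter mean decision times at the same boundary.
print(ddm_moments(2.0, 1.0)[1], "<", ddm_moments(0.5, 1.0)[1])
```

Choices alone reveal only the product of drift and boundary, so the boundary cannot be separated from preference strength. The mean response time supplies the ratio, which is what makes joint estimation possible in this simplified setting.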

In practice

Record response times in future LLM preference datasets.
Implement DDM-based estimators for more accurate preference learning.
Use Richardson extrapolation for robust boundary estimation.
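
For readers unfamiliar with Richardson extrapolation, the generic sketch below shows the idea on an unrelated toy problem: two estimates whose leading error terms scale as 1/n are combined so that those terms cancel. How the paper applies the technique to boundary estimation is not described in this summary, so the example (approximating e with compound interest) and the function names are purely illustrative.

```python
import math

def compound(n):
    """Approximation of e whose leading error term is proportional to 1/n."""
    return (1.0 + 1.0 / n) ** n

def richardson_first_order(estimate, n):
    """Cancel a leading 1/n error term by combining estimates at n and 2n."""
    return 2.0 * estimate(2 * n) - estimate(n)

n = 100
plain_error = abs(compound(n) - math.e)
extrapolated_error = abs(richardson_first_order(compound, n) - math.e)

print(f"plain error at n={n}:   {plain_error:.2e}")
print(f"extrapolated error:     {extrapolated_error:.2e}")
```

The extrapolated value is far closer to e than either input estimate because the remaining error shrinks as 1/n² rather than 1/n.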

Topics

Large Language Model Alignment
Heterogeneous Preferences
Drift-Diffusion Model
Response Time Data
Population-Average Preference

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.