Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
Summary
Researchers introduced IFllm, a dataset designed to align Large Language Models (LLMs) using implicit human feedback. The dataset comprises 1336 multi-turn questions collected from 59 Mechanical Turk workers, whose mouse trajectories and webcam-based eye-gaze points were recorded while they interacted with LLM responses. Unlike explicit feedback, which is scarce and costly, implicit signals such as mouse movement and eye gaze offer a scalable alternative. Analysis of IFllm revealed diverse user reading patterns influenced by response length and interface layout. A reward model trained on this implicit feedback improved preference prediction accuracy from 55% to 64% over text-based models. Applying Direct Preference Optimization (DPO) with this implicit-feedback-driven reward model nearly tripled the relative improvement in response quality, from 0.12 to 0.35, across eight LLMs. The signals proved valuable overall, with mouse movement helping most on longer responses.
Key takeaway
For Machine Learning Engineers developing user-facing LLMs, implicit feedback offers a scalable route to alignment where explicit feedback is scarce and costly to collect. Consider adding mouse trajectory tracking to your LLM interfaces: it can improve reward model accuracy and, in turn, LLM response quality, especially for longer outputs. Put robust privacy protocols in place to address the ethical concerns raised by user tracking.
Key insights
Implicit user feedback, especially mouse movement, significantly enhances LLM alignment and preference prediction accuracy.
Principles
- User reading patterns are diverse and length-dependent.
- Mouse movement is most informative for predicting preference on longer responses.
- Gaze signals are more helpful for short responses.
Method
Collect webcam-based eye-gaze and mouse trajectories during LLM interactions, extract features, then train a random forest reward model to predict user preference.
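The feature-extraction step can be illustrated with a minimal sketch. The sample layout (timestamp, x, y), the function name `trajectory_features`, the `pause_speed` threshold, and the four features below are all assumptions made for illustration; the summary does not say which features the authors extract. Vectors like these could then be passed to a random forest classifier as the reward model.

```python
import math
from typing import Dict, List, Tuple

# One sample: (timestamp in seconds, x in pixels, y in pixels).
# This layout is an assumption, not the dataset's actual schema.
Point = Tuple[float, float, float]


def trajectory_features(points: List[Point], pause_speed: float = 5.0) -> Dict[str, float]:
    """Summarise one mouse (or gaze) trajectory as a few scalar features.

    pause_speed is a hypothetical threshold in pixels/second: segments
    slower than this are counted as pauses.
    """
    empty = {"duration": 0.0, "path_length": 0.0, "mean_speed": 0.0, "pause_ratio": 0.0}
    if len(points) < 2:
        return empty

    duration = points[-1][0] - points[0][0]
    if duration <= 0:
        return empty

    path_length = 0.0
    paused_time = 0.0
    # Walk over consecutive sample pairs and accumulate distance and pause time.
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        dist = math.hypot(x1 - x0, y1 - y0)
        path_length += dist
        if dt > 0 and dist / dt < pause_speed:
            paused_time += dt

    return {
        "duration": duration,
        "path_length": path_length,
        "mean_speed": path_length / duration,
        "pause_ratio": paused_time / duration,
    }
```

For example, `trajectory_features([(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 4.0)])` describes a cursor that moves 5 pixels in the first second and then rests for one second.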
In practice
- Integrate mouse tracking into LLM interfaces for feedback.
- Use implicit signals to refine DPO for better LLM alignment.
- Develop length-aware feedback mechanisms for LLMs.
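As a rough sketch of how implicit signals could feed DPO: a reward model scores two candidate responses, the higher-scoring one becomes the "chosen" response, and the standard per-pair DPO loss is computed from policy and reference log-probabilities. The helper names `build_preference_pair` and `dpo_loss`, the example scores, and the default `beta` are assumptions for illustration, not the authors' training setup.

```python
import math
from typing import Tuple


def build_preference_pair(
    prompt: str, response_a: str, response_b: str, score_a: float, score_b: float
) -> Tuple[str, str, str]:
    """Return (prompt, chosen, rejected), ordered by reward-model score.

    score_a and score_b stand in for the output of a reward model trained
    on implicit feedback (hypothetical here).
    """
    if score_a >= score_b:
        return prompt, response_a, response_b
    return prompt, response_b, response_a


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen response.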
Topics
- LLM Alignment
- Implicit Feedback
- Reward Models
- Mouse Tracking
- Eye Tracking
- Data Collection
- Direct Preference Optimization
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.