Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Researchers introduced IFllm, a dataset for aligning Large Language Models (LLMs) using implicit human feedback. It comprises 1336 multi-turn questions collected from 59 Mechanical Turk workers, with mouse trajectories and webcam-based eye-gaze points recorded while the workers read LLM responses. Explicit feedback is scarce and costly to collect; implicit signals such as mouse movement and eye gaze offer a scalable alternative. Analysis of IFllm revealed diverse reading patterns that vary with response length and interface layout. A reward model trained on this implicit feedback raised preference prediction accuracy from 55% for text-based models to 64%, a gain of 9 percentage points. Applying Direct Preference Optimization (DPO) with this implicit-feedback reward model also nearly tripled the relative improvement in response quality, from 0.12 to 0.35, across eight LLMs. Mouse movement contributed the most for longer responses.

Key takeaway

For Machine Learning Engineers developing user-facing LLMs, implicit feedback is a practical route to scalable alignment. Explicit feedback alone is scarce and costly to collect, so consider adding mouse trajectory tracking to your LLM interfaces. In this study, doing so improved reward model accuracy and, through DPO, LLM response quality, especially for longer outputs. Before deploying any user tracking, put robust privacy and consent protocols in place to address the ethical concerns it raises.

Key insights

Implicit user feedback, especially mouse movement, significantly enhances LLM alignment and preference prediction accuracy.

Principles

User reading patterns are diverse and length-dependent.
Mouse movement is crucial for longer response preference.
Gaze signals are more helpful for short responses.
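
One way to picture the length dependence described above is a gate that shifts weight from a gaze-based score to a mouse-based score as responses get longer. The sketch below is only an illustration of that idea, not the method from the paper: the function names, the logistic gate, and the midpoint and scale values are all hypothetical assumptions.

```python
import math

def mouse_weight(num_tokens: int, midpoint: float = 200.0, scale: float = 50.0) -> float:
    """Weight given to the mouse-based score, rising with response length.

    midpoint and scale are hypothetical values chosen for illustration;
    they are not taken from the source.
    """
    return 1.0 / (1.0 + math.exp(-(num_tokens - midpoint) / scale))

def blended_score(mouse_score: float, gaze_score: float, num_tokens: int) -> float:
    """Blend two per-response preference scores with a length-dependent weight."""
    w = mouse_weight(num_tokens)
    # Short responses lean on gaze; long responses lean on mouse movement.
    return w * mouse_score + (1.0 - w) * gaze_score

short_weight = mouse_weight(10)    # short response: small mouse weight
long_weight = mouse_weight(1000)   # long response: mouse weight close to 1
```

In a real system the gate would more plausibly be learned from data than fixed by hand.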

Method

Collect webcam-based eye-gaze and mouse trajectories during LLM interactions, extract features, then train a random forest reward model to predict user preference.
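
The feature extraction step might look roughly like the following for mouse data. This is a minimal sketch under stated assumptions: the sample format (timestamp, x, y), the particular features, and the pause threshold are hypothetical and are not the feature set used in the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sample format: (timestamp_seconds, x_pixels, y_pixels).
MouseSample = Tuple[float, float, float]

def mouse_features(samples: List[MouseSample],
                   pause_threshold: float = 0.5) -> Dict[str, float]:
    """Summarize one mouse trajectory as a small dictionary of features."""
    if len(samples) < 2:
        return {"duration": 0.0, "path_length": 0.0, "mean_speed": 0.0,
                "vertical_range": 0.0, "pause_count": 0.0}
    duration = samples[-1][0] - samples[0][0]
    path_length = 0.0
    pause_count = 0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        # Euclidean distance between consecutive cursor positions.
        path_length += math.hypot(x1 - x0, y1 - y0)
        # Assumption: a long gap between samples means the cursor was idle.
        if t1 - t0 >= pause_threshold:
            pause_count += 1
    ys = [y for _, _, y in samples]
    return {
        "duration": duration,
        "path_length": path_length,
        "mean_speed": path_length / duration if duration > 0 else 0.0,
        "vertical_range": max(ys) - min(ys),
        "pause_count": float(pause_count),
    }

trajectory = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 10.0)]
features = mouse_features(trajectory)
```

Feature dictionaries like this, one per response, could then be turned into rows of a feature matrix and passed to a random forest classifier (for example scikit-learn's RandomForestClassifier) with the user's stated preference as the label.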

In practice

Integrate mouse tracking into LLM interfaces for feedback.
Use implicit signals to refine DPO for better LLM alignment.
Develop length-aware feedback mechanisms for LLMs.
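
To make the DPO step concrete, here is a minimal sketch of the standard per-pair DPO loss, plus a hypothetical helper that turns two reward model scores into a chosen/rejected pair. The helper name, the beta default, and the way pairs are built are assumptions for illustration; the source does not describe the paper's training pipeline at this level of detail.

```python
import math
from typing import Tuple

def to_preference_pair(resp_a: str, resp_b: str,
                       score_a: float, score_b: float) -> Tuple[str, str]:
    """Hypothetical helper: order two responses as (chosen, rejected) by reward score."""
    return (resp_a, resp_b) if score_a >= score_b else (resp_b, resp_a)

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

chosen, rejected = to_preference_pair("answer A", "answer B", 0.2, 0.7)
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)   # policy identical to reference
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # policy favors the chosen response
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is how reward model preferences inferred from implicit signals would steer the LLM.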

Topics

LLM Alignment
Implicit Feedback
Reward Models
Mouse Tracking
Eye Tracking
Data Collection
Direct Preference Optimization

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.