Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

2026-06-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A new study introduces an approach to Large Language Model (LLM) alignment that uses implicit human feedback, such as mouse trajectories and eye-gaze points, to avoid the cost of collecting explicit feedback. The researchers built the IFLLM dataset, which contains 1,336 multi-turn questions together with mouse trajectories and eye-gaze data from 59 Mechanical Turk workers interacting with LLM responses. Analysis of IFLLM showed that gaze behavior and mouse movement vary widely across users. A reward model trained with this implicit feedback raised accuracy from 55%, the level reached by text-based reward models, to 64%. Applying Direct Preference Optimization (DPO) with the implicit feedback also nearly tripled the relative improvement in response quality across eight different LLMs, which demonstrates the practical value of the approach.
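The summary does not define how the 55% and 64% reward-model accuracies are measured. This sketch assumes the common pairwise definition used in preference learning: the fraction of preference pairs where the model scores the human-preferred (chosen) response strictly above the rejected one. Under that assumption, chance level for two-way comparisons is 50%. The function name and the scores are hypothetical and are not IFLLM data.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (chosen_score, rejected_score) pairs ranked correctly.

    Assumed metric: a pair counts as correct only when the reward model
    scores the chosen response strictly higher than the rejected one.
    """
    if not score_pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for chosen, rejected in score_pairs if chosen > rejected)
    return correct / len(score_pairs)


# Hypothetical reward scores for illustration only.
example_pairs = [(0.8, 0.3), (0.1, 0.4), (0.9, 0.2), (0.5, 0.5)]
print(pairwise_accuracy(example_pairs))  # 0.5 (ties count as errors)
```

Counting ties as errors is a design choice made here. It keeps a constant-output reward model at 0% rather than letting it look like a coin flip.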

Key takeaway

For Machine Learning Engineers focused on LLM alignment, consider adding implicit user feedback to your preference-data and reward-modeling pipelines. In the IFLLM study, mouse trajectories and eye-gaze data raised reward-model accuracy from 55% to 64%, and DPO with implicit feedback nearly tripled the relative improvement in response quality. Implicit feedback is therefore a cost-effective alternative to expensive explicit feedback for LLM fine-tuning.

Key insights

Implicit user feedback (mouse movements and eye gaze) significantly improves LLM alignment and outperforms text-only reward modeling.

Principles

Implicit user signals carry information about users' preferences.
Gaze and mouse behavior vary widely across users.
Costly explicit feedback is not always necessary.

Method

Collect multi-turn questions, mouse trajectories, and eye-gaze points from users as they interact with LLM responses, and use them to build a preference dataset. Then train a reward model on this implicit data.
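The Method step can be sketched as a linear Bradley-Terry reward model over per-response implicit signals. This is a minimal sketch under stated assumptions and is not the IFLLM authors' implementation. The `Interaction` fields (`gaze_dwell_s`, `mouse_path_px`, `text_score`), the feature scaling, the linear model, and the toy data are all hypothetical. The sketch assumes each training example pairs a chosen response with a rejected one, each carrying its own implicit signals. It also includes a text-only reward score as a feature, which loosely mirrors the idea of augmenting a text-based reward model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Interaction:
    """Hypothetical per-response signals (NOT the IFLLM schema)."""
    gaze_dwell_s: float   # seconds of gaze on the response (assumed signal)
    mouse_path_px: float  # mouse path length over the response (assumed signal)
    text_score: float     # score from an existing text-only reward model

    def features(self) -> List[float]:
        # Rescale raw signals so the features have similar magnitudes.
        return [self.gaze_dwell_s / 10.0, self.mouse_path_px / 1000.0, self.text_score]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_bradley_terry(pairs: Sequence[Tuple[Interaction, Interaction]],
                        lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit w so that P(chosen preferred) = sigmoid(w . (f_chosen - f_rejected))."""
    dim = len(pairs[0][0].features())
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen.features(), rejected.features())]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Gradient of log sigmoid(w . diff) with respect to w is (1 - p) * diff.
            for i, d in enumerate(diff):
                grad[i] += (1.0 - p) * d
        # Full-batch gradient ascent on the mean log-likelihood.
        w = [wi + lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


def reward(w: Sequence[float], x: Interaction) -> float:
    """Scalar reward: linear score of the response's features."""
    return sum(wi * fi for wi, fi in zip(w, x.features()))


# Toy (chosen, rejected) pairs, invented for illustration only.
pairs = [
    (Interaction(12.0, 800.0, 0.2), Interaction(4.0, 300.0, 0.3)),
    (Interaction(9.0, 600.0, 0.6), Interaction(3.0, 900.0, 0.5)),
    (Interaction(15.0, 1200.0, 0.1), Interaction(5.0, 400.0, 0.4)),
]
w = train_bradley_terry(pairs)
print([round(wi, 3) for wi in w])
```

The linear model keeps the sketch dependency-free. A practical system would more likely feed the gaze and mouse sequences, together with the text, into a neural encoder. The pairwise Bradley-Terry objective would stay the same.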

In practice

Integrate mouse/gaze tracking into LLM UIs.
Use the IFLLM dataset for reward model training.
Apply DPO with implicit feedback.
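The DPO step can be sketched with the standard DPO objective, applied to pairs whose chosen/rejected order comes from an implicit-feedback reward. This sketch assumes that summed per-sequence log-probabilities have already been computed by the policy and by a frozen reference model. The helper `to_preference_pair`, the `beta` default of 0.1, and the example numbers are illustrative assumptions and are not values from the paper.

```python
import math
from typing import Tuple


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def to_preference_pair(resp_a: str, resp_b: str,
                       score_a: float, score_b: float) -> Tuple[str, str]:
    """Hypothetical helper: order two responses as (chosen, rejected)
    by an implicit-feedback reward score."""
    return (resp_a, resp_b) if score_a >= score_b else (resp_b, resp_a)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss:
    -log sigmoid(beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))])."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


chosen, rejected = to_preference_pair("answer A", "answer B", 0.2, 0.7)
print(chosen, "|", rejected)                      # answer B | answer A
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))      # policy == reference: log(2) ~ 0.693
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))       # policy favors chosen: loss < log(2)
```

When the policy matches the reference, the loss equals log 2. The loss falls as the policy shifts probability mass toward the response that the implicit feedback marked as preferred, relative to the reference model.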

Topics

LLM Alignment
Implicit Feedback
Reward Models
Direct Preference Optimization
Human-Computer Interaction
User Preference Learning

Code references

themehulpatwari/llm-implicit-feedback

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.