ST-DiffEye: Diffusion-based Continuous Gaze Generation via Joint Scanpath-Trajectory Modeling
Summary
ST-DiffEye is a diffusion-based framework for continuous human gaze generation: it models the gaze patterns a viewer produces when observing visual stimuli. It treats gaze variability as an intrinsic property of viewing behavior rather than as noise. Unlike existing models that supervise on either continuous eye-tracking trajectories or discrete scanpaths in isolation, ST-DiffEye jointly models both complementary modalities. The coupling is done by concatenation, adding a raw input channel rather than a new architectural component, so the overhead is minimal. The framework also introduces a principled evaluation method based on the Continuous Ranked Probability Score (CRPS), which generalizes existing sequence similarity metrics to assess both accuracy and diversity. According to the reported experiments, ST-DiffEye achieves state-of-the-art performance on task-driven visual search (target-present and target-absent) and free-viewing benchmarks.
Key takeaway
For Computer Vision Engineers and AI Scientists building realistic models of human behavior, ST-DiffEye shows that jointly modeling continuous eye-tracking trajectories and discrete scanpaths improves both the accuracy and the diversity of generated gaze. Consider integrating multi-modal gaze data into your generative frameworks, and adopt distribution-aware metrics such as CRPS for evaluation, especially when intrinsic variability matters. This approach can make synthetic gaze data more realistic for training or simulation.
Key insights
ST-DiffEye jointly models gaze trajectories and scanpaths via diffusion to generate diverse, accurate human gaze patterns.
Principles
- Gaze variability is a defining property.
- Trajectories and scanpaths are complementary.
- Distribution-aware evaluation is crucial.
Method
ST-DiffEye couples gaze trajectories and scanpaths within a single diffusion framework by concatenating them as an additional raw input channel. This widens the model's input and output channels but adds no significant architectural overhead.
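The channel concatenation can be pictured with a minimal NumPy sketch. Everything specific here is an assumption for illustration, not the paper's implementation: the hypothetical helpers `rasterize_scanpath`, `joint_input`, and `add_noise`, the choice to expand fixations into a piecewise-constant sequence aligned with the trajectory, and the use of the standard DDPM forward-noising formula.

```python
import numpy as np


def rasterize_scanpath(fixations, durations, num_samples):
    """Expand N discrete fixations (N, 2) into a piecewise-constant
    sequence (num_samples, 2). Assumed alignment scheme, for illustration."""
    fixations = np.asarray(fixations, dtype=float)
    durations = np.asarray(durations, dtype=float)
    # Fraction of the sequence elapsed at the end of each fixation.
    boundaries = np.cumsum(durations) / durations.sum()
    # Midpoint time of each output sample, in [0, 1).
    t = (np.arange(num_samples) + 0.5) / num_samples
    idx = np.searchsorted(boundaries, t, side="right")
    idx = np.minimum(idx, len(fixations) - 1)
    return fixations[idx]


def joint_input(trajectory, fixations, durations):
    """Concatenate trajectory (T, 2) and rasterized scanpath (T, 2)
    along the channel axis, giving a (T, 4) joint signal."""
    trajectory = np.asarray(trajectory, dtype=float)
    scan = rasterize_scanpath(fixations, durations, len(trajectory))
    return np.concatenate([trajectory, scan], axis=-1)


def add_noise(x0, alpha_bar, rng):
    """Standard DDPM forward step applied to the joint signal:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps


rng = np.random.default_rng(0)
trajectory = rng.uniform(0.0, 1.0, size=(8, 2))   # 8 raw gaze samples (x, y)
fixations = [[0.0, 0.0], [1.0, 1.0]]              # 2 fixations (x, y)
durations = [1.0, 3.0]                            # relative dwell times
x0 = joint_input(trajectory, fixations, durations)
x_t, eps = add_noise(x0, alpha_bar=0.5, rng=rng)
```

In this sketch a denoiser would take and return four channels instead of two, which is the only change to the network interface; the backbone itself is untouched.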
In practice
- Generate diverse gaze for visual search.
- Simulate gaze in free-viewing scenarios.
- Evaluate gaze models with CRPS.
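For the CRPS item above, a sample-based estimate can be sketched as follows. It uses the well-known identity CRPS = E[d(X, y)] - 0.5 * E[d(X, X')], with the absolute difference swapped for a distance between sequences. The function names and the mean pointwise Euclidean distance are assumptions chosen for illustration; the paper's exact formulation and choice of sequence metric may differ.

```python
import numpy as np


def sequence_distance(a, b):
    """Mean pointwise Euclidean distance between two equal-length
    gaze sequences of shape (T, D). Assumed metric, for illustration."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean(np.linalg.norm(a - b, axis=-1)))


def crps_estimate(samples, observed, distance=sequence_distance):
    """Sample-based CRPS-style score (lower is better).

    accuracy: average distance from generated samples to the observation.
    spread:   average distance between pairs of generated samples.
    """
    m = len(samples)
    accuracy = sum(distance(s, observed) for s in samples) / m
    spread = sum(
        distance(samples[i], samples[j]) for i in range(m) for j in range(m)
    ) / (m * m)
    return accuracy - 0.5 * spread


observed = np.array([[1.0]])
diverse = [np.array([[0.0]]), np.array([[2.0]])]     # spread around the truth
collapsed = [np.array([[0.0]]), np.array([[0.0]])]   # no diversity, same error
score_diverse = crps_estimate(diverse, observed)
score_collapsed = crps_estimate(collapsed, observed)
score_perfect = crps_estimate([observed, observed], observed)
```

Both sample sets are one unit away from the observation on average, yet the diverse set scores lower (better) because the spread term credits variability that brackets the observed gaze. This is the sense in which the score assesses accuracy and diversity together.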
Topics
- Gaze Modeling
- Diffusion Models
- Eye-tracking
- Scanpath Analysis
- Generative AI
- CRPS Evaluation
- Computer Vision
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.