Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

GazeLNN is a computationally lightweight scanpath prediction model that targets the high computational cost of existing human visual attention models, a cost that limits their use in robot autonomy. It uses a Liquid Neural Network as its recurrent engine and MobileNetV3 for feature extraction, and it operates auto-regressively: each step predicts the next fixation heatmap from the visual stimulus and the fixation history. At only 0.61 GFLOPs, GazeLNN reaches a ScanMatch score of 0.47 on the MIT Low Resolution dataset, a leading result. Compared with recurrent baselines, it performs better across the reported metrics while cutting computational cost by 99.40% and running inference up to six times faster. The model has also been integrated into an active camera-robot control policy trained with Reinforcement Learning, which enables human-fixation-guided perception during autonomous navigation, and it has been validated in real-world deployments on an aerial robot.

Key takeaway

For robotics engineers developing autonomous navigation systems, GazeLNN offers an efficient approach to human attention modeling. Consider integrating this lightweight architecture to enable fixation-guided active perception: compared with existing recurrent baselines, it reduces computational cost by 99.40% and speeds up inference by up to six times. That allows more responsive, human-like visual processing in real-world robot deployments without prohibitive resource demands.

Key insights

GazeLNN offers a lightweight, high-performance model for human attention prediction, enabling efficient fixation-guided active perception in robots.

Method

GazeLNN employs Liquid Neural Networks and MobileNetV3 to auto-regressively predict sequential fixation heatmaps, conditioned on current visual input and fixation history.
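The auto-regressive loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the GazeLNN implementation: the grid size, hidden size, random weights, the `ToyLiquidCell` class, and the `predict_scanpath` helper are all hypothetical, the image features are a stand-in for MobileNetV3 output, and the recurrent update uses a liquid time-constant style fused Euler step, which may differ from the variant the authors use.

```python
import math
import random

GRID_H, GRID_W = 4, 6          # hypothetical coarse heatmap resolution
N_CELLS = GRID_H * GRID_W
HIDDEN = 8                     # hypothetical recurrent state size


def rand_matrix(rows, cols, rng, scale=0.5):
    """Random weights; a trained model would load learned parameters instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyLiquidCell:
    """Liquid time-constant style cell (assumed variant), one fused Euler step per call."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_in = rand_matrix(n_hidden, n_in, rng)
        self.w_rec = rand_matrix(n_hidden, n_hidden, rng)
        self.bias = [0.0] * n_hidden
        self.tau = [1.0] * n_hidden                      # base time constants
        self.a = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def step(self, state, inputs, dt=0.1):
        drive = [i + r + b for i, r, b in zip(
            matvec(self.w_in, inputs), matvec(self.w_rec, state), self.bias)]
        gate = [1.0 / (1.0 + math.exp(-d)) for d in drive]
        # The input-dependent gate changes the effective time constant of each unit.
        return [(x + dt * f * a) / (1.0 + dt * (1.0 / tau + f))
                for x, f, a, tau in zip(state, gate, self.a, self.tau)]


def predict_scanpath(image_features, n_fixations, seed=0):
    """Return (scanpath, heatmaps); each heatmap is a distribution over grid cells."""
    rng = random.Random(seed)
    cell = ToyLiquidCell(2 * N_CELLS, HIDDEN, rng)
    w_out = rand_matrix(N_CELLS, HIDDEN, rng)
    state = [0.0] * HIDDEN
    prev_map = [1.0 / N_CELLS] * N_CELLS                 # no fixation yet: uniform prior
    scanpath, heatmaps = [], []
    for _ in range(n_fixations):
        # Condition on the visual features and the previous fixation; the
        # recurrent state carries the longer fixation history.
        state = cell.step(state, image_features + prev_map)
        heatmap = softmax(matvec(w_out, state))
        idx = max(range(N_CELLS), key=heatmap.__getitem__)
        scanpath.append(divmod(idx, GRID_W))             # (row, col) of the fixation
        heatmaps.append(heatmap)
        prev_map = [0.0] * N_CELLS
        prev_map[idx] = 1.0                              # feed the prediction back in
    return scanpath, heatmaps


demo_features = [i / N_CELLS for i in range(N_CELLS)]    # stand-in for backbone features
scanpath, heatmaps = predict_scanpath(demo_features, n_fixations=5)
print(scanpath)
```

The point of the sketch is the data flow: one recurrent update per fixation, a heatmap readout, and the chosen fixation fed back as input to the next step. With untrained random weights the resulting scanpath is not meaningful.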

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.