Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation
Summary
GazeLNN is a new, computationally lightweight scanpath prediction model designed to address the high computational cost of existing human visual attention models used for robot autonomy. It uses Liquid Neural Networks as its recurrent engine and MobileNetV3 for feature extraction. It operates auto-regressively, predicting a sequence of fixation heatmaps from the visual stimulus and the fixation history. With only 0.61 GFLOPs, GazeLNN achieves a leading ScanMatch score of 0.47 on the MIT Low Resolution dataset. It significantly outperforms recurrent baselines across various metrics while reducing computational cost by 99.40% and accelerating inference by up to six times. The model has also been integrated into an active camera-robot control policy trained with reinforcement learning, enabling human-fixation-guided perception during autonomous navigation. This policy was validated in real-world deployments on an aerial robot.
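ScanMatch compares two scanpaths by binning fixations into a spatial grid and aligning the resulting region sequences with the Needleman-Wunsch algorithm. The sketch below is a simplified, hypothetical variant meant only to build intuition for what such a score measures. It ignores fixation durations (the original ScanMatch toolbox also samples fixations in time bins), uses a linear distance-based substitution score and a zero gap penalty, and normalizes by the longer sequence so that identical scanpaths score 1.0. The names `scanmatch_like`, `region` and `similarity` are illustrative, and the output is not comparable to the 0.47 reported above.

```python
import math


def region(fix, nx, ny):
    """Map a normalized fixation (x, y) in [0, 1] to a grid cell (col, row)."""
    x, y = fix
    return min(int(x * nx), nx - 1), min(int(y * ny), ny - 1)


def similarity(a, b, nx, ny):
    """1.0 for the same cell, falling linearly to 0.0 at opposite grid corners."""
    d_max = math.hypot(nx - 1, ny - 1)
    if d_max == 0:
        return 1.0  # a 1x1 grid: every fixation falls in the same cell
    return 1.0 - math.hypot(a[0] - b[0], a[1] - b[1]) / d_max


def scanmatch_like(path_a, path_b, nx=5, ny=5, gap=0.0):
    """Needleman-Wunsch alignment of two binned scanpaths, normalized to [0, 1]."""
    a = [region(f, nx, ny) for f in path_a]
    b = [region(f, nx, ny) for f in path_b]
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1], nx, ny),  # align
                score[i - 1][j] + gap,  # gap in path_b
                score[i][j - 1] + gap,  # gap in path_a
            )
    # The best achievable score is 1.0 per position of the longer path.
    return score[n][m] / max(n, m)


print(scanmatch_like([(0.1, 0.1), (0.5, 0.5)], [(0.1, 0.2), (0.6, 0.5)]))
```

A zero gap penalty keeps the score in [0, 1]; a negative `gap` would penalize unmatched fixations more strongly but can push the normalized score below zero.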
Key takeaway
For robotics engineers developing autonomous navigation systems, GazeLNN is a notable advance in efficient human attention modeling. Consider integrating this lightweight architecture to enable fixation-guided active perception: compared with existing recurrent baselines, it reduces computational cost by 99.40% and accelerates inference by up to six times. This allows more responsive, human-like visual processing in real-world robot deployments, improving navigation capabilities without prohibitive resource demands.
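A back-of-envelope check of what the reported figures imply, assuming the 99.40% reduction is measured against a single recurrent baseline in the same GFLOPs units as the 0.61 GFLOPs figure (the summary does not say which baseline). Both inputs are rounded, so treat the result as an order of magnitude (roughly 100 GFLOPs), not as a reported number:

```latex
\frac{C_{\text{GazeLNN}}}{C_{\text{baseline}}} = 1 - 0.9940 = 0.0060
\quad\Rightarrow\quad
C_{\text{baseline}} \approx \frac{0.61\ \text{GFLOPs}}{0.0060} \approx 102\ \text{GFLOPs},
\qquad
\frac{C_{\text{baseline}}}{C_{\text{GazeLNN}}} \approx \frac{1}{0.0060} \approx 167
```

Under that assumption the FLOP ratio (about 167×) is far larger than the reported speedup of up to six times. This is not a contradiction: wall-clock latency also depends on memory traffic, kernel launch overhead and the target hardware, not on FLOP count alone.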
Key insights
GazeLNN offers a lightweight, high-performance model for human attention prediction, enabling efficient fixation-guided active perception in robots.
Principles
- Human visual attention guides efficient scene processing.
- Lightweight recurrent engines can outperform heavier recurrent baselines at a fraction of the compute.
- Integrating attention models enhances robot autonomy.
Method
GazeLNN employs Liquid Neural Networks and MobileNetV3 to auto-regressively predict sequential fixation heatmaps, conditioned on current visual input and fixation history.
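To make the data flow of this auto-regressive loop concrete, here is a minimal pure-Python sketch. It assumes a liquid time-constant (LTC) style cell updated with a fused semi-implicit Euler step, random untrained weights, a fixed feature vector standing in for MobileNetV3 image features, and a one-hot map of the previous fixation as the fixation history. The grid size, layer sizes and all names (`LTCCell`, `predict_scanpath`, `GRID`, `FEAT`, `HIDDEN`) are hypothetical; this is not the authors' GazeLNN implementation.

```python
import math
import random

GRID = 4    # hypothetical heatmap resolution: GRID x GRID cells
FEAT = 8    # hypothetical size of the stand-in image feature vector
HIDDEN = 6  # hypothetical number of liquid neurons


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LTCCell:
    """LTC-style cell: dx/dt = -(1/tau + f) * x + f * A, with f a sigmoid gate."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_in = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_rec = [[rng.gauss(0.0, 0.3) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.bias = [0.0] * n_hidden
        self.tau = [1.0] * n_hidden  # per-neuron time constants
        self.a = [1.0] * n_hidden    # per-neuron target ("reversal") values A

    def step(self, x, u, dt=0.1):
        """One fused semi-implicit Euler step of the ODE above."""
        new_x = []
        for i in range(len(x)):
            z = self.bias[i]
            z += sum(w * ui for w, ui in zip(self.w_in[i], u))
            z += sum(w * xj for w, xj in zip(self.w_rec[i], x))
            f = sigmoid(z)
            num = x[i] + dt * f * self.a[i]
            den = 1.0 + dt * (1.0 / self.tau[i] + f)
            new_x.append(num / den)
        return new_x


def predict_scanpath(features, n_fixations, seed=0):
    """Auto-regressively predict one heatmap (and its peak fixation) per step."""
    rng = random.Random(seed)
    n_cells = GRID * GRID
    cell = LTCCell(FEAT + n_cells, HIDDEN, rng)
    w_out = [[rng.gauss(0.0, 0.5) for _ in range(HIDDEN)] for _ in range(n_cells)]

    x = [0.0] * HIDDEN
    history = [0.0] * n_cells  # empty fixation history at the first step
    heatmaps, fixations = [], []
    for _ in range(n_fixations):
        # Condition on the image features and on the previous fixation.
        x = cell.step(x, features + history)
        heat = softmax([sum(w * h for w, h in zip(row, x)) for row in w_out])
        k = max(range(n_cells), key=heat.__getitem__)
        heatmaps.append(heat)
        fixations.append(divmod(k, GRID))  # (row, col) of the peak cell
        # Feed the chosen fixation back as a one-hot map for the next step.
        history = [0.0] * n_cells
        history[k] = 1.0
    return heatmaps, fixations


feat_rng = random.Random(1)
features = [feat_rng.uniform(-1.0, 1.0) for _ in range(FEAT)]  # stand-in for CNN features
heatmaps, fixations = predict_scanpath(features, n_fixations=5)
print(fixations)
```

Because the weights are random, the predicted fixations carry no meaning; the point is the loop structure, in which each step's chosen fixation is fed back as input to the next step.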
In practice
- Integrate GazeLNN for fixation-guided navigation.
- Apply to aerial robots for active perception.
- Reduce computational load in attention prediction.
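As an illustration of fixation-guided active perception, the sketch below turns a predicted heatmap into a pan/tilt command for a gimbal camera. It takes the hottest grid cell, converts the cell centre to yaw and pitch offsets under a pinhole camera model, and rate-limits the move. This is a hand-written heuristic with hypothetical parameters (`hfov_deg`, `vfov_deg`, `max_step_deg`) and function names; the work summarized here instead learns the camera-robot control policy with reinforcement learning.

```python
import math


def peak_cell(heatmap):
    """Return (row, col) of the largest value in a 2-D heatmap (list of rows)."""
    cells = ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0])))
    return max(cells, key=lambda rc: heatmap[rc[0]][rc[1]])


def cell_to_angles(row, col, n_rows, n_cols, hfov_deg=90.0, vfov_deg=60.0):
    """Pinhole model, pan-then-tilt gimbal: yaw > 0 turns right, pitch > 0 tilts up."""
    x_ndc = 2.0 * (col + 0.5) / n_cols - 1.0  # -1 (left edge) .. 1 (right edge)
    y_ndc = 1.0 - 2.0 * (row + 0.5) / n_rows  # -1 (bottom edge) .. 1 (top edge)
    # Point on the image plane at unit focal length.
    x = x_ndc * math.tan(math.radians(hfov_deg) / 2.0)
    y = y_ndc * math.tan(math.radians(vfov_deg) / 2.0)
    yaw = math.degrees(math.atan2(x, 1.0))
    pitch = math.degrees(math.atan2(y, math.hypot(x, 1.0)))
    return yaw, pitch


def gimbal_command(heatmap, current, max_step_deg=5.0, **fov):
    """Move the current (yaw, pitch) toward the predicted fixation, rate-limited."""
    row, col = peak_cell(heatmap)
    d_yaw, d_pitch = cell_to_angles(row, col, len(heatmap), len(heatmap[0]), **fov)

    def clamp(d):
        return max(-max_step_deg, min(max_step_deg, d))

    return current[0] + clamp(d_yaw), current[1] + clamp(d_pitch)


heatmap = [[0.0, 0.0, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 0.0]]  # peak right of centre
print(gimbal_command(heatmap, (0.0, 0.0)))
```

The offsets are relative to the camera's current orientation, since the heatmap is predicted in the current image frame; rate-limiting keeps the camera from jumping between fixations faster than the gimbal or the downstream perception can follow.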
Topics
- Human Attention Prediction
- Liquid Neural Networks
- MobileNetV3
- Autonomous Navigation
- Active Perception
- Reinforcement Learning
- Aerial Robotics
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.