NavTrust: Benchmarking Trustworthiness for Embodied Navigation
Summary
NavTrust is a unified benchmark that evaluates the trustworthiness of embodied navigation agents by systematically applying realistic corruptions to their input modalities. It addresses a gap in existing evaluations, which mostly assess model performance under ideal conditions. NavTrust corrupts RGB, depth, and natural language instructions, and is presented as the first framework to expose agents to diverse RGB-Depth corruptions and instruction variations. An extensive evaluation of seven state-of-the-art approaches, including Uni-NaVid and ETPNav, shows significant performance degradation under these corruptions, pointing to major robustness deficiencies. The benchmark also explores four mitigation strategies for improving robustness, with improvements also observed in deployment on a real mobile robot.
Key takeaway
For AI scientists developing embodied navigation systems, understanding the impact of real-world input corruptions is crucial. Even state-of-the-art models such as Uni-NaVid and ETPNav show significant performance drops under these corruptions, so your models likely will too. Integrate robustness testing against diverse RGB, depth, and instruction corruptions into your development cycle to build more trustworthy and reliable agents for real-world deployment.
Key insights
NavTrust benchmarks embodied navigation trustworthiness by systematically corrupting inputs to reveal robustness gaps.
Principles
- Real-world corruptions degrade navigation performance.
- Unified benchmarks reveal critical robustness gaps.
Method
NavTrust systematically corrupts RGB, depth, and instruction inputs for Vision-Language Navigation (VLN) and Object-Goal Navigation (OGN) tasks, then evaluates agent performance and mitigation strategies.
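To make the idea of per-modality corruption concrete, here is a minimal sketch in plain Python. It is not NavTrust's implementation: the summary does not list the specific corruptions, so the three functions below (Gaussian pixel noise, random depth dropout, random word deletion) and all of their names and parameters are illustrative assumptions.

```python
import random
from typing import List

# Assumption: images and depth maps are nested lists of floats in [0, 1].
Image = List[List[float]]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to each pixel and clamp to [0, 1]."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]


def drop_depth_pixels(depth: Image, drop_prob: float, rng: random.Random) -> Image:
    """Zero out random depth readings to mimic missing sensor returns."""
    return [
        [0.0 if rng.random() < drop_prob else d for d in row]
        for row in depth
    ]


def drop_instruction_words(instruction: str, drop_prob: float, rng: random.Random) -> str:
    """Randomly delete words, keeping at least the first word if all are dropped."""
    words = instruction.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    return " ".join(kept if kept else words[:1])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so corrupted episodes are reproducible
    rgb = [[0.5] * 4 for _ in range(3)]
    depth = [[0.8] * 4 for _ in range(3)]
    print(add_gaussian_noise(rgb, sigma=0.1, rng=rng))
    print(drop_depth_pixels(depth, drop_prob=0.3, rng=rng))
    print(drop_instruction_words("walk past the sofa and stop at the door", 0.3, rng))
```

Passing an explicit seeded `random.Random` keeps each corrupted episode repeatable, which matters when comparing several agents on the same corrupted inputs.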
In practice
- Test navigation agents with RGB-Depth corruptions.
- Evaluate instruction variations for VLN tasks.
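One way to act on the two points above is a small harness that runs the same episodes clean and corrupted, then reports the drop. The sketch below is an assumption-laden illustration, not the benchmark's evaluation code: the `Agent` and `Corruption` callables, the episode dictionary, and the choice of success rate as the metric are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Assumptions: an episode is a dict of inputs; an agent returns True on success.
Episode = Dict[str, object]
Agent = Callable[[Episode], bool]
Corruption = Callable[[Episode], Episode]


def success_rate(
    agent: Agent,
    episodes: List[Episode],
    corruption: Optional[Corruption] = None,
) -> float:
    """Fraction of episodes the agent completes, optionally on corrupted inputs."""
    if not episodes:
        raise ValueError("episodes must not be empty")
    successes = sum(
        1 for ep in episodes if agent(corruption(ep) if corruption else ep)
    )
    return successes / len(episodes)


def robustness_report(
    agent: Agent,
    episodes: List[Episode],
    corruptions: Dict[str, Corruption],
) -> Dict[str, Dict[str, float]]:
    """Compare clean and corrupted success rates for each named corruption."""
    clean = success_rate(agent, episodes)
    report: Dict[str, Dict[str, float]] = {}
    for name, corrupt in corruptions.items():
        corrupted = success_rate(agent, episodes, corrupt)
        # Relative drop is defined as 0 when the clean success rate is 0.
        drop = (clean - corrupted) / clean if clean > 0 else 0.0
        report[name] = {
            "clean_sr": clean,
            "corrupted_sr": corrupted,
            "relative_drop": drop,
        }
    return report
```

Reporting the drop relative to the clean score, rather than only the corrupted score, makes agents with different baseline performance easier to compare.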
Topics
- Embodied Navigation
- Vision-Language Navigation
- Object-Goal Navigation
- Robustness Benchmarking
- Input Corruptions
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Researcher, AI Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.