NavTrust: Benchmarking Trustworthiness for Embodied Navigation

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

NavTrust is a new unified benchmark for evaluating the trustworthiness of embodied navigation agents by systematically applying realistic corruptions to their input modalities. It addresses a critical gap in existing evaluations, which mainly assess performance under ideal conditions. NavTrust corrupts RGB images, depth maps, and natural-language instructions, making it the first framework to expose agents to diverse RGB-depth corruptions alongside instruction variations. An extensive evaluation of seven state-of-the-art approaches, including Uni-NaVid and ETPNav, shows significant performance degradation under these corruptions, revealing major robustness deficiencies. The benchmark also examines four distinct mitigation strategies for improving robustness against these corruptions, with improvements observed when they were deployed on a real mobile robot.

Key takeaway

For AI scientists developing embodied navigation systems, understanding how real-world input corruptions affect performance is crucial. Even state-of-the-art models such as Uni-NaVid or ETPNav are likely to suffer significant performance drops under such corruptions. Build robustness testing against diverse RGB, depth, and instruction corruptions into your development cycle to produce more trustworthy and reliable agents for real-world deployment.

Key insights

NavTrust benchmarks embodied navigation trustworthiness by systematically corrupting inputs to reveal robustness gaps.

Method

NavTrust systematically corrupts RGB, depth, and instruction inputs for Vision-Language Navigation (VLN) and Object-Goal Navigation (OGN) tasks, then evaluates agent performance and mitigation strategies.
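The corrupt-then-evaluate protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not NavTrust's implementation: the agent interface (one observation in, a success flag out), the helper names `evaluate` and `robustness_report`, and the three corruptions (Gaussian RGB noise, depth-pixel dropout, instruction word dropping) are all hypothetical stand-ins rather than the benchmark's actual corruption suite, and each episode is collapsed to a single observation for brevity.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical observation format (an assumption, not NavTrust's): "rgb" is an
# H x W grid of [r, g, b] ints in 0..255, "depth" is an H x W grid of metres,
# and "instruction" is a natural-language string.
Obs = Dict[str, object]
Corruption = Callable[[Obs, random.Random], Obs]
Agent = Callable[[Obs], bool]  # hypothetical: returns True if the episode succeeds


def gaussian_rgb_noise(obs: Obs, rng: random.Random, sigma: float = 20.0) -> Obs:
    """Add zero-mean Gaussian noise to each RGB channel, clipped to [0, 255]."""
    rgb = [[[min(255, max(0, round(c + rng.gauss(0.0, sigma)))) for c in px]
            for px in row] for row in obs["rgb"]]
    return {**obs, "rgb": rgb}  # new dict; the original observation is untouched


def depth_dropout(obs: Obs, rng: random.Random, p: float = 0.2) -> Obs:
    """Set a random fraction p of depth pixels to 0.0 to mimic missing sensor returns."""
    depth = [[0.0 if rng.random() < p else d for d in row] for row in obs["depth"]]
    return {**obs, "depth": depth}


def instruction_word_drop(obs: Obs, rng: random.Random, p: float = 0.15) -> Obs:
    """Drop each instruction word with probability p, keeping at least the first word."""
    words = str(obs["instruction"]).split()
    kept = [w for w in words if rng.random() >= p] or words[:1]
    return {**obs, "instruction": " ".join(kept)}


def evaluate(agent: Agent, episodes: List[Obs],
             corruption: Optional[Corruption] = None, seed: int = 0) -> float:
    """Success rate of `agent` over `episodes`, optionally corrupting each input first."""
    rng = random.Random(seed)  # fixed seed so each corruption run is reproducible
    successes = 0
    for obs in episodes:
        if corruption is not None:
            obs = corruption(obs, rng)
        successes += int(bool(agent(obs)))
    return successes / len(episodes)


def robustness_report(agent: Agent, episodes: List[Obs],
                      corruptions: Dict[str, Corruption], seed: int = 0
                      ) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Clean success rate plus, per corruption, its success rate and relative drop."""
    clean = evaluate(agent, episodes, None, seed)
    report: Dict[str, Dict[str, float]] = {}
    for name, corrupt in corruptions.items():
        sr = evaluate(agent, episodes, corrupt, seed)
        drop = (clean - sr) / clean if clean > 0 else 0.0
        report[name] = {"success_rate": sr, "relative_drop": drop}
    return clean, report
```

A typical call might be `robustness_report(agent, episodes, {"rgb_noise": gaussian_rgb_noise, "depth_dropout": depth_dropout, "word_drop": instruction_word_drop})`. Reporting the drop relative to the clean success rate, rather than only the absolute corrupted score, keeps degradation comparable across agents whose baseline performance differs.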

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Researcher, AI Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.