Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning
Summary
Ego2World is an executable benchmark that transforms egocentric cooking videos from the HD-EPIC dataset into interactive symbolic worlds for evaluating embodied agents. Unlike passive video datasets or synthetic simulators, Ego2World targets planning under partial observation by separating a hidden world graph (${G_{\mathrm{w}}}_{t}$), maintained by the simulator, from the agent's belief graph (${G_{\mathrm{b}}}_{t}$). The benchmark compiles 101 videos, 9,130 action groups, and 426 goal-task instances into graph-transition rules, so agents can act, receive feedback, and replan without access to the full world state. Experiments reveal that action-overlap scores often overestimate physical-state success, and that persistent belief memory significantly improves task completion while reducing visual exploration. The annotation-grounded compilation process is itself robust; direct LLM graph synthesis, by contrast, shows a 48% hallucination rate, which underscores the need for annotation-grounded construction.
Key takeaway
For research scientists developing embodied AI agents, Ego2World highlights the critical need to move beyond action prediction towards robust belief maintenance and state-change reasoning. You should prioritize designing agents that can effectively update and utilize an internal belief graph under partial observation, as this significantly improves task completion and reduces costly visual exploration, even if it means sacrificing some local action plausibility.
Key insights
Ego2World enables embodied agents to plan and adapt in partially observed, dynamic environments by compiling real egocentric videos into executable symbolic worlds.
Principles
- Separate hidden world state from agent belief state.
- Action plausibility does not equate to state success.
- Memory selection is crucial for long-horizon planning.
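The second principle can be illustrated with a toy comparison. The sketch below is not the benchmark's scoring code: the order-insensitive `action_overlap` metric, the string facts, and the `effects` table are all assumptions made for illustration. It shows a predicted plan that contains every reference action, yet fails the goal because the actions run in an order whose preconditions are not met.

```python
def action_overlap(predicted, reference):
    """Fraction of reference actions that also appear in the prediction.

    Order is ignored on purpose (an assumed, simplified overlap metric).
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(predicted)) / len(ref)


def final_state(initial, actions, effects):
    """Execute actions in order; an action with unmet preconditions is a no-op."""
    state = set(initial)
    for action in actions:
        pre, add, delete = effects[action]
        if pre <= state:
            state = (state - delete) | add
    return state


# Hypothetical effects table: action -> (preconditions, added facts, deleted facts).
effects = {
    "open fridge": (set(), {"fridge_open"}, set()),
    "take milk": ({"fridge_open"}, {"milk_in_hand"}, set()),
    "close fridge": ({"fridge_open"}, set(), {"fridge_open"}),
    "pour milk": ({"milk_in_hand"}, {"milk_in_cup"}, set()),
}
goal = {"milk_in_cup"}

reference = ["open fridge", "take milk", "close fridge", "pour milk"]
predicted = ["take milk", "open fridge", "close fridge", "pour milk"]

overlap = action_overlap(predicted, reference)
reference_ok = goal <= final_state(set(), reference, effects)
predicted_ok = goal <= final_state(set(), predicted, effects)

print(overlap)       # 1.0
print(reference_ok)  # True
print(predicted_ok)  # False
```

Here the overlap score is perfect while the physical state never reaches the goal, which is the kind of gap a state-based judge exposes.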
Method
Ego2World compiles HD-EPIC video annotations into graph-transition rules, creating an executable symbolic simulator (VCSS). Agents plan over a partial belief graph, receiving local observations and execution feedback, while task success is judged against the hidden world graph.
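The loop described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's API: the `Rule`, `Simulator`, and `Agent` classes, the triple-based fact encoding, and the action names are hypothetical. The simulator owns the hidden world graph, the agent only ever sees local observations and success/failure feedback, and the goal is checked against the hidden state.

```python
from dataclasses import dataclass, field

# A fact is a (subject, relation, object) triple, e.g. ("knife", "in", "drawer").


@dataclass(frozen=True)
class Rule:
    """Hypothetical graph-transition rule."""
    pre: frozenset     # facts that must hold in the world
    add: frozenset     # facts added on success
    delete: frozenset  # facts removed on success


@dataclass
class Simulator:
    world: set   # hidden world graph, never exposed in full
    rules: dict  # action name -> Rule

    def observe(self, location):
        """Local observation: only facts whose object is this location."""
        return {f for f in self.world if f[2] == location}

    def step(self, action):
        """Apply the action to the hidden world; return execution feedback."""
        rule = self.rules.get(action)
        if rule is None or not rule.pre <= self.world:
            return False
        self.world = (self.world - rule.delete) | rule.add
        return True

    def goal_satisfied(self, goal):
        """Success is judged against the hidden world, not the belief."""
        return goal <= self.world


@dataclass
class Agent:
    belief: set = field(default_factory=set)  # partial belief graph

    def update_from_observation(self, location, observed):
        # Drop stale beliefs about this location and about any subject seen there.
        seen = {subject for subject, _, _ in observed}
        kept = {f for f in self.belief if f[2] != location and f[0] not in seen}
        self.belief = kept | observed

    def update_from_feedback(self, rule, succeeded):
        if succeeded:
            self.belief = (self.belief - rule.delete) | rule.add
        else:
            self.belief -= rule.pre  # the assumed preconditions were wrong


take_knife = Rule(
    pre=frozenset({("knife", "in", "drawer")}),
    add=frozenset({("knife", "in", "hand")}),
    delete=frozenset({("knife", "in", "drawer")}),
)
sim = Simulator(
    world={("knife", "in", "drawer"), ("onion", "in", "fridge")},
    rules={"take knife from drawer": take_knife},
)
agent = Agent(belief={("knife", "in", "counter")})  # wrong initial belief
goal = {("knife", "in", "hand")}

first_try = sim.step("take knife from counter")  # fails: belief was wrong
agent.update_from_observation("drawer", sim.observe("drawer"))
second_try = sim.step("take knife from drawer")  # succeeds after replanning
agent.update_from_feedback(take_knife, second_try)
done = sim.goal_satisfied(goal)
```

Keeping `world` inside the simulator and `belief` inside the agent makes the partial-observation boundary explicit: the agent can only correct a wrong belief by looking or by failing.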
In practice
- Implement belief maintenance architectures for embodied agents.
- Co-optimize for both local executability and global task completion.
- Design memory systems with uncertainty-aware retrieval and forgetting.
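One way to read the last recommendation is a belief store whose entries carry a confidence that decays over time. The sketch below is an assumption-heavy illustration rather than anything the benchmark prescribes: the `BeliefMemory` class, the multiplicative decay, and both thresholds are hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefMemory:
    """Hypothetical belief store with confidence decay and forgetting."""
    decay: float = 0.9         # per-step multiplicative confidence decay
    forget_below: float = 0.2  # entries under this confidence are dropped
    entries: dict = field(default_factory=dict)  # fact -> confidence

    def observe(self, fact):
        # A fresh observation resets confidence to its maximum.
        self.entries[fact] = 1.0

    def tick(self):
        # Age every belief by one step and forget the ones that decayed too far.
        aged = {f: c * self.decay for f, c in self.entries.items()}
        self.entries = {f: c for f, c in aged.items() if c >= self.forget_below}

    def retrieve(self, min_conf=0.5):
        # Uncertainty-aware retrieval: confident facts only, most confident first.
        facts = [f for f, c in self.entries.items() if c >= min_conf]
        return sorted(facts, key=lambda f: -self.entries[f])


knife = ("knife", "in", "drawer")
onion = ("onion", "in", "fridge")

mem = BeliefMemory(decay=0.5, forget_below=0.2)
mem.observe(knife)
mem.tick()                  # knife: 0.5
mem.observe(onion)          # onion: 1.0
after_one = mem.retrieve()  # both pass the 0.5 threshold, onion first
mem.tick()                  # knife: 0.25, onion: 0.5
after_two = mem.retrieve()  # only onion is still confident enough
mem.tick()                  # knife: 0.125 is forgotten, onion: 0.25
remaining = set(mem.entries)
```

An agent built this way would re-observe a location once the relevant fact drops out of `retrieve`, trading a little visual exploration for protection against stale beliefs.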
Topics
- Ego2World Benchmark
- Belief-State Planning
- Egocentric Video Analysis
- Graph-Transition Rules
- Partial Observation
Best for: Research Scientist, AI Scientist, AI Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.