MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

The Map-then-Act Paradigm (MAP) is a framework for interactive Large Language Model (LLM) agents. It targets two limitations the authors identify in goal-conditioned stepwise planning approaches such as ReAct and Chain-of-Thought: "Delayed Environmental Perception" and the "Epistemic Bottleneck". MAP decouples environmental understanding from task execution by moving the acquisition of environmental knowledge into a dedicated pre-execution phase. It operates in three stages: Global Exploration, which acquires environment-general priors ($K_g$); Task-Specific Mapping, which constructs a structured cognitive map ($M_t$); and Knowledge-Augmented Execution, which is grounded on both.

Experiments on ALFWorld, TextCraft, ScienceWorld, and ARC-AGI-3 show consistent performance gains and fewer interaction steps. On ARC-AGI-3, MAP enabled frontier models to surpass near-zero baseline performance in 22 of 25 game environments. The authors also introduce MAP-2K, a dataset of map-then-act trajectories, and report that training on it outperforms training on expert execution traces. They take this as evidence that environment understanding matters more than imitation.

Key takeaway

NLP engineers building LLM agents for complex, long-horizon tasks should consider a "map-then-act" design. When environmental understanding is explicitly separated from task execution, the agent builds global priors and a task-specific cognitive map before it acts, and the reported result is higher success rates with fewer interaction steps and less trial-and-error. The approach addresses limitations of reactive planning, particularly in novel or zero-knowledge environments such as ARC-AGI-3.

Key insights

Decoupling environmental understanding from task execution improves both the success rate and the step efficiency of LLM agents on the reported benchmarks.

Method

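MAP runs three stages in order: Global Exploration, which collects environment-general priors ($K_g$); Task-Specific Mapping, which builds a cognitive map ($M_t$) for the task at hand; and Knowledge-Augmented Execution, which acts on that knowledge. The process is guided by a Role-Purpose-Priority (RPP) prompt protocol, and a dual-convergence stopping criterion determines when exploration ends.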
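The three stages can be illustrated with a toy sketch. This is not the authors' implementation: the room graph, the function names, and the dictionary layout of $K_g$ and $M_t$ are all hypothetical. The stopping rule is only one assumed reading of "dual convergence" (exploration stops once neither the set of known rooms nor the set of known facts has grown for `patience` consecutive steps). The RPP prompt protocol is left out because this summary does not specify it, and the LLM calls are replaced by plain Python.

```python
from collections import deque


def global_exploration(exits, contents, start, patience=2, max_steps=50):
    """Stage 1 (sketch): walk the environment and collect general priors K_g.

    ASSUMED dual-convergence rule: stop once neither the known rooms nor
    the known facts have grown for `patience` consecutive steps.
    """
    rooms, facts, edges, visits = set(), set(), {}, {}
    room, stale, steps = start, 0, 0
    while stale < patience and steps < max_steps:
        steps += 1
        before = (len(rooms), len(facts))
        rooms.add(room)
        facts.update((obj, room) for obj in contents.get(room, []))
        edges[room] = list(exits[room])
        visits[room] = visits.get(room, 0) + 1
        stale = stale + 1 if (len(rooms), len(facts)) == before else 0
        # Prefer the least-visited neighbour so new rooms are covered first.
        room = min(exits[room], key=lambda r: visits.get(r, 0))
    return {"edges": edges, "facts": facts, "steps": steps}


def task_specific_mapping(k_g, task_objects):
    """Stage 2 (sketch): keep only what the task needs, as a map M_t."""
    locations = {obj: room for obj, room in k_g["facts"] if obj in task_objects}
    return {"locations": locations, "edges": k_g["edges"]}


def knowledge_augmented_execution(m_t, source_obj, target_obj):
    """Stage 3 (sketch): plan on the map instead of by trial and error."""
    start = m_t["locations"][source_obj]
    goal = m_t["locations"][target_obj]
    queue, parent = deque([start]), {start: None}
    while queue:  # breadth-first search over the mapped rooms
        room = queue.popleft()
        if room == goal:
            break
        for nxt in m_t["edges"].get(room, []):
            if nxt not in parent:
                parent[nxt] = room
                queue.append(nxt)
    if goal not in parent:
        return []  # the map contains no route to the goal
    path, room = [], goal
    while room is not None:
        path.append(room)
        room = parent[room]
    path.reverse()
    actions = [f"take {source_obj}"]
    actions += [f"go to {r}" for r in path[1:]]
    actions.append(f"put {source_obj} on {target_obj}")
    return actions


# Hypothetical three-room environment, used only for illustration.
EXITS = {"kitchen": ["hall"], "hall": ["kitchen", "study"], "study": ["hall"]}
CONTENTS = {"kitchen": ["mug"], "study": ["desk", "lamp"]}

k_g = global_exploration(EXITS, CONTENTS, start="kitchen")
m_t = task_specific_mapping(k_g, {"mug", "desk"})
plan = knowledge_augmented_execution(m_t, "mug", "desk")
print(plan)
```

In this sketch the exploration cost is paid once, before any task is known, and execution then needs no exploratory steps. That is the trade-off the map-then-act split is meant to capture.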

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.