COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
Summary
The COMAP framework introduces a novel approach for equipping language agents with adaptive textual world models and evolving agent policies through closed-loop interaction. Addressing limitations of fixed world models and reliance on external rewards, COMAP enables agents to anticipate environment dynamics and evaluate actions more effectively. At each decision step, the world model predicts future state feedback for candidate actions, while the agent refines its actions by reflecting on the feedback's reliability. On-policy trajectories then update the world model via self-distillation, ensuring it adapts to the agent's evolving interaction distribution. This co-evolutionary loop significantly outperforms competitive baselines, demonstrating a +16.75% relative improvement with Qwen3-4B across embodied task planning, Web navigation, and tool-use benchmarks, enhancing prediction accuracy and long-horizon decision-making.
Key takeaway
For Machine Learning Engineers developing LLM agents for interactive environments, if you are struggling with fixed world models limiting agent adaptability, consider the COMAP framework. Its co-evolutionary approach, where world models and agent policies adapt through closed-loop interaction and self-distillation, significantly improves long-horizon decision-making. You should explore integrating this method to achieve performance gains, such as the +16.75% relative improvement seen with Qwen3-4B, enhancing your agent's predictive accuracy and overall effectiveness.
Key insights
COMAP co-evolves world models and agent policies through closed-loop interaction for adaptive LLM agents.
Principles
- World models must adapt to evolving agent interaction distributions.
- Agents can refine actions by reflecting on predicted feedback reliability.
- Self-distillation updates world models using on-policy trajectories.
Method
At each decision step, the world model predicts future state feedback; the agent refines actions via reflection; on-policy trajectories then update the world model via self-distillation.
In practice
- Implement co-evolution for embodied task planning.
- Apply to Web navigation and tool-use benchmarks.
- Integrate with LLMs like Qwen3-4B for performance.
Topics
- LLM Agents
- World Models
- Co-evolutionary Learning
- Self-Distillation
- Embodied Task Planning
- Web Navigation
Code references
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.