COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

The COMAP framework equips language agents with an adaptive textual world model and an agent policy that evolve together through closed-loop interaction. It targets two limitations of earlier approaches: world models that stay fixed, and reliance on external rewards. At each decision step, the world model predicts the future state feedback for each candidate action, and the agent refines its action by reflecting on how reliable that feedback is. On-policy trajectories then update the world model via self-distillation, so the model keeps pace with the agent's evolving interaction distribution. This co-evolutionary loop outperforms competitive baselines, with a +16.75% relative improvement using Qwen3-4B across embodied task planning, web navigation, and tool-use benchmarks, along with better prediction accuracy and long-horizon decision-making.

Key takeaway

For machine learning engineers building LLM agents for interactive environments: if a fixed world model is limiting your agent's adaptability, consider the COMAP framework. Its world model and agent policy adapt together through closed-loop interaction and self-distillation, which improves long-horizon decision-making. The reported gain is a +16.75% relative improvement with Qwen3-4B, alongside better prediction accuracy.
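
Because the headline figure is a relative improvement, it is a ratio against the baseline score, not a gain in absolute points. The short sketch below shows the arithmetic. The function name and the example scores are hypothetical illustrations and are not taken from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the percentage change of `new` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic:
# a baseline of 40.0 has to reach 46.7 for a 16.75% relative gain.
gain = relative_improvement(40.0, 46.7)
print(f"{gain:.2f}% relative improvement")
```

Under these made-up numbers, the same relative gain corresponds to only 6.7 absolute points, which is why the two measures should not be compared directly.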

Key insights

COMAP co-evolves world models and agent policies through closed-loop interaction for adaptive LLM agents.

Principles

Method

At each decision step, the world model predicts future state feedback; the agent refines actions via reflection; on-policy trajectories then update the world model via self-distillation.
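
The loop described above can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the authors' implementation. Every name in it (`TableWorldModel`, `choose_action`, `co_evolve`, the counter environment) is hypothetical. A lookup table stands in for the textual world model, and overwriting that table with observed transitions stands in for self-distillation. The agent's policy is a fixed scoring rule, so only the world model changes between rounds.

```python
from typing import Dict, List, Optional, Tuple

Transition = Tuple[int, str, int]  # (state, action, observed next state)
ACTIONS = ["dec", "inc"]
GOAL = 3


def env_step(state: int, action: str) -> int:
    """Toy environment: a counter that moves up or down by one."""
    return state + 1 if action == "inc" else state - 1


class TableWorldModel:
    """Stand-in for a textual world model: remembers observed transitions."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, str], int] = {}

    def predict(self, state: int, action: str) -> Optional[int]:
        """Predicted next state, or None when the model has no evidence."""
        return self.table.get((state, action))

    def update(self, trajectory: List[Transition]) -> None:
        """Stand-in for self-distillation: fit to the agent's own rollouts."""
        for state, action, next_state in trajectory:
            self.table[(state, action)] = next_state


def choose_action(state: int, wm: TableWorldModel) -> str:
    """Score each candidate action by its predicted distance to the goal."""

    def score(action: str) -> int:
        predicted = wm.predict(state, action)
        # Reflection stand-in: an unsupported prediction is treated as
        # "nothing changes" instead of being trusted.
        if predicted is None:
            predicted = state
        return -abs(GOAL - predicted)

    return max(ACTIONS, key=score)  # ties go to the first action listed


def run_episode(wm: TableWorldModel, max_steps: int = 6) -> Tuple[List[Transition], bool]:
    """Roll out one on-policy episode from state 0."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = choose_action(state, wm)
        next_state = env_step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory, state == GOAL


def co_evolve(rounds: int) -> List[bool]:
    """Alternate on-policy rollouts with world-model updates."""
    wm = TableWorldModel()
    successes = []
    for _ in range(rounds):
        trajectory, reached_goal = run_episode(wm)
        wm.update(trajectory)  # the model follows the agent's distribution
        successes.append(reached_goal)
    return successes


if __name__ == "__main__":
    print(co_evolve(5))
```

In this toy setting the early episodes fail because the world model has no evidence about the states the agent visits. Each update covers exactly those states, so later episodes reach the goal. A real system would replace the table with an LLM-based predictor and would also update the policy itself.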

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.