Training Composer for longer horizons
Summary
Composer, a specialized model for agentic coding, has been trained for long-horizon tasks with a reinforcement learning process called self-summarization, which builds compaction directly into the training loop. Through this training, Composer learns to identify and preserve critical information, so it can work on challenging coding tasks that require hundreds of actions and exceed typical model context windows. Traditional compaction techniques risk losing information; self-summarization reduces compaction error by 50% on CursorBench, even against highly tuned prompt-based baselines, while using one-fifth of the tokens and reusing the KV cache. Composer generates its own condensed context (around 1,000 tokens) from a minimal prompt, and it solved complex problems such as "make-doom-for-mips" while summarizing over 100,000 tokens. The work is a step toward training more capable agentic systems for even longer and more complex processes, including multi-agent coordination.
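The control flow described above can be illustrated with a minimal sketch. It is not Cursor's implementation: every name here (`Rollout`, `run_with_self_summary`, `toy_summarizer`, `trigger_tokens`) is hypothetical, the tokenizer is a crude word count, and the summarizer is a placeholder for the model writing its own summary. The sketch covers only the loop in which the agent's history is replaced by a self-written summary once it grows past a budget; the reinforcement learning update that would reward good summaries is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Rollout:
    context: List[str] = field(default_factory=list)  # messages currently in context
    compactions: int = 0                               # how many times we summarized
    peak_tokens: int = 0                               # largest context size observed

    def tokens(self) -> int:
        return sum(count_tokens(message) for message in self.context)


def run_with_self_summary(
    steps: List[str],
    summarize: Callable[[List[str]], str],
    trigger_tokens: int,
) -> Rollout:
    """Append each step to the context; compact when the budget is exceeded."""
    rollout = Rollout()
    for step in steps:
        rollout.context.append(step)
        rollout.peak_tokens = max(rollout.peak_tokens, rollout.tokens())
        if rollout.tokens() > trigger_tokens:
            # Assumption: the summary fully replaces the prior history.
            rollout.context = [summarize(rollout.context)]
            rollout.compactions += 1
    return rollout


def toy_summarizer(context: List[str], keep: int = 5) -> str:
    # Placeholder for the model's own summary: keep only the last `keep` words.
    words = " ".join(context).split()
    return "SUMMARY: " + " ".join(words[-keep:])


# Ten steps of 10 tokens each, with a 25-token trigger.
steps = ["tool output " + "x " * 8] * 10
result = run_with_self_summary(steps, toy_summarizer, trigger_tokens=25)
print(result.compactions, result.tokens(), result.peak_tokens)
```

In a trained system the summarizer would be the agent model itself, and the quality of what it keeps would be shaped by the task reward rather than by a fixed rule such as "keep the last five words."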
Key takeaway
Composer, an agentic coding model, improves on long-horizon tasks by learning self-summarization through reinforcement learning. This compaction-in-the-loop training reduces compaction error by 50% and uses one-fifth the tokens of prompt-based baselines, enabling solutions to complex problems like "make-doom-for-mips." By efficiently preserving critical context, it makes agents that need hundreds of actions and extensive reasoning practical to deploy.
Topics
- Reinforcement Learning
- Self-Summarization
- AI Agents
- Context Management
- Code Generation
Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Cursor Blog.