A technical report on Composer 2
Summary
A technical report details the training and evaluation of Composer 2, a coding model designed for agentic software engineering. Training proceeds in two phases: continued pretraining of the Kimi K2.5 base model on a code-centric data mix, followed by large-scale reinforcement learning (RL) in realistic Cursor sessions. This approach significantly improves end-to-end agent performance, and stronger base knowledge from pretraining correlates directly with better RL outcomes. Composer 2 scores 61.3 on CursorBench, a 37% improvement over Composer 1.5, along with 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench. The model performs competitively with frontier models at substantially lower inference cost, giving it a Pareto-optimal balance of accuracy and cost for interactive developer workflows. Its development also required extensive infrastructure, including custom low-precision kernels for mixture-of-experts (MoE) training on Blackwell GPUs and an asynchronous RL pipeline.
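To make the Pareto-optimality claim concrete: a model sits on the accuracy/cost Pareto frontier if no other model is at least as cheap and at least as accurate while being strictly better on one of the two axes. The sketch below filters a list of (cost, score) pairs down to that non-dominated set. The model names, costs, and scores are hypothetical placeholders, not figures from the report.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float   # hypothetical cost per task; lower is better
    score: float  # hypothetical benchmark score; higher is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model dominates, sorted by cost.

    Model o dominates m if o is at least as cheap and at least as accurate,
    and strictly better on at least one of the two axes.
    """
    frontier = []
    for m in models:
        dominated = any(
            o.cost <= m.cost
            and o.score >= m.score
            and (o.cost < m.cost or o.score > m.score)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.cost)


# Hypothetical candidates: model_c is dominated by model_b (cheaper and more accurate).
candidates = [
    Model("model_a", cost=0.5, score=55.0),
    Model("model_b", cost=1.0, score=61.0),
    Model("model_c", cost=3.0, score=60.0),
    Model("model_d", cost=4.0, score=66.0),
]

for m in pareto_frontier(candidates):
    print(m.name, m.cost, m.score)
```

Here model_a, model_b, and model_d survive. A claim of Pareto-optimality means the model lands in this surviving set when plotted against its competitors.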
Key takeaway
If you are a research scientist developing agentic coding models, prioritize continued pretraining on domain-specific data and large-scale reinforcement learning in realistic environments. Evaluate on benchmarks like CursorBench that reflect complex, multi-file coding tasks, so that models stay aligned with actual developer workflows and achieve Pareto-optimal cost-accuracy tradeoffs.
Key insights
Continued pretraining and large-scale RL on realistic data significantly enhance coding model performance and efficiency.
Principles
- Reducing pretraining loss improves downstream RL performance.
- Realistic evaluation benchmarks align models with developer needs.
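One way to probe the first principle on your own checkpoints is to measure the rank correlation between each checkpoint's continued-pretraining loss and its score after an identical RL run; a strongly negative value means lower loss goes with better RL results. The sketch below computes Spearman's rank correlation in plain Python. The checkpoint numbers and function names are hypothetical, not data from the report, and the ranking helper assumes no tied values.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """Assign 1-based ascending ranks; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical checkpoints: pretraining loss vs. post-RL benchmark score.
pretrain_loss = [1.10, 1.05, 0.98, 0.95, 0.90]
rl_score = [40.0, 43.0, 47.0, 46.0, 52.0]

rho = spearman(pretrain_loss, rl_score)
print(f"Spearman rho = {rho:.2f}")
```

With these made-up numbers rho is -0.90: lower pretraining loss tracks higher RL score, with one out-of-order pair keeping it short of -1.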
Method
Composer 2 is trained in two phases: continued pretraining on code-rich data, followed by large-scale reinforcement learning in real Cursor environments. Progress is evaluated on CursorBench, a custom benchmark.
In practice
- Evaluate on benchmarks drawn from real-world coding tasks, such as CursorBench.
- Implement low-precision kernels for MoE training on Blackwell GPUs.
Topics
- Composer 2
- Agentic Software Engineering
- Reinforcement Learning
- CursorBench
- Blackwell GPUs
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Cursor Blog.