A technical report on Composer 2

2026-03-27 · Source: Cursor Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A technical report details the training and evaluation of Composer 2, a coding model designed for agentic software engineering. Training has two phases: continued pretraining of the Kimi K2.5 base model on code-centric data, followed by large-scale reinforcement learning (RL) within realistic Cursor sessions. The report finds that this approach improves end-to-end agent performance and that stronger base knowledge from pretraining correlates directly with better RL outcomes. Composer 2 scores 61.3 on CursorBench, a 37% improvement over Composer 1.5, along with 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench. It is competitive with frontier models at substantially lower inference cost, placing it on the Pareto frontier of accuracy versus cost for interactive developer workflows. The supporting infrastructure includes custom low-precision kernels for MoE training on Blackwell GPUs and an asynchronous RL pipeline.

Key takeaway

Research scientists developing agentic coding models should prioritize continued pretraining on domain-specific data and large-scale reinforcement learning in realistic environments. Evaluation should rely on benchmarks like CursorBench that reflect complex, multi-file coding tasks, so that models are aligned with actual developer workflows and reach Pareto-optimal cost-accuracy tradeoffs.

Key insights

Continued pretraining on code-centric data and large-scale RL in realistic environments improve both the performance and the cost efficiency of coding models.

Principles

Reducing pretraining loss improves downstream RL performance.
Realistic evaluation benchmarks align models with developer needs.

Method

Composer 2 is trained in two stages: continued pretraining on code-rich data, then large-scale reinforcement learning in realistic Cursor environments. It is evaluated on a custom benchmark, CursorBench.

In practice

Use CursorBench for real-world coding task evaluation.
Implement low-precision kernels for MoE training on Blackwell GPUs.
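
The summary describes Composer 2 as offering a Pareto-optimal balance of accuracy and cost. As a minimal sketch of what that check means, the Python below finds the non-dominated models in a set of (accuracy, cost) points. The ModelPoint type, the pareto_front helper, and all model names and numbers are hypothetical placeholders for illustration, not figures or code from the report.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    accuracy: float  # benchmark score, higher is better
    cost: float      # inference cost per task, lower is better


def pareto_front(points):
    """Return the points that no other point dominates, sorted by cost."""
    front = []
    for p in points:
        # q dominates p if q is at least as good on both axes
        # and strictly better on at least one of them.
        dominated = any(
            q.accuracy >= p.accuracy
            and q.cost <= p.cost
            and (q.accuracy > p.accuracy or q.cost < p.cost)
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.cost)


# Hypothetical placeholder values, NOT results from the report.
candidates = [
    ModelPoint("model-a", 60.0, 1.0),
    ModelPoint("model-b", 65.0, 4.0),
    ModelPoint("model-c", 55.0, 2.0),
    ModelPoint("model-d", 50.0, 0.5),
]

front_names = [p.name for p in pareto_front(candidates)]
print(front_names)
```

A model sits on the front when no other model is at least as accurate and at least as cheap while being strictly better on one of the two. In this toy data, model-c is excluded because model-a is both more accurate and cheaper.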

Topics

Composer 2
Agentic Software Engineering
Reinforcement Learning
CursorBench
Blackwell GPUs

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Cursor Blog.