Build Your Own Cursor This Weekend. Yes, the One SpaceX Just Paid $60 Billion For.

2026-06-21 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, long

Summary

Cursor's in-house coding model, Composer 2, acquired by SpaceX for \$60 billion in June 2026, was built starting from Moonshot AI's open-weight Kimi K2.5 checkpoint. This demonstrates that a "frontier-level" coding assistant can be constructed primarily through integrating open-source components: a Visual Studio Code fork for the editor, a local inference server, and open-weight models. The core architecture involves a two-model setup: a small, fast Fill-in-the-Middle (FIM) capable model like Mistral's Codestral (22 billion parameters, ~95% FIM accuracy) for real-time autocomplete, and a larger model such as Qwen3-Coder-30B (30 billion parameters, fitting in ~19GB on a 24GB GPU) for chat and agentic tasks. This approach enables developers to build a functional, local AI coding assistant, processing tokens on their own hardware without external servers.

Key takeaway

For AI Engineers or ML Engineers building internal coding tools, you can now deploy a powerful, local AI assistant without proprietary models. Utilize open-source components like VS Code and Continue.dev, pairing a FIM-trained model (e.g., Codestral) for autocomplete with a larger model (e.g., Qwen3-Coder-30B) for chat. This approach keeps your code on-premises and eliminates per-token costs, offering a robust alternative to cloud-based solutions.

Key insights

A frontier coding assistant can be built by integrating open-source components and open-weight models, utilizing a two-model architecture.

Principles

Start with open-weight model checkpoints.
Separate models for autocomplete and chat.
Aggressively gather relevant code context.

Method

Build a coding assistant by integrating VS Code with a Continue.dev extension, serving a FIM-trained autocomplete model (e.g., Codestral via Ollama) and a larger chat/agent model (e.g., Qwen3-Coder-30B via vLLM) locally.

In practice

Deploy Codestral for FIM autocomplete.
Use Qwen3-Coder-30B on 24GB GPUs.
Serve models with Ollama or vLLM.

Topics

AI Coding Assistants
Open-weight LLMs
Local LLM Deployment
Fill-in-the-Middle
VS Code Extensions
Model Quantization

Code references

vllm-project/vllm

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.