How We Built Zeta2: Training an Edit Prediction Model in Production — Ben Kunkle, Zed

· Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

Zed built Zeta2, an edit prediction model that suggests the next code edit around the user's cursor and must be fast enough to run at keystroke frequency. Its training pipeline uses opt-in production data: code snapshots, cursor positions, type definitions, and diagnostics. Training relies on distillation: a frontier model generates initial predictions, and a "repair step" uses another frontier model to fix the predictions identified as bad. The refined predictions become the student model's expected output. The system also uses "settled data," user-completed edits captured after a 10-second pause, though this data is noisy. To filter the noise, Zed has student models generate multiple predictions and compares them to the settled state by Levenshtein distance to identify ideal training examples. Offline evaluations use a held-out test set and track metrics such as "delta chrF" and "reversal ratio," while A/B tests in production measure acceptance rates and latency.
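The distillation and repair flow described above can be sketched roughly as follows. This is an illustration only, not Zed's implementation: `Snapshot`, `TrainingExample`, `build_example`, and the four callables (`teacher`, `is_bad`, `repair`, `format_prompt`) are hypothetical names, and dropping an example when repair fails is an assumption the summary does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    """Hypothetical container for one captured production example."""
    code: str
    cursor_offset: int


@dataclass
class TrainingExample:
    prompt: str
    expected_output: str


def build_example(
    snapshot: Snapshot,
    teacher: Callable[[str], str],          # frontier model producing a prediction
    is_bad: Callable[[str, str], bool],     # check that flags a bad prediction
    repair: Callable[[str, str], str],      # second frontier model fixing it
    format_prompt: Callable[[Snapshot], str],
) -> Optional[TrainingExample]:
    """Distill a teacher prediction, repairing it once if it is flagged as bad."""
    prompt = format_prompt(snapshot)
    prediction = teacher(prompt)
    if is_bad(prompt, prediction):
        prediction = repair(prompt, prediction)
        if is_bad(prompt, prediction):
            # Assumption: discard examples the repair step could not fix.
            return None
    # The (possibly repaired) prediction becomes the student's expected output.
    return TrainingExample(prompt=prompt, expected_output=prediction)
```

Passing the models in as plain callables keeps the sketch independent of any particular model API, so stub functions can stand in for the frontier models when testing the flow.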

Key takeaway

For AI engineers building or improving code prediction models, invest first in a data pipeline that cleans noisy production data. Distill from larger models and add a repair step to raise the quality of training examples. To control cost, filter "settled data" with your own student models rather than with expensive frontier model calls. A/B test each new model version on controlled production traffic to validate real-world performance and user acceptance before full deployment.
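One common way to route a controlled share of traffic to a new model version is deterministic hash bucketing, sketched below. The summary does not say how Zed assigns users to variants, so this is an assumed technique, and `assign_variant`, `acceptance_rate`, and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_fraction: float) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hashing the experiment name together with the user id keeps each user
    in the same variant across sessions without storing any state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # integer in 0..9999
    threshold = round(treatment_fraction * 10_000)
    return "treatment" if bucket < threshold else "control"


def acceptance_rate(shown: int, accepted: int) -> float:
    """Fraction of shown predictions that the user accepted."""
    return accepted / shown if shown else 0.0
```

For example, `assign_variant(user_id, "zeta2-candidate", 0.1)` would send roughly 10% of users to the new model, after which acceptance rate and latency can be compared per variant.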

Key insights

Training examples for a specialized edit prediction model can come from two complementary sources: predictions distilled from frontier models, and user-settled production edits filtered for noise with the student model itself.

Method

Capture production snapshots, distill frontier model predictions, repair bad outputs, and format prompts. Filter noisy "settled data" by comparing student model predictions to user-completed edits via Levenshtein distance.
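The settled-data filter can be sketched as below: several student predictions are scored against the settled state by Levenshtein distance, and the example is kept only if the closest one is near enough. The normalization by length, the 0.2 threshold, and the function names are assumptions made for illustration; the summary states only that Levenshtein distance is the comparison.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_prediction(
    predictions: List[str],
    settled: str,
    max_normalized_distance: float = 0.2,  # assumed threshold, tune on real data
) -> Optional[str]:
    """Return the student prediction nearest the settled state, or None if all are too far."""
    best: Optional[str] = None
    best_score: Optional[float] = None
    for prediction in predictions:
        denominator = max(len(prediction), len(settled), 1)
        score = levenshtein(prediction, settled) / denominator
        if best_score is None or score < best_score:
            best, best_score = prediction, score
    if best_score is None or best_score > max_normalized_distance:
        return None  # settled state is likely noise relative to what the model predicts
    return best
```

Because the predictions come from the student model rather than a frontier model, a filter of this shape can run over large volumes of settled data at low cost.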

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.