How We Built Zeta2: Training an Edit Prediction Model in Production — Ben Kunkle, Zed
Summary
Zed developed Zeta2, an edit prediction model that suggests the next code edit around a user's cursor. Because it runs at keystroke frequency, it has to be fast. Its training pipeline uses opt-in production data: code snapshots, cursor positions, type definitions, and diagnostics. The core process is distillation: a frontier model generates initial predictions, and a "repair step" uses another frontier model to fix the predictions identified as bad. These refined predictions become the student model's expected output. The system also uses "settled data," the state of the user's code captured once edits stop for 10 seconds, though this data is noisy. To filter the noise, Zed has student models generate multiple predictions and compares them to the settled state by Levenshtein distance to identify ideal training examples. Offline evaluations use a held-out test set and track metrics such as "delta chrF" and "reversal ratio," and A/B tests in production assess acceptance rates and latency.
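The "settled data" idea can be sketched as a quiet-period timer: every edit resets a clock, and the text counts as settled once 10 seconds pass without another edit. The sketch below is illustrative only. The `SettleTracker` class, its field names, and the use of caller-supplied timestamps are assumptions, not Zed's implementation; only the 10-second pause comes from the summary.

```python
from dataclasses import dataclass
from typing import Optional

SETTLE_SECONDS = 10.0  # pause length stated in the summary


@dataclass
class SettleTracker:
    """Hypothetical helper: follows one edited region until the user pauses."""

    snapshot_text: str                  # text at the time the prediction was requested
    last_edit_at: float                 # timestamp (seconds) of the most recent edit
    current_text: Optional[str] = None  # latest text seen after the snapshot

    def on_edit(self, text: str, now: float) -> None:
        # Every edit resets the quiet-period clock.
        self.current_text = text
        self.last_edit_at = now

    def settled_text(self, now: float) -> Optional[str]:
        # Report the settled state only after a full quiet period with no edits.
        if self.current_text is None:
            return None
        if now - self.last_edit_at < SETTLE_SECONDS:
            return None
        return self.current_text


tracker = SettleTracker(snapshot_text="let x", last_edit_at=0.0)
tracker.on_edit("let x = 1", now=1.0)
tracker.on_edit("let x = 10;", now=6.0)
print(tracker.settled_text(now=12.0))  # still inside the quiet period
print(tracker.settled_text(now=16.0))  # 10 seconds after the last edit
```

Passing timestamps in explicitly keeps the sketch deterministic and easy to test; a real editor integration would more likely drive this from its own event loop.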
Key takeaway
For AI Engineers building or enhancing code prediction models, prioritize a data pipeline that refines noisy production data. Implement distillation from larger models and an iterative repair step to improve training example quality. To manage costs, filter "settled data" with your own student models rather than with expensive frontier model calls. A/B test new model versions on controlled production traffic to validate real-world performance and user acceptance before full deployment.
Key insights
Training a specialized code edit prediction model in production combines distillation from frontier models with filtered user-settled data to find the best training examples.
Principles
- Distillation from frontier models is effective for specialized tasks.
- Production data, even when noisy, can inform model training.
- Iterative repair steps improve training data quality.
Method
Capture production snapshots, distill frontier model predictions, repair bad outputs, and format prompts. Filter noisy "settled data" by comparing student model predictions to user-completed edits via Levenshtein distance.
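The filtering step can be illustrated with a small sketch: compute the Levenshtein distance between each student prediction and the settled text, then keep the example only if the closest prediction is near enough. The function names, the `max_ratio` threshold, and the rule of normalizing by the settled text's length are assumptions for illustration; the source states only that predictions are compared to the settled state by Levenshtein distance.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a rolling row."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_prediction(
    predictions: List[str],
    settled: str,
    max_ratio: float = 0.1,  # hypothetical threshold, not from the source
) -> Optional[Tuple[str, int]]:
    """Return (prediction, distance) for the closest prediction, or None
    when even the closest one is too far from the settled text."""
    if not predictions:
        return None
    scored = [(levenshtein(p, settled), p) for p in predictions]
    dist, best = min(scored, key=lambda s: s[0])
    limit = max_ratio * max(len(settled), 1)
    return (best, dist) if dist <= limit else None


samples = ["let x = 1;", "let y = 22;"]
print(closest_prediction(samples, "let x = 10;", max_ratio=0.2))
```

A relative threshold is one possible design choice: it tolerates a few stray characters in long edits while staying strict on short ones. An absolute distance cutoff would be a reasonable alternative.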
In practice
- Use JSONL for flexible data pipeline stages.
- Filter noisy user data with cheaper student model inferences.
- A/B test new models with partial production traffic.
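Two of these practices can be sketched briefly: a pipeline stage that reads and writes JSONL, and a deterministic split that sends a fixed share of users to a new model. This is a minimal sketch under stated assumptions. The record fields, the `run_stage` and `assign_variant` names, the experiment label, and hashing a user ID into 100 buckets are all hypothetical, not details from the talk.

```python
import hashlib
import io
import json
from typing import Callable, Iterator, Optional, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def run_stage(src: TextIO, dst: TextIO,
              transform: Callable[[dict], Optional[dict]]) -> int:
    """Apply transform to each record; drop records where it returns None."""
    written = 0
    for record in read_jsonl(src):
        out = transform(record)
        if out is not None:
            dst.write(json.dumps(out) + "\n")
            written += 1
    return written


def assign_variant(user_id: str, treatment_percent: int,
                   experiment: str = "zeta2-ab") -> str:
    """Hash the user into one of 100 buckets so assignment is stable."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "treatment" if bucket < treatment_percent else "control"


# Usage: keep only records flagged as good, then pick a variant for a user.
src = io.StringIO('{"id": 1, "ok": true}\n{"id": 2, "ok": false}\n')
dst = io.StringIO()
kept = run_stage(src, dst, lambda r: r if r["ok"] else None)
print(kept, dst.getvalue().strip())
print(assign_variant("user-123", treatment_percent=10))
```

Because each stage only consumes and produces lines of JSON, stages can be added, removed, or rerun independently. Hashing rather than random assignment keeps a given user on the same model for the whole experiment.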
Topics
- Edit Prediction
- Code Generation
- Machine Learning Pipelines
- Model Distillation
- A/B Testing
- Production ML
- Data Filtering
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.