There are at least ten distinct technical families of teacher→student transfer, not one monolithic “distillation.”

Source: Pascal’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, long

Summary

Teacher–student transfer in AI encompasses at least ten distinct technical families rather than one monolithic "distillation" concept, all aimed at letting cheaper "student" models learn from expensive frontier "teachers." These methods, ranging from simple RAG and prompting to complex logit-level distillation and speculative decoding, are primarily driven by economics: inference costs for GPT-3.5-level systems fell more than 280-fold, from ~$20 to ~$0.07 per million tokens, between November 2022 and October 2024. The capability transfer is effective on narrow tasks but partial: style is copied reliably, deep reasoning only unevenly. The rapid adoption of these techniques has compressed the capability gap between the leading closed-weight and the best open-weight models on the Chatbot Arena Leaderboard from 8.04% in January 2024 to 1.70% in February 2025. Significant risks remain, however, including the transfer of hallucinations and bias, "subliminal learning," and "model collapse" from recursive training on synthetic data; the legality of API-based distillation is also contested.
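To make "logit-level distillation" concrete, here is a minimal sketch of the classic temperature-softened formulation, in which the student is trained to match the teacher's softened output distribution. The summary does not say which loss the original article has in mind, so the function names, the default temperature of 2.0, and the T² scaling are assumptions taken from the standard textbook setup, not the author's method.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The temperature**2 factor is the conventional rescaling that keeps
    gradient magnitudes comparable across temperatures (assumed here).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary size")
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]
print(distillation_loss(teacher, student))
```

The loss is zero when the two distributions agree and grows as the student diverges. In practice it is usually mixed with an ordinary cross-entropy term on ground-truth labels, and it requires access to the teacher's logits, which API-only teachers often do not expose.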

Key takeaway

AI scientists and ML engineers evaluating model deployment strategies should prioritize cost-effective teacher–student transfer methods such as RAG and prompt engineering before attempting complex fine-tuning. Validate all distilled models with human evaluation and red-team testing; benchmark gains alone are insufficient given risks like "subliminal learning" and "model collapse." Carefully review API terms of service for training clauses to avoid legal disputes, especially if your task's open-vs-frontier performance gap is minimal.

Key insights

Teacher-student transfer involves diverse methods to imbue smaller models with frontier capabilities, driven by economics but fraught with risks.

Best for: AI Architect, MLOps Engineer, AI Engineer, Machine Learning Engineer, AI Scientist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.