Supervised learning is great — it's data collection that's broken
Summary
The article contends that the current dissatisfaction with supervised learning is misplaced: labelled examples remain an effective way to specify a computation. The fundamental problem lies not with supervised learning itself but with the inefficient and often tedious processes used to collect human knowledge as labelled data and to reuse it. The author argues that artificial general intelligence (AGI) is still distant, and that in the meantime supervised learning provides a robust framework. Rather than waiting for unsupervised learning to deliver a universal solution, the author advocates a focused effort on better data collection methods and on reusing existing human-annotated data, since these are the core bottlenecks in machine learning development.
Key takeaway
Machine learning engineers focused on model performance should recognize that improving data collection and the reuse of human knowledge offers greater immediate returns than pursuing novel unsupervised methods alone. Prioritize robust data labelling pipelines and strategies for leveraging existing annotated datasets, which directly improve supervised models and reduce development friction.
Key insights
Supervised learning is effective; data collection and human knowledge reuse are the real problems.
Principles
- Labelled examples effectively specify computation.
- Dissatisfaction with supervised learning is misplaced.
- Fix data collection rather than waiting for AGI.
In practice
- Improve data collection processes.
- Reuse existing human knowledge and annotations more effectively.
- Focus on data quality over new paradigms.
Topics
- Supervised Learning
- Data Collection
- Data Labeling
- Human Knowledge Reuse
- Machine Learning Development
Best for: AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai), a provider of developer tools and consulting for AI, machine learning, and NLP.