Supervised learning is great — it's data collection that's broken

Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Intermediate, quick

Summary

The article contends that current dissatisfaction with supervised learning is misplaced, because labelled examples remain an effective way to specify what a program should compute. In its view, the fundamental problem lies not with supervised learning itself but with the inefficient and often tedious processes used to collect human knowledge as labelled data and to reuse it. The author acknowledges that Artificial General Intelligence is still distant, but holds that supervised learning already provides a robust framework. Rather than waiting for unsupervised learning to deliver a universal solution, the author advocates a focused effort on two fronts: better data collection methods and better reuse of existing human annotations.

Key takeaway

Machine Learning Engineers focused on model performance should recognize that improving data collection and the reuse of human knowledge offers greater immediate returns than pursuing novel unsupervised methods alone. Prioritize robust data labelling pipelines and strategies for leveraging existing annotated datasets. Both directly improve supervised models and reduce development friction.

Key insights

Supervised learning is effective; data collection and human knowledge reuse are the real problems.

Best for: AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.