Supervised learning is great — it's data collection that's broken

2017-04-02 · Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, quick

Summary

The article contends that the current dissatisfaction with supervised learning is misplaced: labelled examples remain an effective way to specify what a system should compute. The real problem, it argues, lies not in supervised learning itself but in the inefficient and often tedious processes used to collect human knowledge as labelled data and to reuse it. Although Artificial General Intelligence is still distant, supervised learning already provides a robust framework. Rather than waiting for unsupervised learning to deliver a universal solution, the author advocates a focused effort on better data collection methods and greater reuse of existing human-annotated data.

Key takeaway

Machine Learning Engineers focused on model performance should recognize that improving data collection and the reuse of human knowledge offers greater immediate returns than pursuing novel unsupervised methods alone. Prioritize robust data labeling pipelines and strategies for reusing existing annotated datasets, which directly improve supervised models and reduce development friction.

Key insights

Supervised learning is effective; data collection and human knowledge reuse are the real problems.

Principles

Labelled examples effectively specify computation.
Dissatisfaction with supervised learning is misplaced.
Fix data collection; don't just wait for AGI.
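
The first principle can be illustrated with a minimal sketch, which is not taken from the original article. It assumes a toy word-count classifier, and the names train and predict and the example texts are hypothetical. The point is only that the program's behaviour is set by the labelled examples rather than by hand-written rules.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

# Hypothetical labelled examples: these, not rules, define the task.
examples = [
    ("refund my order please", "billing"),
    ("charged twice on my card", "billing"),
    ("app crashes on startup", "bug"),
    ("error when I open the app", "bug"),
]
model = train(examples)
print(predict(model, "the app shows an error"))

# Changing the behaviour means adding a labelled example, not editing code.
extended = train(examples + [("reset my password", "account")])
print(predict(extended, "password reset"))
```

In this sketch the quality and coverage of the examples decide what the classifier does, which is why the cost of collecting them matters so much.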

In practice

Improve data collection processes.
Enhance reuse of human knowledge.
Focus on data quality over new paradigms.

Topics

Supervised Learning
Data Collection
Data Labeling
Human Knowledge Reuse
Machine Learning Development

Best for: AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.