Natural Intelligence is All You Need[tm]

Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering) · Depth: Intermediate, extended

Summary

Natural Intelligence is All You Need[tm] challenges conventional data science thinking and argues for doubt and creative problem-solving over established methodologies. The presentation shows how hypothesis-driven analysis can obscure obvious data issues: with a smartwatch dataset, students who were given no hypotheses identified the "gorilla" (nonsense) data more effectively than students who had them. It critiques traditional recommender systems, attributing the standard "item-to-user" formulation to a monoculture shaped by the Netflix Prize, and shows a "user-to-item" classification for used cars outperforming that standard. For credit card fraud detection, a visualization-driven, rule-based system achieved better precision and F1 scores than Keras neural networks or Random Forests, and also exposed fundamental doubts about whether the labels can be trusted. The talk introduces "active teaching" for dataset classification, in which human experts guide model learning by prioritizing error correction and by using abstract-level annotations to inform sentence-level labels. Throughout, it emphasizes the need for a "blank canvas" approach.

Key takeaway

If you are a data scientist or ML engineer developing a new system, avoid blindly applying textbook solutions or chasing a single metric. Cultivate doubt in established methods and give yourself a "blank canvas" to explore alternative problem formulations. This approach, exemplified by user-to-item recommendations and visualization-driven fraud detection, can reveal more effective and more interpretable solutions, and it makes you less likely to overlook critical insights or "gorillas" in your data.

Key insights

Unquestioning adherence to established methods and expertise can blind data professionals to novel, more effective solutions.

Method

For fraud detection, visualize the high-dimensional data with parallel coordinates and derive a rule-based system from the patterns you see. For text classification, reframe the task at the sentence level and use active teaching for annotation.
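
The fraud-detection half of this method can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the talk: the feature names (`v1`, `v2`), the thresholds, and the six sample transactions are all invented. In a real workflow the thresholds would be read off a parallel-coordinates plot (for example one drawn with `pandas.plotting.parallel_coordinates`), at the point where the fraud lines visibly separate from the rest.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical feature names; the source does not list the dataset's columns.
    amount: float
    v1: float
    v2: float
    is_fraud: bool


def rule_predict(tx: Transaction) -> bool:
    """Flag a transaction that falls inside a hand-picked region.

    The thresholds are made up for illustration. They stand in for the
    cut-offs a person would choose after looking at the plot.
    """
    return tx.v1 < -3.0 and tx.v2 > 2.0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for boolean labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented sample data, only to exercise the rule.
transactions = [
    Transaction(12.0, -4.1, 2.5, True),    # flagged, is fraud
    Transaction(250.0, -3.5, 3.0, True),   # flagged, is fraud
    Transaction(40.0, -0.2, 0.1, False),   # not flagged, not fraud
    Transaction(8.0, -3.2, 2.2, False),    # flagged, not fraud
    Transaction(99.0, -1.0, 2.4, True),    # not flagged, is fraud
    Transaction(15.0, 0.5, -0.3, False),   # not flagged, not fraud
]

y_true = [tx.is_fraud for tx in transactions]
y_pred = [rule_predict(tx) for tx in transactions]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A rule like this stays readable: every flagged transaction can be explained by pointing at two thresholds, so the rule can be checked against suspicious labels as well as scored by them.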

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.