I got frustrated teaching ML to scientists, so I started building domain-specific workshops – would love your thoughts
Summary
An AI workshop organizer for biotech and nanotechnology researchers identified a significant gap between standard machine learning education and the practical needs of scientific research. Scientists grasp core ML concepts like gradient descent and cross-validation, but they struggle with real-world problems: predicting nanoparticle formulations when each experiment costs $800, working with datasets of only 47 data points from a mass spectrometer, and quantifying prediction certainty for reviewers. The core issue is that scientific data collection is expensive and slow and uncertainty estimates are critical, whereas standard ML teaching assumes abundant, cheap data and optimizes for accuracy. The organizer currently runs 2-3 day intensive workshops that cover standard ML techniques (CNNs, ensemble methods, PyTorch) framed around specific research scenarios, such as drug screening with 50 compounds or materials property prediction with limited synthesis data, but questions whether this approach is sufficient.
Key takeaway
If you are an AI Scientist designing educational programs for domain experts, recognize that standard ML curricula often overlook two realities of scientific work: data is scarce, and uncertainty quantification is critical. To equip researchers for these challenges, workshops should prioritize specialized techniques like Bayesian methods, physics-informed neural networks, and active learning, or at least integrate small-data strategies that go beyond basic transfer learning.
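To make the active learning recommendation concrete, here is a minimal sketch of one common variant: pool-based selection by committee disagreement. It is an illustration under stated assumptions, not the workshop's actual material. The toy linear ground truth, the `run_experiment` stand-in, the candidate grid, and the five-experiment budget are all hypothetical; only the $800 per-experiment cost comes from the summary.

```python
import random
import statistics

COST_PER_EXPERIMENT = 800  # USD, the figure quoted in the summary


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def committee_spread(xs, ys, candidate, rng, n_models=30):
    """Std. dev. of predictions from lines fit on bootstrap resamples."""
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope is undefined
            continue
        by = [ys[i] for i in idx]
        a, b = fit_line(bx, by)
        preds.append(a + b * candidate)
    return statistics.pstdev(preds) if len(preds) > 1 else float("inf")


def run_experiment(x, rng):
    """Hypothetical stand-in for a real lab measurement (toy ground truth)."""
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)


rng = random.Random(0)
grid = [i * 0.5 for i in range(21)]            # 21 hypothetical candidates
labeled_x = [0.0, 2.5, 5.0]                    # three experiments already done
labeled_y = [run_experiment(x, rng) for x in labeled_x]
pool = [x for x in grid if x not in labeled_x]

spent = 0
for _ in range(5):                             # assumed budget: 5 experiments
    # Query the candidate on which the committee disagrees most.
    best = max(pool, key=lambda c: committee_spread(labeled_x, labeled_y, c, rng))
    pool.remove(best)
    labeled_x.append(best)
    labeled_y.append(run_experiment(best, rng))
    spent += COST_PER_EXPERIMENT

print(f"ran {len(labeled_x) - 3} new experiments, spent ${spent}")
```

The point of the loop is that each costly measurement is spent where the current models are least certain, rather than on a fixed or random design. A real workflow would likely swap the straight-line committee for a model suited to the domain, such as a Gaussian process.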
Key insights
Standard ML education often fails to address the unique data constraints and uncertainty requirements of scientific research.
Principles
- Scientific data is expensive and scarce.
- Uncertainty quantification is critical in science.
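As a rough illustration of what uncertainty quantification can look like on a dataset of this size, the sketch below bootstraps a simple linear fit on 47 synthetic points and reports a percentile interval for a prediction. The data, the linear model, and the `bootstrap_predict` helper are assumptions made for the example; the summary does not say which methods the workshops actually teach.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_predict(xs, ys, x_new, n_boot=500, seed=0):
    """Return (mean, lower, upper) for the fitted value at x_new.

    lower/upper are the 2.5th and 97.5th percentiles of predictions
    from lines fit to bootstrap resamples of the data.
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope is undefined
            continue
        by = [ys[i] for i in idx]
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int(0.025 * len(preds))]
    upper = preds[int(0.975 * len(preds)) - 1]
    return statistics.fmean(preds), lower, upper


# Synthetic stand-in for a 47-point dataset (toy truth: y = 2x + 1 + noise).
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(47)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

mean, lower, upper = bootstrap_predict(xs, ys, x_new=5.0)
print(f"prediction at x=5.0: {mean:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Note that this interval describes uncertainty in the fitted trend, not the scatter of a single new measurement; a prediction interval for one new experiment would also need to account for the noise term.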
Method
Workshops frame standard ML techniques (CNNs, ensemble methods, transfer learning) within specific scientific research scenarios like drug screening with limited compounds or materials property prediction.
In practice
- Apply ML to drug screening with 50 compounds.
- Analyze microscopy images with domain-specific noise.
Topics
- Domain-Specific ML
- Small Data Learning
- Uncertainty Quantification
- Physics-informed Neural Networks
- Active Learning
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.