[D] Data scientists - what actually eats up most of your time?
Summary
A developer is conducting research into the practical day-to-day workflows of data scientists to identify genuine pain points for a new tool. The inquiry aims to understand how data scientists actually spend their time, contrasting it with common perceptions or "glamorized" versions of the role. Key areas of interest include the most time-consuming tasks (e.g., data wrangling, feature engineering, model training, stakeholder communication), the most frustrating or tedious workflow components, and the typical technology stack used (e.g., Python/R, cloud platforms, MLflow, notebooks vs. IDEs). The research also seeks to quantify the proportion of time spent on "actual" machine learning work versus data engineering or cleaning, and to pinpoint specific tasks that, if made 10x faster, would significantly improve efficiency.
Key takeaway
If you are an AI product manager or a developer building tools for data professionals, aim your product strategy at the time-consuming, frustrating parts of data wrangling, cleaning, and engineering. Speeding up these unglamorous but critical tasks is likely to deliver the most value and free data scientists for core ML work, more so than further optimizing model training or deployment.
Key insights
Understanding real-world data science workflows is crucial for building effective support tools.
Principles
- Focus on genuine pain points.
- Distinguish practice from perception.
Method
Gather direct feedback from data scientists on time allocation, frustrations, tech stack, and desired workflow accelerations to inform tool development.
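One way to turn such feedback into comparable numbers is to convert each respondent's reported hours into shares of their week and then average those shares across respondents. The sketch below is only an illustration under stated assumptions: the function name `mean_time_share`, the task labels, and the sample answers are all hypothetical and do not come from the original post.

```python
from collections import defaultdict


def mean_time_share(responses):
    """Average each task's share of the work week across respondents.

    `responses` is a list of dicts mapping task name -> hours per week.
    Each respondent's hours are normalized to fractions first, so people
    who report longer weeks do not dominate the average. A task that a
    respondent does not mention counts as zero for that respondent.
    """
    totals = defaultdict(float)
    counted = 0
    for hours in responses:
        week = sum(hours.values())
        if week <= 0:
            continue  # skip empty or invalid answers
        counted += 1
        for task, h in hours.items():
            totals[task] += h / week
    if counted == 0:
        return {}
    return {task: share / counted for task, share in totals.items()}


# Placeholder answers for illustration only, not real survey data.
example = [
    {"data wrangling": 20, "model training": 10, "stakeholder communication": 10},
    {"data wrangling": 10, "model training": 10},
]

shares = mean_time_share(example)
for task, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{task}: {share:.0%}")
```

Normalizing per respondent before averaging is a design choice, not the only option; summing raw hours instead would weight answers by how long each person's week is.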
In practice
- Prioritize data wrangling automation.
- Streamline stakeholder communication.
- Optimize experiment tracking.
Topics
- Data Science Workflows
- Data Wrangling
- Feature Engineering
- MLOps
- Stakeholder Communication
Best for: Data Scientist, Software Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.