[D] Data scientists - what actually eats up most of your time?

· Source: Machine Learning · Field: Technology & Digital — Data Science & Analytics, Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

A developer is conducting research into the practical day-to-day workflows of data scientists to identify genuine pain points for a new tool. The inquiry aims to understand how data scientists actually spend their time, contrasting it with common perceptions or "glamorized" versions of the role. Key areas of interest include the most time-consuming tasks (e.g., data wrangling, feature engineering, model training, stakeholder communication), the most frustrating or tedious workflow components, and the typical technology stack used (e.g., Python/R, cloud platforms, MLflow, notebooks vs. IDEs). The research also seeks to quantify the proportion of time spent on "actual" machine learning work versus data engineering or cleaning, and to pinpoint specific tasks that, if made 10x faster, would significantly improve efficiency.

Key takeaway

If you are an AI product manager or a developer building tools for data professionals, the inquiry points toward the unglamorous parts of the job: data wrangling, cleaning, and engineering. Tools that speed up these tedious but critical tasks are likely to deliver more value than further optimizing model training or deployment, because they free data scientists to spend more time on core ML work.

Key insights

Understanding real-world data science workflows is crucial for building effective support tools.

Principles

Method

Gather direct feedback from data scientists on time allocation, frustrations, tech stack, and desired workflow accelerations to inform tool development.
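As a rough illustration of how such feedback might be quantified, the sketch below averages self-reported time shares per task and ranks them. The task names, the `responses` list, and the helper functions `mean_time_share` and `rank_tasks` are all hypothetical, and the numbers are placeholders rather than real survey results.

```python
from collections import defaultdict

# Hypothetical survey responses: each dict maps a task category to the share
# of weekly time one respondent reports. Placeholder values for illustration
# only; these are NOT real survey data.
responses = [
    {"data_wrangling": 0.5, "modeling": 0.25, "communication": 0.25},
    {"data_wrangling": 0.5, "modeling": 0.375, "communication": 0.125},
]


def mean_time_share(responses):
    """Average the reported time share per task across all respondents.

    A task that a respondent does not mention counts as zero for that
    respondent, because the total is divided by the number of respondents.
    """
    if not responses:
        return {}
    totals = defaultdict(float)
    for response in responses:
        for task, share in response.items():
            totals[task] += share
    n = len(responses)
    return {task: total / n for task, total in totals.items()}


def rank_tasks(shares):
    """Return (task, share) pairs sorted from most to least time-consuming."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


shares = mean_time_share(responses)
ranked = rank_tasks(shares)
for task, share in ranked:
    print(f"{task}: {share:.0%}")
```

The top-ranked tasks are the natural candidates for the "what if this were 10x faster" question. Self-reported shares are noisy, so a ranking like this is best treated as a starting point for follow-up interviews rather than a measurement.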

Best for: Data Scientist, Software Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.