Why Is My Code So Slow? A Guide to Py-Spy Python Profiling

Source: Towards Data Science · Field: Technology & Digital (Data Science & Analytics, Software Development & Engineering) · Depth: Intermediate, medium

Summary

`py-spy` is a sampling profiler for finding and fixing inefficiencies in Python data science code. The article demonstrates it on a script that computes Haversine distances for 3.5 million flight records from BTS data covering January to June 2025; the script initially took 169.89 seconds. The interactive Icicle Graph generated by `py-spy` showed that the Pandas `iterrows()` call accounted for 68.36% of the runtime. Replacing the `iterrows()` loop with a vectorized NumPy-based Haversine calculation cut the execution time to 0.56 seconds, roughly a 300x speedup, while producing identical results. The case shows how profiling first points optimization effort at the actual bottleneck.

Key takeaway

Data scientists and software engineers dealing with slow Python scripts can use `py-spy` as a diagnostic tool. Generate an Icicle Graph, identify the bottleneck (for example, an `iterrows()` loop), then refactor that code with vectorized operations such as NumPy array arithmetic. In the article's example, this turned a wait of nearly three minutes into sub-second execution, which shortens both development iterations and pipeline runs.
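
The refactor described above can be sketched as follows. This is a minimal illustration, not the article's code: the column names, the toy coordinates, and the Earth radius constant are all assumptions. It computes the same Haversine distances twice, once row by row with `iterrows()` and once with whole-column NumPy operations.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the article


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical column names and toy coordinates, for illustration only.
flights = pd.DataFrame({
    "origin_lat": [40.6413, 33.9416, 41.9742],
    "origin_lon": [-73.7781, -118.4085, -87.9073],
    "dest_lat": [33.9416, 41.9742, 40.6413],
    "dest_lon": [-118.4085, -87.9073, -73.7781],
})

# Slow pattern: one Python-level function call per row.
slow = [
    haversine_km(row["origin_lat"], row["origin_lon"], row["dest_lat"], row["dest_lon"])
    for _, row in flights.iterrows()
]

# Vectorized pattern: one call operating on entire columns at once.
fast = haversine_km(
    flights["origin_lat"].to_numpy(),
    flights["origin_lon"].to_numpy(),
    flights["dest_lat"].to_numpy(),
    flights["dest_lon"].to_numpy(),
)

# Both approaches should agree to floating-point tolerance.
print(np.allclose(slow, fast))
```

On three rows the two versions take about the same time; the gap only becomes visible at scale, since the loop pays Python and Pandas per-row overhead while the vectorized call does its arithmetic in compiled NumPy code.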

Key insights

`py-spy` is a sampling profiler that pinpoints bottlenecks in Python code with low overhead, because it samples the running program from a separate process rather than instrumenting every function call.

Method

`py-spy` records snapshots of the program's call stack at high frequency and renders them as an Icicle Graph. Each bar represents a function, and a wider bar means the function appeared in more samples and therefore accounted for a larger share of the runtime.
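
To make the link between samples and bar width concrete, here is a toy sketch of the aggregation step. It is not how `py-spy` is implemented; the stack samples and function names below are invented for illustration, and the helper `frame_shares` is hypothetical. It shows only how a set of stack snapshots turns into per-function shares, which is what bar width encodes.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first.
# A real profiler would capture these from the running process.
samples = [
    ("main", "compute_distances", "iterrows"),
    ("main", "compute_distances", "iterrows"),
    ("main", "compute_distances", "haversine"),
    ("main", "load_data"),
]


def frame_shares(samples):
    """Return the fraction of samples in which each function is on the stack."""
    counts = Counter()
    for stack in samples:
        # Count a function once per sample, even if it recurses.
        counts.update(set(stack))
    total = len(samples)
    return {name: n / total for name, n in counts.items()}


shares = frame_shares(samples)

# Print functions from widest bar to narrowest.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20}{share:6.1%}")
```

A function's share includes time spent in everything it calls, which is why the outermost frame spans the full width and the bars narrow as the stack deepens.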

Best for: Data Scientist, Data Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.