Why Is My Code So Slow? A Guide to Py-Spy Python Profiling
Summary
The article introduces `py-spy`, a sampling profiler, as a way to find and fix inefficiencies in Python data science code. It demonstrates the tool on a script that calculates Haversine distances for 3.5 million flight records from January to June 2025 BTS data, which initially took 169.89 seconds to run. `py-spy` generates an interactive Icicle Graph, which showed that the Pandas `iterrows()` function accounted for 68.36% of the runtime. Replacing the `iterrows()` loop with a vectorized NumPy-based Haversine calculation cut the execution time to 0.56 seconds, roughly a 300-fold speedup, while producing identical results.
Key takeaway
For Data Scientists and Software Engineers struggling with slow Python scripts, `py-spy` offers a critical diagnostic tool. Use it to generate an Icicle Graph, identify performance bottlenecks like `iterrows()`, and then refactor your code with vectorized operations (e.g., NumPy) to achieve substantial speedups. This approach can transform a multi-minute wait into sub-second execution, significantly improving development workflow and pipeline efficiency.
Key insights
`py-spy` is a sampling profiler that efficiently pinpoints Python code bottlenecks without significant overhead.
Principles
- Sampling profilers offer low overhead.
- Vectorized operations are typically much faster than row-wise Python loops such as `iterrows()`.
Method
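`py-spy` records snapshots of the program's call stack at high frequency and renders them as an Icicle Graph that visualizes time spent in each function. A wider bar means the function appeared in more samples, and therefore accounted for more of the runtime; it does not mean the function was called more often.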
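To make the link between samples and bar width concrete, here is a minimal sketch of how stack samples could be aggregated into per-function shares. The sample data and the `frame_shares` helper are hypothetical and exist only for illustration; this is not `py-spy`'s implementation or output format.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first (not real py-spy output).
samples = [
    ("main", "compute_distances", "iterrows"),
    ("main", "compute_distances", "iterrows"),
    ("main", "compute_distances", "haversine"),
    ("main", "load_data"),
]


def frame_shares(stacks):
    """Return the fraction of samples in which each function is on the stack."""
    counts = Counter()
    for stack in stacks:
        # Count each function at most once per sample, even if it recurses.
        counts.update(set(stack))
    total = len(stacks)
    return {name: n / total for name, n in counts.items()}


shares = frame_shares(samples)
# Print functions from widest bar to narrowest.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {share:6.1%}")
```

In this toy data, `iterrows` is on the stack in two of four samples, so its bar would span half the graph's width regardless of how many times it was called.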
In practice
- Run `py-spy record -o profile.svg -r 100 -- python main.py` to sample the script 100 times per second and write the graph to `profile.svg`.
- Analyze Icicle Graphs: wider bars mean more time.
- Replace `iterrows()` with vectorized NumPy operations.
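The `iterrows()` replacement described above might look roughly like the following. This is a sketch under stated assumptions, not the original article's code: the column names, the sample coordinates, the `haversine_km` helper, and the Earth radius constant are all illustrative choices.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical flight records; the column names are assumptions.
flights = pd.DataFrame(
    {
        "origin_lat": [40.6413, 0.0, 10.0],
        "origin_lon": [-73.7781, 0.0, 20.0],
        "dest_lat": [33.9416, 0.0, 10.0],
        "dest_lon": [-118.4085, 90.0, 20.0],
    }
)

# Slow pattern: one Python-level function call per row.
slow = [
    haversine_km(row.origin_lat, row.origin_lon, row.dest_lat, row.dest_lon)
    for _, row in flights.iterrows()
]

# Vectorized pattern: one call that operates on whole columns at once.
fast = haversine_km(
    flights["origin_lat"].to_numpy(),
    flights["origin_lon"].to_numpy(),
    flights["dest_lat"].to_numpy(),
    flights["dest_lon"].to_numpy(),
)
```

Both patterns reuse the same function, which makes it easy to check that the vectorized result matches the loop before discarding the slow version.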
Topics
- Python Profiling
- py-spy
- Performance Optimization
- Data Science Workflows
- Vectorization
Best for: Data Scientist, Data Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.