Using Causal Inference to Estimate the Impact of Tube Strikes on Cycling Usage in London
Summary
Transport for London (TfL) publishes its Santander Cycles usage data: 9.2 million station-hours across 800 bike stations from 2015 to 2025, provided in 144 weekly CSV files. Aggregated to the H3 cell-day level, this data is used to estimate the causal impact of major London Underground strikes on cycle usage. The analysis involves extensive data wrangling: converting the CSVs to Parquet, grouping by bike station and hour, and joining with H3 cell coordinates and with strike data from a Freedom of Information request (FOI-2596-1819). The study defines strike exposure by proximity to striking tube lines and uses a two-way fixed effects (TWFE) model to estimate the causal effect, controlling for confounders such as weather and seasonality. The most refined model, restricted to cells near 42 central interchange stations and to days within a 45-day window of a strike, estimates a 3.95% increase in Santander bike usage on strike days.
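The aggregation and join steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's pipeline: the record layout, the `station_to_cell` lookup, the cell IDs, and the strike dates are all invented for the example, and the real analysis would derive cell IDs from station coordinates with an H3 library.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical station-hour records: (station_id, ISO hour timestamp, trips started).
station_hours = [
    ("S1", "2017-01-09T08:00", 12),
    ("S1", "2017-01-09T09:00", 7),
    ("S2", "2017-01-09T08:00", 5),
    ("S3", "2017-01-09T08:00", 3),
    ("S1", "2017-01-10T08:00", 4),
]

# Hypothetical station -> cell lookup (stands in for an H3 cell assignment).
station_to_cell = {"S1": "cell_A", "S2": "cell_A", "S3": "cell_B"}

# Hypothetical strike exposure: (cell, date) pairs counted as treated.
strike_cell_days = {("cell_A", "2017-01-09")}


def to_cell_day(records, lookup):
    """Sum station-hour trip counts up to (cell, date)."""
    totals = defaultdict(int)
    for station_id, hour, trips in records:
        day = datetime.fromisoformat(hour).date().isoformat()
        totals[(lookup[station_id], day)] += trips
    return dict(totals)


def add_treatment(cell_day_totals, exposed):
    """Attach a binary strike-exposure flag to each cell-day row."""
    return [
        {"cell": cell, "date": day, "trips": trips,
         "strike": int((cell, day) in exposed)}
        for (cell, day), trips in sorted(cell_day_totals.items())
    ]


panel = add_treatment(to_cell_day(station_hours, station_to_cell), strike_cell_days)
for row in panel:
    print(row)
```

Each row of `panel` is one cell-day observation with an outcome (`trips`) and a binary treatment (`strike`), which is the shape a panel regression expects.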
Key takeaway
For data scientists or urban planners analyzing public transport patterns, this analysis demonstrates how to quantify the causal impact of disruptions using open data and standard econometric methods. Consider panel data approaches such as two-way fixed effects when assessing interventions with recurring binary treatments: unit fixed effects absorb time-invariant differences between units, and time fixed effects absorb shocks common to all units on a given day, which reduces selection bias from both sources. This framework can inform infrastructure planning and emergency response strategies during transit strikes.
Key insights
Tube strikes in London cause a measurable increase in Santander Cycle usage due to commuter substitution.
Principles
- Panel data lets you control for time-invariant characteristics of each unit.
- Causal mechanisms must credibly link treatment to outcome.
- SUTVA violations can attenuate causal effect estimates.
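The attenuation point in the last principle can be made concrete with a stylised two-group comparison. The notation here is an assumption for illustration and does not come from the article: $D_{it}$ marks an exposed cell on a strike day, $S_t$ marks a strike day, $\beta$ is the true effect on exposed cells, and $\delta$ is a spillover that reaches unexposed cells (for example, displaced commuters riding through them).

```latex
\begin{align*}
Y_{it} &= \alpha_i + \gamma_t + \beta\, D_{it} + \delta\, S_t\,(1 - D_{it}) + \varepsilon_{it} \\
E[\Delta Y \mid \text{exposed}]   &= \Delta\gamma + \beta \\
E[\Delta Y \mid \text{unexposed}] &= \Delta\gamma + \delta \\
E[\Delta Y \mid \text{exposed}] - E[\Delta Y \mid \text{unexposed}] &= \beta - \delta
\end{align*}
```

Here $\Delta$ denotes the change from a non-strike day to a strike day. When the spillover has the same sign as the effect ($0 < \delta < \beta$), the comparison recovers $\beta - \delta$, which understates $\beta$.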
Method
A two-way fixed effects model, with standard errors clustered at the cell level, estimates the treatment effect while controlling for both time-invariant cell characteristics and day-specific shocks common to all cells.
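As a rough sketch of what this estimator does, the snippet below computes a TWFE coefficient for a single binary treatment by two-way demeaning, plus a cluster-robust standard error with cells as clusters. It assumes a balanced panel (every cell observed on every day) and applies no small-sample correction; the function name `twfe` and the toy data are hypothetical, and the article's own implementation may differ.

```python
def twfe(y, d, cells, days):
    """TWFE estimate for one treatment on a balanced panel.

    y and d map (cell, day) -> outcome / treatment flag.
    Returns (beta_hat, cluster_robust_se), clustering by cell.
    """
    def demean(x):
        # Subtract cell means and day means, add back the grand mean.
        n_c, n_d = len(cells), len(days)
        cell_mean = {c: sum(x[c, t] for t in days) / n_d for c in cells}
        day_mean = {t: sum(x[c, t] for c in cells) / n_c for t in days}
        grand = sum(cell_mean.values()) / n_c
        return {(c, t): x[c, t] - cell_mean[c] - day_mean[t] + grand
                for c in cells for t in days}

    y_dm, d_dm = demean(y), demean(d)
    sxx = sum(v * v for v in d_dm.values())
    beta = sum(d_dm[k] * y_dm[k] for k in d_dm) / sxx

    # Cluster-robust variance: sum squared per-cell score totals.
    meat = 0.0
    for c in cells:
        score = sum(d_dm[c, t] * (y_dm[c, t] - beta * d_dm[c, t]) for t in days)
        meat += score ** 2
    se = meat ** 0.5 / sxx
    return beta, se


# Toy, noise-free panel (all values invented): outcome = cell effect + day effect + 2 * treatment.
cells = ["A", "B", "C"]
days = [0, 1, 2, 3]
cell_effect = {"A": 10.0, "B": 20.0, "C": 5.0}
day_effect = {0: 0.0, 1: 1.0, 2: -2.0, 3: 3.0}
treated = {("A", 2), ("B", 2), ("B", 3)}
true_beta = 2.0

d = {(c, t): float((c, t) in treated) for c in cells for t in days}
y = {(c, t): cell_effect[c] + day_effect[t] + true_beta * d[c, t]
     for c in cells for t in days}

beta_hat, se = twfe(y, d, cells, days)
print(beta_hat, se)
```

Because the toy data contain no noise, the demeaning removes the cell and day effects exactly and `beta_hat` matches `true_beta` up to floating-point error. On real data with an unbalanced panel or additional covariates, a dedicated panel regression package is the more practical choice.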
In practice
- Use H3 cells for spatial aggregation in urban mobility analysis.
- Restrict analysis to relevant geographic and temporal windows.
- Cluster standard errors at the unit level in panel data regressions.
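The second item above, restricting the sample, can be sketched as a simple filter. This example assumes the window means plus or minus 45 days around any strike date, which is one reading of the summary; the cell IDs, dates, and the `central_cells` set are made up for illustration.

```python
from datetime import date, timedelta


def within_window(day, strike_dates, days=45):
    """True if `day` is within +/- `days` of any strike date (assumed window definition)."""
    window = timedelta(days=days)
    return any(abs(day - s) <= window for s in strike_dates)


# Hypothetical inputs.
strike_dates = [date(2020, 3, 1)]
central_cells = {"cell_A"}  # stands in for cells near central interchange stations
rows = [
    {"cell": "cell_A", "date": date(2020, 3, 20), "trips": 40},
    {"cell": "cell_A", "date": date(2020, 6, 1), "trips": 35},   # outside the window
    {"cell": "cell_B", "date": date(2020, 3, 20), "trips": 12},  # not a central cell
]

kept = [
    r for r in rows
    if r["cell"] in central_cells and within_window(r["date"], strike_dates)
]
print(kept)
```

Narrowing the sample this way trades observations for comparability: the remaining cell-days are closer in place and time to the strikes being studied.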
Topics
- Causal Inference
- London Tube Strikes
- Santander Cycles
- Panel Data Analysis
- Two-Way Fixed Effects Model
Best for: Data Scientist, Research Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.