Survey Statistics: it is (still) the people
Summary
This post in the Survey Statistics blog series, celebrating the workshop for Andrew Gelman's 60-ish birthday, highlights challenges in estimating current voter support from survey data. Yair Ghitza's talk referenced two papers he co-authored with Gelman, from 2013 and 2020, on Multilevel Regression and Poststratification (MRP). The discussion also covers Nate Cohn's May 18, 2026, NYT article, which proposes a "synthetic past vote" (X**) built from recalled 2024 vote (X*) and the 2024 voting record (V2024). When X* is missing but a 2024 voting record exists (V2024=1), X** is imputed; when no voting record is found (V2024=0), X** is validated as "nonvoter". The core challenge is estimating E(Y | V=1), support among voters (V=1), when true past vote (X) is unknown. Poststratifying on past vote requires estimating both E(Y | X, sample, V=1) and p(X | V=1). Cohn's approach addresses this by substituting X** for X and estimating p(X** | V=1).
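The two quantities named above combine through the law of total probability. The sketch below is an illustration, not the post's own notation: the second line assumes that, within each past-vote cell, sampled voters have the same mean Y as all voters in that cell (an ignorability assumption the summary does not state), and the third line swaps the unobserved X for X**.

```latex
\begin{aligned}
E(Y \mid V=1) &= \sum_{x} E(Y \mid X=x, V=1)\, p(X=x \mid V=1) \\
              &= \sum_{x} E(Y \mid X=x, \text{sample}, V=1)\, p(X=x \mid V=1) \\
              &\approx \sum_{x} E(Y \mid X^{**}=x, \text{sample}, V=1)\, p(X^{**}=x \mid V=1)
\end{aligned}
```

The last step is only as good as X** is as a stand-in for X, which is why the quality of p(X** | V=1) matters.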
Key takeaway
For Research Scientists or Data Scientists analyzing voter support, consider integrating "synthetic past vote" (X**) methods into your survey weighting. This approach, proposed by Nate Cohn, addresses mismeasured or missing recalled vote (X*) by drawing on official 2024 voting records (V2024). Imputing X** for respondents who did not report a past vote but have a 2024 voting record, and setting X** to "nonvoter" when V2024=0, may make your E(Y | V=1) estimates less dependent on respondents' recall. Evaluate whether this method could refine the past-vote adjustment in your Multilevel Regression and Poststratification (MRP) models for political polling.
Key insights
Synthetic past vote (X**) improves voter support estimation by addressing missing or inaccurate recalled vote data.
Principles
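- Recalled past vote (X*) is a mismeasured, sometimes missing, version of true past vote, which complicates estimation.
- True past vote (X) is usually unobservable.
- Combining survey responses with official voting records (V2024) can reduce the impact of recall error.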
Method
Cohn's method creates a "synthetic past vote" (X**): X** is imputed when recalled vote (X*) is missing but a 2024 voting record exists (V2024=1), and X** is validated as "nonvoter" when no record exists (V2024=0).
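The rule above can be sketched as a small per-respondent function. This is a minimal illustration, not Cohn's implementation: the function name, the "D"/"R" labels, and the caller-supplied `impute` model are hypothetical, because the summary does not say how missing X* is imputed. It also assumes that an observed X* with V2024=1 is kept as-is.

```python
from typing import Callable, Optional

NONVOTER = "nonvoter"


def synthetic_past_vote(
    recalled: Optional[str],
    voted_2024: bool,
    impute: Callable[[], str],
) -> str:
    """Return synthetic past vote X** for one respondent (hypothetical sketch).

    recalled: recalled 2024 vote X*, or None if the respondent did not answer.
    voted_2024: True if official records show a 2024 vote (V2024 = 1).
    impute: hypothetical caller-supplied imputation model for missing X*.
    """
    if not voted_2024:
        # V2024 = 0: no record of a 2024 vote, so X** is "nonvoter",
        # overriding whatever the respondent recalled.
        return NONVOTER
    if recalled is None:
        # V2024 = 1 but X* missing: fill in X** with an imputed vote.
        return impute()
    # V2024 = 1 and X* observed: keep the recalled vote (assumption; the
    # summary does not describe how other cases are handled).
    return recalled


# Usage: (recalled X*, V2024) pairs for three respondents.
respondents = [("R", True), (None, True), ("D", False)]
x_synth = [synthetic_past_vote(x, v, impute=lambda: "D") for x, v in respondents]
print(x_synth)  # → ['R', 'D', 'nonvoter']
```

Passing the imputation model in as a function keeps the record-linkage rule separate from whatever model an analyst uses to fill in missing recall.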
In practice
- Impute missing recalled vote for respondents whom official records show voted in 2024 (V2024=1).
- Validate non-voters against official voting records (V2024=0).
- Weight on X**, which incorporates V2024, rather than on recalled vote alone.
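The weighting step in the list above can be sketched as simple cell poststratification on X**, following the decomposition E(Y | V=1) = Σ_x E(Y | X**=x, sample, V=1) p(X**=x | V=1). This is a minimal sketch under stated assumptions: respondents are already restricted to V=1, the target shares p(X** | V=1) are supplied by the analyst (the summary does not say how Cohn obtains them), and the function name and the numbers in the usage line are hypothetical. MRP would replace the raw cell means with multilevel-model predictions.

```python
from collections import Counter
from typing import Dict, Sequence


def poststratify_on_past_vote(
    x_synth: Sequence[str],
    y: Sequence[float],
    target: Dict[str, float],
) -> float:
    """Estimate E(Y | V=1) as sum_x mean(Y | X** = x) * p(X** = x | V=1).

    x_synth: synthetic past vote X** for each sampled voter (V = 1).
    y: outcome for each sampled voter, e.g. 1 = supports the candidate.
    target: hypothetical target shares p(X** = x | V = 1); normalized here.
    """
    sums: Dict[str, float] = {}
    counts: Counter = Counter()
    for x, yi in zip(x_synth, y):
        sums[x] = sums.get(x, 0.0) + yi
        counts[x] += 1

    total_share = sum(target.values())
    estimate = 0.0
    # Sample categories missing from `target` get no weight.
    for x, share in target.items():
        if counts[x] == 0:
            raise ValueError(f"no respondents in past-vote cell {x!r}")
        cell_mean = sums[x] / counts[x]  # estimate of E(Y | X** = x, sample, V = 1)
        estimate += cell_mean * share / total_share
    return estimate


# Usage with made-up data and made-up target shares.
estimate = poststratify_on_past_vote(
    x_synth=["D", "D", "R", "nonvoter"],
    y=[1, 0, 0, 1],
    target={"D": 0.5, "R": 0.4, "nonvoter": 0.1},
)
print(round(estimate, 2))  # → 0.35
```

Raising an error on an empty cell is a deliberate choice: silently dropping a cell would shift weight onto the others and bias the estimate.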
Topics
- Survey Statistics
- Voter Support Estimation
- Multilevel Regression and Poststratification
- Synthetic Past Vote
- Political Polling
- Data Imputation
Best for: AI Scientist, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.