Benchmarking Counterfactual Prediction in Epidemic Time Series with Time-Varying Interventions

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Science & Research — Health & Medical Research, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

EpiCF-Bench is a new large-scale benchmark for counterfactual prediction in epidemic time series under dynamic interventions. It addresses the shortage of realistic benchmarks by using a calibrated agent-based model (ABM) that integrates real-world demographic, mobility, epidemiological, and policy data. The benchmark generates factual and counterfactual COVID-19 trajectories for 158 U.S. counties with populations between 20,000 and 200,000, covering a 168-day period from October 26, 2020, to April 11, 2021. It supports both static and time-varying treatments, as well as single-policy and multi-policy intervention scenarios. Evaluation on EpiCF-Bench revealed substantial performance differences among causal inference methods, highlighting how hard realistic time-series causal reasoning remains. The underlying ABM tracked ground truth closely, achieving a mean NRMSE of 0.1887.
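
For readers unfamiliar with the fidelity metric, the sketch below shows one way a mean NRMSE could be computed. The summary does not say how the benchmark normalizes the error, so dividing by the range of the observed series is an assumption here (normalizing by the mean or standard deviation is also common). The function names and the numbers are illustrative only and do not come from the benchmark.

```python
import math


def nrmse(observed, simulated):
    """RMSE divided by the range of the observed series (assumed convention)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("observed series is constant; range normalization undefined")
    return math.sqrt(mse) / spread


def mean_nrmse(pairs):
    """Average NRMSE over (observed, simulated) pairs, e.g. one pair per county."""
    scores = [nrmse(observed, simulated) for observed, simulated in pairs]
    return sum(scores) / len(scores)


# Made-up toy series, not benchmark data.
county_a = ([10, 20, 30, 40], [12, 18, 33, 39])
county_b = ([5, 5, 15, 25], [5, 7, 13, 25])
print(round(mean_nrmse([county_a, county_b]), 4))
```

Under this convention a lower value means the simulated curve sits closer to the observed one relative to how much the observed series varies.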

Key takeaway

Machine learning engineers developing causal inference models for dynamic systems should integrate realistic benchmarks like EpiCF-Bench into their evaluation pipelines. The benchmark provides ground-truth counterfactuals for epidemic time series, enabling rigorous assessment of model performance under complex, time-varying interventions. Use it to check whether your models pair predictive accuracy with genuine causal inference capability, especially in multi-policy scenarios where interactions between policies are critical.

Key insights

The lack of realistic benchmarks with ground-truth counterfactuals constrains progress in time-series causal inference.

Principles

Realistic benchmarks are crucial for causal inference.
ABMs can bridge realism and evaluability gaps.
Policy interventions can have delayed or negative effects.

Method

EpiCF-Bench uses a differentiable agent-based model calibrated with real-world data to simulate factual and counterfactual epidemic trajectories by modifying policy variables through network freezing mechanisms.
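
The idea of generating a factual and a counterfactual trajectory from one simulator can be illustrated with a deliberately simple stand-in. The sketch below is not the benchmark's differentiable ABM and does not reproduce its network freezing mechanism; it is a deterministic compartmental (SIR) toy in which the only thing that differs between the two runs is the time-varying policy schedule. All names and parameter values (`simulate`, `beta`, `gamma`, the policy strictness levels) are assumptions chosen for illustration.

```python
def simulate(policy, population=100_000, initial_infected=50,
             beta=0.30, gamma=0.10):
    """Toy discrete-time SIR; policy[t] in [0, 1] scales down transmission on day t."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    new_cases = []
    for strictness in policy:
        effective_beta = beta * (1.0 - strictness)
        infections = min(susceptible,
                         effective_beta * susceptible * infected / population)
        recoveries = gamma * infected
        susceptible -= infections
        infected += infections - recoveries
        new_cases.append(infections)
    return new_cases


days = 168
# Factual schedule: a restriction of strength 0.5 starts on day 30.
factual_policy = [0.0] * 30 + [0.5] * (days - 30)
# Counterfactual schedule: the same restriction starts 30 days later.
counterfactual_policy = [0.0] * 60 + [0.5] * (days - 60)

factual = simulate(factual_policy)
counterfactual = simulate(counterfactual_policy)

# Per-day effect of the alternative schedule on new cases.
effect = [c - f for c, f in zip(counterfactual, factual)]
print(round(sum(factual)), round(sum(counterfactual)))
```

Because everything except the policy schedule is held fixed, the two runs agree exactly until the schedules diverge, and the difference afterwards is attributable to the intervention. That property is what makes simulator-generated counterfactuals usable as ground truth.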

In practice

Use EpiCF-Bench to evaluate time-series causal models.
Explore single- and multi-policy intervention effects.
Analyze "flattening the curve" trade-offs in simulations.
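
One way to look at "flattening the curve" trade-offs is to compare summary statistics of a factual and a counterfactual trajectory. The sketch below is a minimal illustration with made-up series; the helper names `curve_summary` and `compare` are hypothetical and are not part of the benchmark's code.

```python
def curve_summary(daily_cases):
    """Peak height, day of first peak, and cumulative cases for one trajectory."""
    peak = max(daily_cases)
    return {
        "peak": peak,
        "peak_day": daily_cases.index(peak),
        "total": sum(daily_cases),
    }


def compare(factual, counterfactual):
    """Counterfactual minus factual for each summary statistic."""
    f = curve_summary(factual)
    c = curve_summary(counterfactual)
    return {
        "peak_change": c["peak"] - f["peak"],
        "peak_delay_days": c["peak_day"] - f["peak_day"],
        "total_change": c["total"] - f["total"],
    }


# Made-up daily case counts, not benchmark data.
factual = [1, 4, 9, 6, 3, 1, 0, 0]
counterfactual = [1, 2, 4, 5, 5, 4, 3, 2]
print(compare(factual, counterfactual))
```

In this toy case the counterfactual curve has a lower and later peak but a slightly larger cumulative total, which is the kind of trade-off a single error metric would hide.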

Topics

Counterfactual Prediction
Epidemic Time Series
Causal Inference Benchmarks
Agent-Based Models
Dynamic Interventions
COVID-19 Policies

Code references

complex-ai-lab/epi-cf-benchmark

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.