Cost-optimal Sequential Testing via Doubly Robust Q-learning

2026-04-16 · Source: cs.AI updates on arXiv.org · Field: Science & Research — Health & Medical Research, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

COST-Q (Cost-Optimal Sequential Testing via Doubly Robust Q-learning) is a framework for learning cost-optimal sequential diagnostic policies from retrospective clinical data. It targets informative missingness, where whether a test is acquired depends on prior results. The method is a doubly robust Q-learning procedure that combines path-specific inverse probability weights with auxiliary contrast models to construct orthogonal pseudo-outcomes, so policy learning remains consistent as long as at least one of the two nuisance models (the acquisition model or the contrast model) is correctly specified. The paper establishes oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates. In simulations, COST-Q improves cost-adjusted performance over baseline methods. Applied to a prostate cancer cohort study, it reduces testing costs without compromising predictive accuracy, reaching a specificity of 59.5% at 90% recall.
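
For readers less familiar with the "specificity at 90% recall" metric quoted above, the following is a minimal sketch of how such a figure is typically computed from risk scores. The function name, the toy labels, and the toy scores are all hypothetical; none of them come from the paper or its prostate cancer data.

```python
import numpy as np

def specificity_at_recall(y_true, scores, target_recall=0.90):
    """Return (specificity, threshold) at the highest threshold that still
    flags at least `target_recall` of the true positives.
    A case is predicted positive when score >= threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = np.sort(scores[y_true == 1])[::-1]  # positive-class scores, descending
    # Number of positives that must be flagged (small epsilon guards float error).
    n_needed = max(1, int(np.ceil(target_recall * len(pos) - 1e-9)))
    threshold = pos[n_needed - 1]
    neg = scores[y_true == 0]
    # Specificity: share of true negatives that fall below the threshold.
    return float(np.mean(neg < threshold)), float(threshold)

# Toy data (hypothetical): 10 diseased and 10 healthy patients.
y_true = np.array([1] * 10 + [0] * 10)
scores = np.array(
    [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.10]   # diseased
    + [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.58, 0.70]  # healthy
)
spec, thr = specificity_at_recall(y_true, scores, target_recall=0.90)
```

Fixing recall first reflects a common clinical priority: missing a true case is treated as costlier than a false alarm, so the threshold is set by the positives and specificity is whatever that threshold leaves.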

Key takeaway

For AI Scientists and Machine Learning Engineers developing diagnostic tools, COST-Q offers a robust method to optimize sequential testing strategies from real-world, informatively missing data. Your teams can implement this framework to balance predictive accuracy with testing costs, potentially reducing patient burden and healthcare expenditures. Consider applying COST-Q to existing retrospective datasets to identify more efficient diagnostic pathways, especially where test acquisition is adaptive and costly.

Key insights

COST-Q optimizes sequential diagnostic testing from retrospective data by integrating doubly robust estimation with Q-learning to handle informative missingness.

Principles

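Doubly robust estimation remains consistent if at least one of the two nuisance models is correctly specified.
Under sequential missing at random (MAR), where test acquisition depends only on the observed history, unbiased learning from history-dependent data is possible.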
Backward Q-learning optimizes multi-stage decision policies.
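
The doubly robust principle above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's estimator: it uses the standard single-stage augmented inverse probability weighting (AIPW) estimator of a mean with outcomes missing at random, and the data-generating process, the function name `dr_mean`, and the deliberately wrong nuisance models are all hypothetical.

```python
import numpy as np

def dr_mean(y, r, pi_hat, m_hat):
    """AIPW estimate of E[Y] when Y is observed only where r == 1.
    pi_hat: modelled probability of observing Y; m_hat: modelled E[Y | X]."""
    y_filled = np.where(r == 1, y, 0.0)  # unobserved values never enter the estimate
    return np.mean(m_hat + r / pi_hat * (y_filled - m_hat))

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)        # true E[Y] = 2
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))    # observation depends on x
r = rng.binomial(1, pi_true)

m_true = 2.0 + 3.0 * x          # correctly specified outcome model
m_wrong = np.zeros(n)           # misspecified outcome model
pi_wrong = np.full(n, 0.5)      # misspecified observation model

naive = y[r == 1].mean()                           # complete-case mean, biased
dr_good_pi = dr_mean(y, r, pi_true, m_wrong)       # only the weights are right
dr_good_m = dr_mean(y, r, pi_wrong, m_true)        # only the outcome model is right
dr_both_wrong = dr_mean(y, r, pi_wrong, m_wrong)   # no protection left
```

In this setup the two estimates with one correct nuisance model land near the true mean of 2, while the complete-case mean and the estimate with both models wrong do not. Oracle nuisance functions are used here only to isolate the robustness property; in practice both would be estimated.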

Method

COST-Q uses path-specific inverse probability weights and auxiliary contrast models to create orthogonal, doubly robust pseudo-outcomes. A K-fold cross-fitting procedure then estimates stage-wise contrast functions via backward Q-learning.
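
To make the pseudo-outcome and cross-fitting steps concrete, here is a minimal single-stage sketch. It is an assumption-laden stand-in rather than COST-Q itself: it uses the generic AIPW (DR-learner style) pseudo-outcome for one binary acquisition decision, not the paper's path-specific weights or its multi-stage backward recursion. All function names, the linear and logistic nuisance models, and the simulated data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_logistic(X, a, iters=25):
    """Logistic regression via Newton's method; returns a probability function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (a - p)
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + 1e-8 * np.eye(len(w))
        w += np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ w))

def cross_fitted_pseudo_outcomes(X, a, y, k=5, seed=0):
    """Doubly robust pseudo-outcomes for the contrast E[Y | A=1, X] - E[Y | A=0, X].
    Nuisance models for fold j are fit only on the other k-1 folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k   # balanced random fold labels
    phi = np.empty(len(y))
    for j in range(k):
        tr, te = folds != j, folds == j
        e = np.clip(fit_logistic(X[tr], a[tr])(X[te]), 0.05, 0.95)  # acquisition model
        m1 = fit_linear(X[tr & (a == 1)], y[tr & (a == 1)])(X[te])  # outcome model, A=1
        m0 = fit_linear(X[tr & (a == 0)], y[tr & (a == 0)])(X[te])  # outcome model, A=0
        phi[te] = (m1 - m0
                   + a[te] / e * (y[te] - m1)
                   - (1 - a[te]) / (1 - e) * (y[te] - m0))
    return phi

# Simulated data (hypothetical): the true contrast is tau(x) = 1 + 2x.
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 1))
e_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
a = rng.binomial(1, e_true)
y = X[:, 0] + a * (1.0 + 2.0 * X[:, 0]) + rng.normal(size=n)

phi = cross_fitted_pseudo_outcomes(X, a, y)
tau_hat = fit_linear(X, phi)                 # regress pseudo-outcomes on covariates
pred = tau_hat(np.array([[0.0], [1.0]]))     # estimated contrast at x = 0 and x = 1
```

Regressing the pseudo-outcomes on covariates recovers the contrast function, and a decision rule would acquire the test where the estimated contrast exceeds its cost. Extending this to several stages would mean repeating the step backward from the last stage, with each stage's outcome replaced by the value of acting optimally afterward.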

In practice

Apply COST-Q to reduce unnecessary medical tests in diagnostics.
Use for individualized sequential biomarker acquisition strategies.
Evaluate policy value using internal cross-fitted pseudo-outcomes.

Topics

Doubly Robust Estimation
Sequential Decision-Making
Informative Missingness
Cost-Optimal Prediction
Q-learning

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.