The stupidity, the criminal vandalism, the wanton destruction of information involved in dichotomisation
Summary
A study analyzed 21,435 unique randomized controlled trials (RCTs) from the Cochrane Database of Systematic Reviews (CDSR), finding that 7,224 (34%) had continuous outcomes and 14,211 (66%) had binary outcomes. Trials with binary outcomes exhibited larger average sample sizes, yet also showed larger standard errors and fewer statistically significant results. The researchers concluded that while sample sizes are increased to offset the lower information content of binary outcomes, this compensation is often insufficient. Many binary outcomes result from the avoidable dichotomization of continuous data, a practice termed "responder analysis," which leads to a significant and unnecessary loss of information. This method is deemed wasteful, costly, and unethical, as it burdens more participants than required.
Key takeaway
AI scientists who design or evaluate clinical trials should critically assess any proposed dichotomization of a continuous outcome. The practice reduces statistical power and inflates the required sample size, which makes research less efficient and potentially unethical. Prefer direct analysis of the continuous data to preserve information and support robust conclusions; the Shiny app described in the article can be used to demonstrate the impact of dichotomization.
Key insights
Dichotomizing continuous outcomes in clinical trials leads to significant information loss and inflated sample sizes.
Principles
- Binary outcomes reduce information content.
- Dichotomization inflates standard errors.
- Increased sample size does not fully compensate.
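The principles above can be illustrated with a short calculation. This is a minimal sketch, not the method used in the study or in its Shiny app. It assumes a two-arm trial with equal arm sizes, a normally distributed outcome with standard deviation 1, and a risk-difference analysis after dichotomizing at a fixed cutoff. The function name relative_efficiency and its parameters are illustrative.

```python
from math import pi
from statistics import NormalDist


def relative_efficiency(delta, cutoff):
    """Information kept by a dichotomized analysis, relative to the continuous one.

    Assumptions (illustrative, not from the original article):
    control outcome ~ N(0, 1), treated outcome ~ N(delta, 1), and a
    "responder" is anyone whose outcome exceeds `cutoff`.
    """
    std_normal = NormalDist()
    p_control = 1 - std_normal.cdf(cutoff)
    p_treated = 1 - std_normal.cdf(cutoff - delta)

    # Variance of the estimated difference, per participant per arm.
    # The arm size n multiplies both analyses equally, so it cancels in the ratio.
    var_binary = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    var_continuous = 2.0  # sigma^2 + sigma^2 with sigma = 1

    # Squared signal-to-noise ratio of each analysis.
    snr_binary = (p_treated - p_control) ** 2 / var_binary
    snr_continuous = delta ** 2 / var_continuous
    return snr_binary / snr_continuous


if __name__ == "__main__":
    delta = 0.3  # hypothetical standardized treatment effect
    for cutoff in (0.15, 1.0, 2.0):
        print(cutoff, round(relative_efficiency(delta, cutoff), 3))
    print("2/pi =", round(2 / pi, 3))
```

Under these assumptions, a cutoff near the middle of the distribution keeps roughly 2/π (about 64%) of the information when the effect is small, and cutoffs further into the tails keep far less. A value below 1 means the dichotomized analysis has a larger standard error relative to its effect estimate at the same sample size.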
Method
The study used a simple method to approximate information loss across clinical trials in the CDSR. A Shiny app calculates information loss from "responder analysis" and determines required sample sizes for continuous vs. dichotomized outcomes.
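To make the sample-size comparison concrete, here is a hedged sketch using the standard normal-approximation formulas for a two-arm trial. It is not the Shiny app's implementation, whose internals are not described here. The assumptions are a normally distributed outcome with standard deviation 1, equal arm sizes, a two-sided test, and a responder cutoff placed midway between the two arm means by default. All function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _z_sum_squared(alpha, power):
    # (z_{1 - alpha/2} + z_{power})^2, shared by both formulas below.
    return (STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)) ** 2


def n_per_arm_continuous(delta, alpha=0.05, power=0.80):
    """Participants per arm to detect a standardized mean difference `delta`."""
    return ceil(2 * _z_sum_squared(alpha, power) / delta ** 2)


def n_per_arm_dichotomized(delta, cutoff=None, alpha=0.05, power=0.80):
    """Participants per arm after splitting the same outcome at `cutoff`.

    Assumes control ~ N(0, 1) and treated ~ N(delta, 1); the default
    cutoff is the midpoint delta / 2 (an illustrative choice).
    """
    if cutoff is None:
        cutoff = delta / 2
    p_control = 1 - STD_NORMAL.cdf(cutoff)
    p_treated = 1 - STD_NORMAL.cdf(cutoff - delta)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(_z_sum_squared(alpha, power) * variance / (p_treated - p_control) ** 2)


if __name__ == "__main__":
    delta = 0.5  # hypothetical standardized effect
    print("continuous:  ", n_per_arm_continuous(delta))
    print("dichotomized:", n_per_arm_dichotomized(delta))
```

With a standardized difference of 0.5, 80% power, and a two-sided 5% significance level, this approximation gives 63 participants per arm for the continuous analysis and roughly half again as many once the outcome is split at the midpoint. Less central cutoffs widen the gap further.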
In practice
- Avoid dichotomizing continuous outcomes.
- Use the Shiny app to quantify information loss.
- Calculate sample sizes for continuous data.
Topics
- Dichotomization
- Clinical Trial Design
- Statistical Information Loss
- Sample Size Determination
- Biostatistics
Best for: AI Scientist, Data Scientist, Research Scientist, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.