The stupidity, the criminal vandalism, the wanton destruction of information involved in dichotomisation

2026-02-14 · Source: Statistical Modeling, Causal Inference, and Social Science · Field: Science & Research — Mathematics & Computational Sciences, Health & Medical Research, Research Methodology & Innovation · Depth: Advanced, quick

Summary

A study analyzed 21,435 unique randomized controlled trials (RCTs) from the Cochrane Database of Systematic Reviews (CDSR): 7,224 (34%) had continuous outcomes and 14,211 (66%) had binary outcomes. Trials with binary outcomes had larger sample sizes on average, yet they also had larger standard errors and fewer statistically significant results. The researchers concluded that sample sizes are increased to offset the lower information content of binary outcomes, but that this compensation is often insufficient. Many binary outcomes come from the avoidable dichotomization of continuous data, a practice known as "responder analysis," which discards a substantial amount of information unnecessarily. The practice is described as wasteful, costly, and unethical, because it burdens more participants than necessary.

Key takeaway

AI Scientists who design or evaluate clinical trials should critically assess any proposed dichotomization of continuous outcomes. The practice reduces statistical power and inflates required sample sizes, making research less efficient and potentially unethical. Prioritize direct analysis of continuous data to preserve information and support robust conclusions; the Shiny app described in the article can help demonstrate the impact of dichotomization.

Key insights

Dichotomizing continuous outcomes in clinical trials leads to significant information loss and inflated sample sizes.

Principles

Binary outcomes carry less information than continuous outcomes.
Dichotomization inflates standard errors.
Larger sample sizes often do not fully compensate for the loss.
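
These principles can be illustrated with a small simulation. The sketch below is not the study's method; it assumes a hypothetical two-arm trial with normally distributed outcomes (unit variance), a standardized effect of 0.3, and a cutoff at the control-group mean, and it uses simple z-tests for both analyses. The function names, sample size, and parameters are illustrative choices only.

```python
import random
from statistics import NormalDist


def sample_mean_var(xs):
    """Return the mean and unbiased sample variance of a list of floats."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def simulate_power(n_per_arm=175, effect=0.3, cutoff=0.0,
                   n_sims=1000, alpha=0.05, seed=1):
    """Estimate power of the continuous and the dichotomized analysis.

    Assumption: outcomes are N(0, 1) in the control arm and
    N(effect, 1) in the treatment arm; "responders" have outcome > cutoff.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_cont = 0
    hits_bin = 0
    for _ in range(n_sims):
        ctrl = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        trt = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]

        # Continuous analysis: z-test on the difference in means.
        m0, v0 = sample_mean_var(ctrl)
        m1, v1 = sample_mean_var(trt)
        se_cont = ((v0 + v1) / n_per_arm) ** 0.5
        if abs(m1 - m0) / se_cont > z_crit:
            hits_cont += 1

        # Dichotomized analysis: z-test on the difference in responder rates.
        p0 = sum(x > cutoff for x in ctrl) / n_per_arm
        p1 = sum(x > cutoff for x in trt) / n_per_arm
        se_bin = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_arm) ** 0.5
        if se_bin > 0 and abs(p1 - p0) / se_bin > z_crit:
            hits_bin += 1
    return hits_cont / n_sims, hits_bin / n_sims


if __name__ == "__main__":
    power_cont, power_bin = simulate_power()
    print(f"continuous: {power_cont:.2f}, dichotomized: {power_bin:.2f}")
```

Under these assumptions the continuous analysis should reject the null in roughly 80% of simulated trials and the dichotomized analysis in roughly 60%, using the same participants and the same data.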

Method

The study used a simple method to approximate information loss across clinical trials in the CDSR. A Shiny app calculates information loss from "responder analysis" and determines required sample sizes for continuous vs. dichotomized outcomes.
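
The kind of comparison described here can be sketched with textbook sample-size formulas. This is not the Shiny app's code or the study's approximation; it is a minimal illustration that assumes normal outcomes with unit variance, a two-sided z-test, and a hypothetical cutoff expressed in standard deviations from the control mean. All function and parameter names are invented for the example.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_sum(alpha, power):
    """Sum of the critical values for a two-sided test at the given power."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)


def n_continuous(effect, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing two means (standardized effect)."""
    return 2 * z_sum(alpha, power) ** 2 / effect ** 2


def n_dichotomized(effect, cutoff=0.0, alpha=0.05, power=0.80):
    """Per-arm sample size after splitting the same outcome at `cutoff`."""
    p0 = 1 - STD_NORMAL.cdf(cutoff)           # responder rate, control
    p1 = 1 - STD_NORMAL.cdf(cutoff - effect)  # responder rate, treatment
    var_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return z_sum(alpha, power) ** 2 * var_sum / (p1 - p0) ** 2


def inflation(effect, cutoff=0.0):
    """Ratio of dichotomized to continuous sample size."""
    return n_dichotomized(effect, cutoff) / n_continuous(effect)


if __name__ == "__main__":
    for cutoff in (0.0, 0.5, 1.0):
        print(cutoff,
              ceil(n_continuous(0.3)),
              ceil(n_dichotomized(0.3, cutoff)),
              round(inflation(0.3, cutoff), 2))
```

With a cutoff at the control mean, the inflation factor for small effects approaches π/2 (about 1.57), so the dichotomized trial needs over 50% more participants for the same power. Moving the cutoff away from the center of the distribution makes the penalty larger.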

In practice

Avoid dichotomizing continuous outcomes.
Use the Shiny app to quantify information loss.
Calculate sample sizes for continuous data.

Topics

Dichotomization
Clinical Trial Design
Statistical Information Loss
Sample Size Determination
Biostatistics

Best for: AI Scientist, Data Scientist, Research Scientist, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.