False Discovery Rates, FDR, clearly explained

Source: StatQuest with Josh Starmer · Field: Science & Research — Mathematics & Computational Sciences, Life Sciences & Biology · Depth: Intermediate, long

Summary

False Discovery Rates (FDR), and the Benjamini-Hochberg method in particular, are a standard tool for limiting false positives in high-throughput experiments such as RNA sequencing. The article explains that when many hypotheses are tested at once, a p-value cutoff of 0.05 lets about 5% of the tests with no real effect through by chance: if 10,000 genes are tested and none truly differs, roughly 500 false positives are expected. It contrasts two p-value distributions: uniform between 0 and 1 when both samples come from the same source, and skewed towards zero when they come from different sources. The Benjamini-Hochberg method adjusts the p-values, never decreasing them and usually increasing them, to control the false discovery rate. For instance, a p-value of 0.04 might become 0.06 after adjustment and so no longer pass a 0.05 cutoff. With an FDR cutoff of 0.05 applied to the adjusted p-values, no more than about 5% of the results reported as significant are expected to be false positives, effectively "weeding out bad data that looks good." The procedure ranks the p-values and applies a formula to each rank to derive the adjusted values.
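The false-positive count and the two p-value distributions described in the summary can be checked with a small simulation. This is a minimal sketch, not the article's own analysis: it assumes normally distributed measurements with a known standard deviation of 1 and uses a two-sided z-test in place of whatever test a real RNA-seq pipeline would run. The function names, group size, shift, and seeds are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(n_tests, shift, n_per_group=3, seed=0):
    """Run n_tests comparisons; group b is drawn with its mean moved by `shift`."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        p_values.append(z_test_p_value(a, b))
    return p_values


# Same source: p-values are roughly uniform, so about 5% fall below 0.05.
null_p = simulate_p_values(10_000, shift=0.0)
false_positives = sum(p < 0.05 for p in null_p)

# Different sources (hypothetical shift of 3): p-values pile up near zero.
alt_p = simulate_p_values(10_000, shift=3.0, seed=1)
true_positives = sum(p < 0.05 for p in alt_p)

print(f"same source:      {false_positives} of 10000 below 0.05")
print(f"different source: {true_positives} of 10000 below 0.05")
```

With no real difference between the groups, the count below 0.05 should land near 500, which is the problem FDR correction is meant to address.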

Key takeaway

Research scientists analyzing high-throughput data, such as RNA sequencing, should apply False Discovery Rate (FDR) correction to their p-values. Without it, hundreds of chance results can pass the significance cutoff and be mistaken for true discoveries. Use the Benjamini-Hochberg method to adjust the p-values so that the expected share of false discoveries among the reported significant results stays at or below the chosen level, typically 5%. This avoids spending follow-up resources on spurious findings.

Key insights

False Discovery Rate (FDR) methods limit the share of false positives among the significant results of multiple hypothesis testing by adjusting p-values.

Method

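The Benjamini-Hochberg method ranks the p-values from smallest (rank 1) to largest. The largest p-value keeps its own value as its adjusted p-value. Working down from the largest to the smallest, each adjusted p-value is then the smaller of two numbers: the adjusted value just computed for the next-larger p-value, and (current p-value * total tests / rank).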
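The ranking and adjustment steps can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions rather than the article's code: the function name `benjamini_hochberg` and the five example p-values are hypothetical, and in real analyses an established statistics library is usually the safer choice.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest (rank 1) to largest (rank m).
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values never exceed 1
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = p_values[i] * m / rank
        # Keep the smaller of the previous adjusted value and the candidate.
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, chosen only to illustrate the arithmetic.
raw = [0.001, 0.02, 0.04, 0.30, 0.80]
adj = benjamini_hochberg(raw)
significant = [p for p, q in zip(raw, adj) if q < 0.05]

for p, q in zip(raw, adj):
    print(f"p = {p:<6} adjusted = {q:.4f}")
```

In this made-up set, the raw p-value of 0.04 sits at rank 3 of 5, so its candidate value is 0.04 * 5 / 3, roughly 0.067, and it no longer passes a 0.05 cutoff after adjustment.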

Best for: Data Scientist, Research Scientist, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by StatQuest with Josh Starmer.