False Discovery Rates, FDR, clearly explained
Summary
False Discovery Rates (FDR), and the Benjamini-Hochberg method in particular, are a critical tool for managing false positives in high-throughput experiments such as RNA sequencing. The article explains that when many hypotheses are tested at once, for example 10,000 genes, a standard p-value cutoff of 0.05 can produce about 500 false positives (10,000 × 0.05) even if none of the genes actually differ. It distinguishes between two p-value distributions. When both samples come from the same source, the p-values are uniformly distributed between 0 and 1. When they come from different sources, the p-values are skewed towards zero. The Benjamini-Hochberg method adjusts p-values upward (an adjusted p-value is never smaller than the original) so that the false discovery rate is controlled. For instance, a p-value of 0.04 might become 0.06 after adjustment and lose its statistical significance. If an FDR cutoff of 0.05 is then applied, on average no more than 5% of the results reported as significant will be false positives, effectively "weeding out bad data that looks good." The procedure ranks the p-values from smallest to largest and scales each one by the total number of tests divided by its rank to derive the adjusted values.
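The 500 figure is simply 10,000 tests × 0.05. The small Python simulation below makes this concrete. It is a sketch under assumptions not taken from the article: each hypothetical gene has 3 measurements per group, both groups are drawn from the same normal distribution, and a two-sample z-test with known standard deviation stands in for the t-test usually used, because the Python standard library has no t-distribution.

```python
import random
from statistics import NormalDist

# Hypothetical setup (not from the article): every "gene" has 3 samples per
# group, all drawn from N(0, 1), so every null hypothesis is true.
N_GENES = 10_000
N_PER_GROUP = 3
ALPHA = 0.05


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(seed=0):
    """Count tests with p < ALPHA when no gene truly differs."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(N_GENES):
        a = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if two_sample_z_pvalue(a, b) < ALPHA:
            false_positives += 1
    return false_positives


# The expected count is 10,000 * 0.05 = 500; the simulated count varies around it.
print(count_false_positives())
```

Because every null hypothesis is true in this setup, every test that falls below 0.05 is a false positive.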
Key takeaway
If you are a research scientist analyzing high-throughput data such as RNA sequencing, you must apply a False Discovery Rate (FDR) correction to your p-values. Ignoring FDR can produce hundreds of false positives, and results that are not truly significant are then mistaken for real discoveries. Use the Benjamini-Hochberg method to adjust p-values so that the expected proportion of false discoveries among your reported significant results stays at or below the level you choose, typically 5%. This prevents resources from being wasted on spurious findings.
Key insights
False Discovery Rates (FDR) control the expected proportion of false positives among significant results in multiple hypothesis testing by adjusting p-values.
Principles
- Multiple hypothesis testing inflates false positives.
- P-values from tests where the null hypothesis is true are uniformly distributed between 0 and 1.
- Benjamini-Hochberg adjusts p-values to limit the expected proportion of false discoveries.
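To see the two p-value distributions side by side, the sketch below simulates 10,000 hypothetical tests twice: once with both groups drawn from the same distribution, and once with the second group's mean shifted by 2. The group size, the shift, and the z-test with known standard deviation are illustrative assumptions, not details from the article. The same-source histogram should come out roughly flat at about 10% per bin, while the different-source histogram piles up in the first bin near zero.

```python
import random
from statistics import NormalDist


def z_test_pvalue(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_pvalues(shift, n_tests=10_000, n_per_group=3, seed=1):
    """p-values when group B's mean is offset by `shift` (0 means same source)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(shift, 1) for _ in range(n_per_group)]
        pvals.append(z_test_pvalue(a, b))
    return pvals


def histogram(pvals, n_bins=10):
    """Fraction of p-values in each bin of width 1/n_bins (p = 1.0 goes in the last bin)."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(pvals) for c in counts]


same_source = histogram(simulate_pvalues(shift=0.0))
different_source = histogram(simulate_pvalues(shift=2.0))
print("same source:     ", [round(f, 2) for f in same_source])
print("different source:", [round(f, 2) for f in different_source])
```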
Method
The Benjamini-Hochberg method ranks the p-values from smallest to largest. The largest p-value keeps its value as its adjusted p-value. Then, working down the ranks, each adjusted p-value is the smaller of the adjusted value for the next rank up and (current p-value × total number of tests / rank).
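The ranking procedure can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: `benjamini_hochberg` is a hypothetical helper name, and the six p-values are made up so that a raw p-value of 0.04 ends up at 0.06 after adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices sorted by p-value, smallest first; rank = position + 1.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    previous = 1.0  # starting at 1.0 also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for position in range(m - 1, -1, -1):
        i = order[position]
        rank = position + 1
        candidate = pvalues[i] * m / rank
        # Keep the smaller of the next-rank-up adjusted value and p * m / rank.
        previous = min(previous, candidate)
        adjusted[i] = previous
    return adjusted


raw = [0.005, 0.04, 0.03, 0.2, 0.01, 0.8]  # hypothetical p-values
print([round(q, 3) for q in benjamini_hochberg(raw)])
# → [0.03, 0.06, 0.06, 0.24, 0.03, 0.8]
```

The running minimum is what keeps the adjusted values in the same order as the raw p-values. Established routines such as R's `p.adjust(p, method = "BH")` or statsmodels' `multipletests(pvals, method="fdr_bh")` perform the same adjustment and are preferable in real analyses.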
In practice
- Apply Benjamini-Hochberg to high-throughput data.
- Use FDR-adjusted p-values for significance cutoffs.
- Identify true positives in gene expression studies.
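Putting these steps together, here is a hedged end-to-end sketch on simulated data: 9,000 hypothetical genes with no real difference and 1,000 whose mean shifts by 3 standard deviations. The simulation parameters, the z-test with known standard deviation, and the helper names are assumptions for illustration only. A raw p < 0.05 cutoff should flag several hundred unchanged genes, whereas a BH-adjusted cutoff of 0.05 should keep the share of false calls among the significant genes around 5% or less.

```python
import random
from statistics import NormalDist


def z_test_pvalue(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    previous = 1.0
    for position in range(m - 1, -1, -1):  # largest p-value first
        i = order[position]
        previous = min(previous, pvalues[i] * m / (position + 1))
        adjusted[i] = previous
    return adjusted


def simulate_experiment(n_null=9_000, n_true=1_000, shift=3.0, n_per_group=3, seed=2):
    """Hypothetical experiment: n_null unchanged genes, then n_true shifted genes."""
    rng = random.Random(seed)
    pvalues, is_true_effect = [], []
    for g in range(n_null + n_true):
        effect = shift if g >= n_null else 0.0
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        pvalues.append(z_test_pvalue(a, b))
        is_true_effect.append(effect != 0.0)
    return pvalues, is_true_effect


def summarize(significant, is_true_effect):
    """Count true and false positives among the genes called significant."""
    tp = sum(s and t for s, t in zip(significant, is_true_effect))
    fp = sum(s and not t for s, t in zip(significant, is_true_effect))
    return tp, fp


pvalues, truth = simulate_experiment()
raw_calls = [p < 0.05 for p in pvalues]
fdr_calls = [q < 0.05 for q in benjamini_hochberg(pvalues)]
raw_tp, raw_fp = summarize(raw_calls, truth)
fdr_tp, fdr_fp = summarize(fdr_calls, truth)
print(f"raw p < 0.05:  {raw_tp} true positives, {raw_fp} false positives")
print(f"BH FDR < 0.05: {fdr_tp} true positives, {fdr_fp} false positives")
```

The trade-off is visible in the output: the FDR cutoff gives up some true positives in exchange for far fewer false ones.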
Topics
- False Discovery Rate
- Benjamini-Hochberg method
- P-value adjustment
- Multiple hypothesis testing
- RNA sequencing
- Statistical significance
Best for: Data Scientist, Research Scientist, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by StatQuest with Josh Starmer.