Lecture 6: Post-Hoc Tests

January 29, 2026

Post Hoc Tests and multiple testing

The goal of ANOVA and many other statistical frameworks is to test whether variation among groups is larger than expected by chance, not to identify which groups differ. In ANOVA, the null hypothesis is that all group means (within or across factors) are equal. When this hypothesis is rejected, we learn that at least one group differs, but ANOVA alone does not tell us where those differences lie.

Post hoc tests are designed to address this question. They allow us to identify which specific groups differ significantly—that is, which differences are larger than would be expected from sampling variation alone if all groups came from the same statistical population. The term post hoc (Latin for “after this”) reflects the fact that these tests are conducted only after an overall effect has been detected.
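To see why post hoc testing quickly multiplies the number of comparisons, note that all-pairs comparisons among k groups require k(k-1)/2 tests. A minimal sketch (the group counts are illustrative, not from the lecture):

```python
# Number of pairwise post hoc comparisons among k groups grows as k*(k-1)/2.
from math import comb

for k in (3, 5, 10):
    # comb(k, 2) counts the unordered pairs of groups, i.e. the tests needed
    print(k, comb(k, 2))  # prints 3 -> 3, 5 -> 10, 10 -> 45
```

Even a modest ANOVA with 10 groups therefore entails 45 pairwise tests, which is why the multiple testing issue below arises immediately.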

However, post hoc tests introduce important analytical challenges because they involve conducting multiple statistical tests. As a result, false positives (Type I errors—rejecting a null hypothesis that is actually true) are expected to occur by chance alone. This issue has become increasingly important with the development of modern technologies in biology and related fields, where analyses may involve thousands or even millions of simultaneous statistical tests and associated P-values.
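The inflation of false positives can be made concrete with a short calculation. If all null hypotheses are true and we run m independent tests at significance level alpha, the probability of at least one Type I error (the family-wise error rate) is 1 - (1 - alpha)^m. A sketch under that independence assumption (the names alpha and m are illustrative):

```python
# Family-wise error rate (FWER) for m independent tests, each at level alpha,
# when every null hypothesis is true. Assumes independence between tests.

alpha = 0.05

def fwer(m, alpha=alpha):
    """P(at least one false positive) among m independent true-null tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 10, 100):
    print(m, round(fwer(m), 3))  # 1 -> 0.05, 10 -> 0.401, 100 -> 0.994
```

With 100 tests at alpha = 0.05, a false positive is all but guaranteed, even though each individual test is conducted correctly.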

Consequently, we need robust statistical frameworks that limit the number of false positives while also avoiding an excessive increase in false negatives (Type II errors—failing to reject a false null hypothesis). The development of methods for multiple testing correction is therefore both long-standing and highly active in statistics, reflecting the fundamental trade-off between discovery and error control.
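The simplest such correction is the Bonferroni procedure, which controls the family-wise error rate by testing each hypothesis at level alpha/m; it is safe but conservative, illustrating the discovery-versus-error-control trade-off mentioned above. A sketch with hypothetical p-values:

```python
# Bonferroni correction: compare each raw p-value against alpha / m.
# Controls the FWER at alpha, but can be conservative (more Type II errors).
# The p-values below are hypothetical, for illustration only.

alpha = 0.05
pvals = [0.001, 0.008, 0.020, 0.040, 0.300]
m = len(pvals)

# A hypothesis is rejected only if its p-value clears the stricter threshold
rejected = [p <= alpha / m for p in pvals]
print(rejected)  # alpha/m = 0.01, so only the first two are rejected
```

Note that p = 0.02 and p = 0.04, nominally significant on their own, no longer survive: stricter control of false positives comes at the cost of potential false negatives.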

In this lecture we will develop an understanding of why it is important to adjust for inflated Type I error, and cover routine and robust procedures for multiple testing correction.

Make sure to consult the pedagogical guide on Moodle, which I wrote specifically to help you understand the key conceptual issues behind multiple testing.


The Multiple Comparisons Problem, the Sprightly Pedagogue.


One of the most robust and widely used procedures for adjusting for inflated Type I error due to multiple testing is the false discovery rate (FDR). We will cover FDR in detail in this lecture, but this video offers a different style of presentation.

False Discovery Rates, FDR, clearly explained; StatQuest from the University of North Carolina.


Lecture

Download lecture: 3 slides per page and outline

Download lecture: 1 slide per page