Introduction
A study is underpowered when its sample size is too small to reliably detect the effect it is testing. If the true effect is small and the sample is small, the study will usually return a non-significant result even though the effect is real, and on the occasions it does reach significance, the estimated effect will tend to be badly inflated (the winner's curse). The consequences are well known to statisticians and largely unknown to the researchers who design studies, a gap in knowledge transfer that this paper attributes directly to the curriculum.
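The point is easy to demonstrate by simulation. The sketch below (our illustration, not part of the audit; the effect size d = 0.2 and n = 20 per group are chosen arbitrarily, and it assumes NumPy and SciPy are available) runs many small two-group studies of a real effect and counts how often the t-test reaches significance:

```python
# Simulation: how often does a small study detect a small but real effect?
# d = 0.2 and n = 20 per group are illustrative choices, not audit data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, alpha, runs = 0.2, 20, 0.05, 5000

significant = 0
for _ in range(runs):
    control = rng.normal(0.0, 1.0, n)   # control group, true mean 0
    treated = rng.normal(d, 1.0, n)     # treated group, true mean d
    _, p = stats.ttest_ind(treated, control)
    if p < alpha:
        significant += 1

print(f"Empirical power: {significant / runs:.2f}")  # well under 0.50
```

Despite the effect being perfectly real, the vast majority of these simulated studies come back non-significant.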
We address a question that is both simple and, it turns out, dispiriting: how much instruction on statistical power do graduate students in quantitative disciplines actually receive? To answer this, we obtained syllabi and lecture materials from 14 graduate methods courses across 11 universities and coded instructional time by topic. We supplemented this with interviews with 43 recent graduates who had completed these courses.
When asked to define statistical power without looking it up, 37 of 43 graduates (86%) could not provide a definition accurate to within one standard deviation of the correct answer, a metric we acknowledge is unusual but feel captures the spirit of the situation.
The Curriculum Audit
Instructional time devoted to statistical power across 14 courses ranged from 8 minutes to 54 minutes, with a mean of 23 minutes and a mode of 0 minutes (three courses mentioned power only in passing references to sample size). For context, the same 14 courses devoted a mean of 241 minutes to p-values, 186 minutes to t-tests, and 94 minutes to ANOVA. The topic receiving the most instructional time across all courses was “data cleaning,” which averaged 312 minutes and which, in eleven courses, consisted substantially of instruction on removing outliers until the results improved.
Power was most commonly introduced as a feature of the G*Power software, which all courses mentioned and none explained. In six courses, the only instruction on power analysis was “use GPower, it’s free, Google it.” In one course, the instructor referred to the concept as “beta stuff” and moved on.
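For the record, the calculation the software performs is not mysterious. A sketch of the same a priori analysis in code, using statsmodels (our choice of library, not anything the courses taught; the effect size d = 0.5 and 80% power target are conventional illustrative values):

```python
# A priori power analysis, the kind G*Power performs: how many subjects per
# group does a two-sample t-test need to detect d = 0.5 at 80% power?
# Illustrative values; statsmodels is an assumed dependency.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 64 per group
```

Three lines, free, and it even tells you the answer before you run the study, which is the entire point.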
The Post-Hoc Power Problem
Post-hoc (or “observed”) power analysis — computing power after the study using the observed effect size — appears in 24 of 400 papers (6%) in our publication audit. It is also, as Hoenig and Heisey documented in 2001, entirely uninformative and sometimes actively misleading. A post-hoc power calculation for a non-significant result will always find low power, because the observed effect is small; for a standard two-sided test, observed power is a one-to-one function of the p-value, equal to roughly 50% when p = .05 and lower for any larger p. It therefore tells the researcher nothing they did not already know. Of the 24 papers reporting post-hoc power, 23 use it to explain why their non-significant result might still reflect a real effect. This is backwards.
The persistence of post-hoc power analysis in published literature is itself evidence of the curriculum gap: researchers are performing an analysis they were not taught to interpret, in a context where it cannot provide useful information, to justify a conclusion it does not support. The one remaining paper uses post-hoc power to argue that their significant result is reliable, which is a different error but equally uninformative.
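The one-to-one relationship is easy to verify. A minimal sketch for a two-sided z-test (our illustration, assuming SciPy; `observed_power` is a hypothetical helper name): treating the observed effect as if it were the true effect makes observed power a fixed function of the p-value alone.

```python
# Why post-hoc power is uninformative: for a two-sided z-test, "observed power"
# depends only on the p-value, so it merely restates the p-value in disguise.
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    z_obs = norm.ppf(1 - p / 2)        # |z| implied by the two-sided p-value
    z_crit = norm.ppf(1 - alpha / 2)   # critical value, 1.96 at alpha = .05
    # Power computed as if the true effect equaled the observed one exactly:
    return norm.sf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)

for p in (0.05, 0.10, 0.30, 0.60):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
# Observed power never exceeds ~0.50 for any non-significant p.
```

Every non-significant p-value maps to an observed power below about 50%, which is why computing it after the fact can only ever confirm what the p-value already said.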
References
- Cohen, J. (1962). “The Statistical Power of Abnormal-Social Psychological Research: A Review.” Journal of Abnormal and Social Psychology, 65(3), pp. 145-153. (Real paper. Nobody read it at the time; situation ongoing.)
- Hoenig, J., & Heisey, D. (2001). “The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis.” The American Statistician, 55(1), pp. 19-24. (Also real. Also ignored.)
- G-Power, G. (2024). “We Are Software, Not a Curriculum.” User Manual, pp. 1-87.
- Hypothesis, N. (2026). “This Study Was Underpowered (N=1).” I3E Trashactions on Things Nobody Told The Professor, 1(1), pp. 13-13.