5 Sample Size Pitfalls That Trip Up Even Experienced Researchers
Power analysis is deceptively simple in theory. In practice, subtle mistakes in effect size estimation, dropout assumptions, and multiplicity adjustments can invalidate your entire study before enrollment begins.
Sample size calculation sits at the foundation of every clinical study. Get it wrong, and you either waste resources on an overpowered trial or -- worse -- publish an underpowered study that misses a real effect. Despite decades of methodological guidance, the same mistakes keep appearing in grant applications and protocols.
Here are five pitfalls we see repeatedly, and how to avoid them.
1. Using a "Convenient" Effect Size
The most common error is choosing an effect size because it produces a feasible sample size, rather than because it reflects a clinically meaningful difference. A researcher who can enroll only a couple dozen participants per group might back-calculate an effect size of d=0.8, then justify it post hoc.
The fix: Start from the Minimal Clinically Important Difference (MCID) for your outcome measure. If no MCID exists, use pilot data or systematic reviews of similar interventions. If the required sample size is infeasible, that is valuable information -- it means you need a different design, not a different effect size.
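As a rough illustration of working forward from the MCID rather than backward from a target n, here is a minimal Python sketch using statsmodels. The MCID of 5 points and SD of 12 are placeholder assumptions, not recommendations for any particular instrument.

```python
# Sketch: derive the effect size from the MCID, not from a convenient n.
# The MCID and SD below are placeholder assumptions.
from statsmodels.stats.power import TTestIndPower

mcid = 5.0        # minimal clinically important difference, in outcome units
sd = 12.0         # expected standard deviation of the outcome
d = mcid / sd     # standardized effect size (Cohen's d), here about 0.42

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(f"d = {d:.2f} -> required n per group = {n_per_group:.1f}")  # round up
```

If the printed n is infeasible, the honest response is to revisit the design (a different outcome, a crossover, a longer follow-up), not to quietly raise d.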
2. Ignoring Dropout and Non-Compliance
A power calculation that assumes 100% retention is fiction. Longitudinal studies routinely lose 15-30% of participants, and intention-to-treat analysis with imputed data does not recover the lost statistical power.
The fix: Inflate your sample size by the expected attrition rate. If you expect 20% dropout, divide your calculated n by 0.80. Better yet, model dropout as informative rather than random, and consider how differential dropout between arms could bias your results.
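The inflation step itself is one line of arithmetic; in the sketch below the 91-per-group figure is just a placeholder, not the output of any specific calculation.

```python
import math

n_required = 91      # n per group from your power calculation (placeholder)
dropout = 0.20       # expected attrition

# Enroll enough participants that the retained sample still hits the target n.
n_to_enroll = math.ceil(n_required / (1 - dropout))
print(f"Enroll {n_to_enroll} per group to retain about {n_required} "
      f"after {dropout:.0%} dropout")
```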
3. Forgetting Multiplicity Adjustments
Testing three co-primary endpoints without adjusting alpha inflates your family-wise error rate to roughly 14% (assuming independent endpoints) instead of the intended 5%. Reviewers and regulatory bodies will catch this, but often only after the study is complete.
The fix: Decide your multiplicity strategy before calculating sample size. Bonferroni is conservative but straightforward. Hierarchical testing preserves alpha without inflating sample size if you can rank your endpoints by clinical importance. The choice of adjustment method directly affects the required n -- build it into your power calculation from the start.
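A short sketch makes both the error-rate inflation and its cost in sample size concrete; the effect size of d=0.5 is an arbitrary placeholder.

```python
from statsmodels.stats.power import TTestIndPower

k = 3            # number of co-primary endpoints
alpha = 0.05

# Family-wise error rate without adjustment, assuming independent endpoints.
fwer = 1 - (1 - alpha) ** k
print(f"Unadjusted family-wise error rate: {fwer:.1%}")

# Bonferroni tests each endpoint at alpha / k, which pushes the required n up.
analysis = TTestIndPower()
n_unadjusted = analysis.solve_power(effect_size=0.5, alpha=alpha, power=0.80)
n_bonferroni = analysis.solve_power(effect_size=0.5, alpha=alpha / k, power=0.80)
print(f"n per group at alpha = 0.05:   {n_unadjusted:.0f}")
print(f"n per group at alpha = 0.0167: {n_bonferroni:.0f}")
```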
4. Misspecifying the Statistical Test
Using a t-test power calculation when your analysis plan calls for a mixed-effects model, or calculating power for a log-rank test when you plan to use Cox regression with covariates, leads to sample sizes that are either too small or wastefully large.
The fix: Your power calculation should mirror your planned primary analysis as closely as possible. If you plan to adjust for baseline covariates in an ANCOVA, use an ANCOVA-based power calculation -- the variance reduction from covariates typically lowers the required n by 20-30%. Simulation-based power analysis is invaluable when closed-form solutions do not match your analysis plan.
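When no closed-form formula matches the analysis plan, a small simulation gets much closer. The sketch below estimates power for an ANCOVA with a baseline covariate; the effect, SD, and baseline correlation (rho = 0.5) are illustrative assumptions chosen to echo the earlier placeholder numbers, not defaults to copy.

```python
# Sketch of simulation-based power for an ANCOVA primary analysis.
# All inputs (effect, SD, baseline correlation, n) are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2024)

def simulate_power(n_per_group, effect=5.0, sd=12.0, rho=0.5,
                   alpha=0.05, n_sims=2000):
    """Estimate power for y ~ group + baseline by repeated simulation."""
    rejections = 0
    for _ in range(n_sims):
        group = np.repeat([0, 1], n_per_group)
        baseline = rng.normal(50, sd, size=2 * n_per_group)
        # Outcome correlates with baseline at rho; treated arm shifted by `effect`.
        noise = rng.normal(0, sd * np.sqrt(1 - rho ** 2), size=2 * n_per_group)
        outcome = 50 + rho * (baseline - 50) + effect * group + noise
        data = pd.DataFrame({"y": outcome, "group": group, "baseline": baseline})
        fit = smf.ols("y ~ group + baseline", data=data).fit()
        if fit.pvalues["group"] < alpha:
            rejections += 1
    return rejections / n_sims

# With rho = 0.5, roughly 25% fewer participants than an unadjusted t-test needs.
print(f"Estimated ANCOVA power at n=68 per group: {simulate_power(68):.2f}")
```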
5. Treating Sample Size as a One-Time Calculation
Assumptions change. Interim data may reveal that your variance estimate was wrong, your control group event rate was off, or your dropout rate is higher than expected. A fixed sample size calculated once at the protocol stage cannot adapt to these realities.
The fix: Consider adaptive designs that allow sample size re-estimation at a pre-specified interim analysis. Group sequential designs, a closely related option, let you stop early for efficacy or futility while maintaining the overall type I error rate. Even without a formal adaptive design, planning a blinded sample size re-estimation based on pooled variance can save a study from being underpowered.
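A blinded re-estimation can be sketched in a few lines: keep the MCID fixed, swap in the pooled SD observed at the interim, and recompute. The SD values here are made up for illustration; a formal adaptive design would also pre-specify how the re-estimation feeds back into the protocol.

```python
import math
from statsmodels.stats.power import TTestIndPower

mcid = 5.0           # the clinical target does not change (placeholder)
sd_planned = 12.0    # SD assumed at the protocol stage (placeholder)
sd_interim = 15.0    # pooled SD estimated at the blinded interim look (placeholder)

analysis = TTestIndPower()
n_planned = analysis.solve_power(effect_size=mcid / sd_planned, alpha=0.05, power=0.80)
n_revised = analysis.solve_power(effect_size=mcid / sd_interim, alpha=0.05, power=0.80)

# The revised n grows whenever the interim SD exceeds the planning assumption.
print(f"Planned n per group: {math.ceil(n_planned)}")
print(f"Revised n per group: {math.ceil(n_revised)}")
```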
The Broader Pattern
These pitfalls share a common root: treating sample size calculation as a bureaucratic checkbox rather than a modeling exercise. Power analysis is a prediction about how your study will behave under specific assumptions. The quality of those assumptions determines whether your study succeeds or fails.
Tools like G*Power and PASS handle the arithmetic, but they cannot tell you whether your inputs are reasonable. That is where domain expertise -- and increasingly, AI-assisted methodology review -- becomes essential. An AI system that understands both the statistical framework and the clinical context can flag unrealistic assumptions before they become expensive mistakes.
At ProtoCol, our biostatistics phase walks researchers through each assumption, challenges questionable inputs, and generates the code to verify calculations independently. The goal is not to replace statistical thinking, but to make sure it happens rigorously every time.