From Research Question to Study Protocol: Why Three Phases Matter
Most research planning tools solve one piece of the puzzle. A systematic three-phase approach -- gap analysis, methodology design, biostatistical planning -- produces stronger protocols and catches problems early.
Research planning is not a single activity. It is a sequence of decisions, each building on the last, where a mistake in an early phase compounds through everything that follows. Yet most researchers approach it as a collection of disconnected tasks: search the literature here, sketch a study design there, run a power calculation somewhere else.
This fragmentation leads to protocols that do not hold together -- a research gap that does not quite match the study design, or a statistical plan that cannot answer the research question as framed. The three-phase approach addresses this by treating research planning as a pipeline where each phase's output becomes the next phase's input.
Phase 1: Research Gap Analysis
Before designing a study, you need to know precisely what is and is not known. This sounds obvious, but "I could not find any studies on X" is not the same as a systematic gap analysis.
A rigorous gap analysis involves a structured literature search with multiple query strategies, classification of the type of gap (evidence, methodology, population, context, or theoretical), and a framework such as PICO or PICOTS to define the question precisely.
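One way to make the PICOTS framing concrete is to treat it as a structured record rather than free text, so that unfinished elements are caught mechanically. The sketch below is illustrative only; the field values and the `PICOTSQuestion` class are hypothetical, not part of any standard tool.

```python
from dataclasses import dataclass, fields

@dataclass
class PICOTSQuestion:
    """A research question decomposed into PICOTS elements."""
    population: str    # who is studied
    intervention: str  # exposure or treatment of interest
    comparator: str    # what the intervention is judged against
    outcome: str       # what is measured
    timing: str        # follow-up horizon
    setting: str       # clinical or geographic context

    def unspecified_elements(self) -> list:
        """Names of any elements left blank -- the question is not yet precise."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

q = PICOTSQuestion(
    population="adults with type 2 diabetes",
    intervention="SGLT2 inhibitors",
    comparator="standard care",
    outcome="major cardiovascular events",
    timing="",  # not yet decided -- flagged below
    setting="outpatient care",
)
print(q.unspecified_elements())  # ['timing']
```

A blank element here is exactly the kind of vagueness that, left unflagged, resurfaces as a mismatch in the methodology or statistics phases.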
The output of this phase is not just a list of papers. It is a clear statement of what evidence exists, where it falls short, and why a new study is justified. This framing directly shapes the next phase.
What goes wrong without it: Researchers design studies that duplicate existing evidence, address a gap that has already been filled by recent publications, or frame their question so broadly that no single study could answer it.
Phase 2: Study Methodology Design
With a well-defined gap, the methodology phase asks: what study design would generate the evidence needed to fill this gap?
This involves more than choosing between "RCT" and "cohort study." Rigorous methodology design includes selecting the appropriate study type and justifying why the alternatives are inferior; constructing a causal model (often a DAG) to identify confounders and mediators; anticipating sources of bias with tools like the Cochrane Risk of Bias framework; and selecting the right reporting guideline (CONSORT, STROBE, PRISMA, etc.) from the EQUATOR Network.
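The confounder/mediator distinction a DAG encodes can be sketched in a few lines: a confounder is a common cause of exposure and outcome, while a mediator lies on the causal path from exposure to outcome. The toy causal model below is hypothetical, and the helper functions are a minimal sketch, not a substitute for a full adjustment-set algorithm such as the backdoor criterion.

```python
def ancestors(dag, node):
    """All nodes with a directed path into `node`. `dag` maps child -> parents."""
    seen, stack = set(), list(dag.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))
    return seen

def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    return {n for n in dag if node in ancestors(dag, n)}

# Hypothetical causal model, written child -> parents:
dag = {
    "outcome":  ["exposure", "mediator", "age", "smoking"],
    "mediator": ["exposure"],
    "exposure": ["age", "smoking"],
}

# Confounders: common causes of exposure and outcome (adjust for these).
confounders = ancestors(dag, "exposure") & ancestors(dag, "outcome")
# Mediators: caused by exposure, causing outcome (do NOT adjust for these
# when estimating the total effect).
mediators = descendants(dag, "exposure") & ancestors(dag, "outcome")
print(sorted(confounders))  # ['age', 'smoking']
print(sorted(mediators))    # ['mediator']
```

Even this toy version makes the point of the phase: whether a variable belongs in the adjustment set is a property of the causal model, not of the dataset.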
One particularly powerful technique gaining traction is Target Trial Emulation, where observational studies are designed to mirror the structure of an ideal randomized trial. This forces researchers to make explicit decisions about eligibility criteria, treatment strategies, and outcome definitions that might otherwise remain vague.
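The discipline Target Trial Emulation imposes can be captured as a checklist over the protocol components described by Hernán and Robins (eligibility criteria, treatment strategies, assignment procedures, follow-up period, outcome definition, causal contrast, analysis plan). The `emulation` entries below are hypothetical; the point is the mechanical flagging of components left implicit.

```python
# Protocol components of a target trial (after Hernán & Robins).
TARGET_TRIAL_COMPONENTS = [
    "eligibility criteria",
    "treatment strategies",
    "assignment procedures",
    "follow-up period",
    "outcome definition",
    "causal contrast",
    "analysis plan",
]

def unresolved(protocol):
    """Components the emulation has not yet made explicit."""
    return [c for c in TARGET_TRIAL_COMPONENTS if not protocol.get(c)]

# A hypothetical, partially specified emulation:
emulation = {
    "eligibility criteria": "new users of drug A, no prior cardiovascular event",
    "treatment strategies": "initiate drug A vs. initiate drug B",
    "outcome definition": "heart-failure hospitalization within 2 years",
}
print(unresolved(emulation))
# ['assignment procedures', 'follow-up period', 'causal contrast', 'analysis plan']
```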
What goes wrong without it: Studies with unmeasured confounding that invalidates the results, designs that cannot answer the research question as stated, or protocols that do not meet the reporting standards expected by target journals.
Phase 3: Biostatistical Planning
The statistical plan is where the methodology becomes concrete and testable. This phase translates the study design into specific hypotheses, determines the appropriate statistical tests, calculates the required sample size, and produces analysis code.
Good biostatistical planning is iterative. The initial power calculation might reveal that the study is infeasible as designed, sending you back to Phase 2 to consider a different design or a more focused population. The choice of statistical test might depend on distributional assumptions that need to be verified with pilot data.
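The feasibility loop described above often starts with a back-of-the-envelope power calculation. For a two-sided comparison of two means, the standard normal approximation gives the per-group sample size as n = 2((z_{1-α/2} + z_{1-β})·σ/δ)². The sketch below uses only the Python standard library; the 5 mmHg / 12 mmHg blood-pressure effect is a hypothetical example, not from the article.

```python
from math import ceil
from statistics import NormalDist

def two_sample_n(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample comparison of means
    (normal approximation to the t-test)."""
    z = NormalDist().inv_cdf
    za = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    zb = z(power)           # ~0.84 for 80% power
    return ceil(2 * ((za + zb) * sd / delta) ** 2)

# Hypothetical target: detect a 5 mmHg difference, SD 12 mmHg.
print(two_sample_n(delta=5, sd=12))  # 91 per group
```

If a run like this returns thousands per group, that is the signal to loop back to Phase 2 rather than to proceed with an underpowered design.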
This phase also produces artifacts that reviewers and ethics boards require: the statistical analysis plan (SAP), a sample size justification with all assumptions stated, and often R or Stata code that demonstrates the planned analyses.
What goes wrong without it: Underpowered studies, inappropriate statistical tests, analysis plans that do not match the study design, and the all-too-common post hoc rationalization of unexpected results.
Why the Phases Must Connect
The three-phase structure is not just organizational convenience. Each transition is a quality gate.
The gap analysis constrains the methodology: you cannot design a study to fill a gap you have not rigorously defined. The methodology constrains the statistics: your power calculation must match your planned analysis, which must match your study design, which must address the identified gap.
When these phases are disconnected -- different tools, different consultants, different months -- the connections weaken. Assumptions made in Phase 1 get lost by Phase 3. The result is a protocol that looks complete on paper but contains internal contradictions that reviewers will find.
The Case for AI-Assisted Planning
AI cannot replace the domain expertise that drives research planning. But it can do something that human consultants struggle with: maintain perfect consistency across all three phases while checking each decision against established methodological frameworks.
When a researcher tells an AI system that they are studying a rare disease with an expected prevalence of 0.1%, the system can immediately flag that a standard RCT sample size calculation will produce infeasible numbers and suggest alternative designs. When the gap analysis identifies a population subgroup, the methodology phase can automatically consider that subgroup in the design, and the statistics phase can plan the appropriate subgroup analyses.
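The rare-disease flag is simple arithmetic: with 0.1% prevalence, the number of people who must be screened to fill even a modest trial is the enrollment target divided by the fraction of the screened population that is eligible and consents. The sketch below is illustrative; the 50% consent rate and 200-per-arm target are assumed numbers, not from the article.

```python
from math import ceil

def screening_required(n_per_group, groups=2, prevalence=0.001, consent_rate=0.5):
    """Rough count of people to screen when only `prevalence` of the
    screened population has the condition and `consent_rate` enroll."""
    total = n_per_group * groups
    return ceil(total / (prevalence * consent_rate))

# 200 per arm at 0.1% prevalence, assuming half of eligible patients consent:
print(screening_required(200))  # 800000 people screened
```

A number like 800,000 screened for a 400-person trial is exactly the early, cheap-to-fix signal that should push the design toward registries, case-control sampling, or multi-site recruitment instead.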
This is not about replacing methodological thinking. It is about providing a structured environment where that thinking happens more rigorously, more consistently, and earlier in the process -- when changes are cheap rather than expensive.