Module 3 · Confidence Intervals
The drug works — but by how much?
A trial reports: "Blood pressure fell by 9 mmHg." Should you be impressed? It depends entirely on the uncertainty around that 9.
Before we go further — what do you think a 95% confidence interval actually means?
From a single guess to an honest range
Every trial gives you a point estimate — one number. But that number came from a sample, and samples vary (sampling variation). The question is: how much could the true effect differ from what we measured?
The answer lies in the standard error: the typical distance between a sample mean and the population mean. A 95% CI stretches roughly ±2 standard errors either side of the estimate — capturing the range of values the data are still consistent with.
95% CI ≈ estimate ± 2 × SE
Larger SE → wider CI → more uncertainty. Larger N → smaller SE → tighter CI.
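The relationship above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the module's own material: the function names `standard_error` and `approx_ci95` are hypothetical, the numbers are made up, and the multiplier of 2 is the rounded rule of thumb used here (the normal-theory value is about 1.96).

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def approx_ci95(estimate: float, se: float, multiplier: float = 2.0) -> tuple[float, float]:
    """Approximate 95% CI: estimate plus or minus `multiplier` standard errors."""
    margin = multiplier * se
    return (estimate - margin, estimate + margin)


# Illustrative numbers only: quadrupling the sample size halves the SE.
se_small = standard_error(sd=20.0, n=25)    # 20 / 5  = 4.0
se_large = standard_error(sd=20.0, n=100)   # 20 / 10 = 2.0

print(approx_ci95(10.0, se_small))  # (2.0, 18.0)
print(approx_ci95(10.0, se_large))  # (6.0, 14.0)
```

The second interval is half as wide as the first, even though the estimate has not moved.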
Build the interval yourself
A trial of 100 patients reports a mean fall in systolic BP of 9 mmHg. The standard deviation is 30 mmHg, so the standard error is 30 ÷ √100 = 3 mmHg.
Now calculate the 95% CI step by step:
Margin of error = 2 × SE = 2 × 3 = 6 mmHg
Lower bound = 9 − 6 = 3 mmHg
Upper bound = 9 + 6 = 15 mmHg
So the 95% CI is [3, 15] mmHg.
That means the data are consistent with anything from a barely-noticeable 3 mmHg drop to a substantial 15 mmHg drop. Impressive or not? You can't tell from the point estimate alone.
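To check the arithmetic, the same worked example can be reproduced in Python. The sketch below uses the trial figures quoted above and, as an added comparison that is not in the original exercise, also computes the interval with the exact normal multiplier (about 1.96) instead of the rounded 2. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the worked example: 100 patients, mean fall 9 mmHg, SD 30 mmHg.
n, mean_fall, sd = 100, 9.0, 30.0
se = sd / math.sqrt(n)  # 30 / 10 = 3.0

# Rule-of-thumb interval used in this module (multiplier of 2).
rough = (mean_fall - 2 * se, mean_fall + 2 * se)

# Normal-approximation interval using the 97.5th percentile (about 1.96).
z = NormalDist().inv_cdf(0.975)
exact = (mean_fall - z * se, mean_fall + z * se)

print(rough)                               # (3.0, 15.0)
print(tuple(round(b, 2) for b in exact))   # (3.12, 14.88)
```

The two intervals differ only slightly, which is why "± 2 SE" is a reasonable shortcut for mental arithmetic.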
See what sample size does to a CI
The true effect is fixed at +6 mmHg. Drag the slider from small samples to large ones and watch the interval shrink.
95% CI: [-2.0, 14.0] mmHg
Width: 16.0 mmHg · Crosses zero — not significant
Try both ends — a tiny trial and a huge one. Notice when the CI clears zero.
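If the slider is not available, the same experiment can be approximated in code. This is a rough sketch under stated assumptions: the estimate is held at 6 mmHg as in the demo, the standard deviation of 30 mmHg is borrowed from the earlier worked example (the demo does not state its own), and `ci_for_n` is a hypothetical helper name.

```python
import math

ESTIMATE = 6.0     # mmHg, the effect shown in the demo
ASSUMED_SD = 30.0  # mmHg, an assumption: the demo does not state its SD


def ci_for_n(n: int) -> tuple[float, float]:
    """Approximate 95% CI (estimate +/- 2 SE) for a trial of n patients."""
    se = ASSUMED_SD / math.sqrt(n)
    return (ESTIMATE - 2 * se, ESTIMATE + 2 * se)


for n in (25, 100, 400, 1600):
    lower, upper = ci_for_n(n)
    status = "clears zero" if lower > 0 else "includes zero"
    print(f"n={n:>4}: [{lower:5.1f}, {upper:5.1f}]  width={upper - lower:5.1f}  {status}")
```

Under these assumptions the interval only clears zero once n exceeds 100, and each fourfold increase in sample size halves the width.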
CI and p-value — two sides of the same coin
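The p-value asks: "Is this result compatible with no effect?" The CI asks: "What range of effects are the data compatible with?" They're deeply linked: when a 95% CI excludes zero, the matching two-sided p-value is below 0.05; when the CI includes zero, p is above 0.05.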
Most regulators and HTA bodies now prefer CIs — because a drug that "barely reaches p < 0.05" with a CI of [0.1, 40] tells a very different story from one with CI of [8, 12].
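The link between the two can be illustrated with a small Python sketch. It assumes a normal approximation and uses the exact multiplier (about 1.96) so that the 95% CI and the 0.05 threshold line up; the function names and the second example (estimate 6, SE 4) are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value against 'no effect' under a normal approximation."""
    z = abs(estimate) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


def ci95(estimate: float, se: float) -> tuple[float, float]:
    """95% CI under the same normal approximation."""
    z = STD_NORMAL.inv_cdf(0.975)
    return (estimate - z * se, estimate + z * se)


for estimate, se in [(9.0, 3.0), (6.0, 4.0)]:
    lower, upper = ci95(estimate, se)
    p = two_sided_p(estimate, se)
    excludes_zero = lower > 0 or upper < 0
    print(f"estimate={estimate}, SE={se}: CI=[{lower:.2f}, {upper:.2f}], "
          f"p={p:.4f}, excludes zero={excludes_zero}")
```

In both cases "CI excludes zero" and "p < 0.05" agree, because they are built from the same estimate and standard error. The CI additionally shows how large or small the effect could plausibly be.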
The zero line is the decision point
For a treatment vs control comparison, zero means "no difference." Whether the CI crosses zero is the key question:
- CI entirely above zero — consistent only with benefit → statistically significant.
- CI crosses zero — no-effect values are still plausible → not significant. Could be a real effect obscured by noise, or genuinely no effect.
- CI entirely below zero — consistent only with harm → significant in the wrong direction.
But significance isn't everything. A CI of [0.01, 40] excludes zero — technically significant — yet the lower bound is clinically irrelevant and the upper bound is implausibly large. The CI tells you there is noise, not a reliable answer.
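The three cases above, plus the width check, can be written as a small helper. This is an illustrative sketch only: `read_ci` is a hypothetical name, and it assumes the difference is coded so that positive values mean benefit.

```python
def read_ci(lower: float, upper: float) -> str:
    """Describe a CI for a treatment-vs-control difference (positive = benefit)."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = upper - lower
    if lower > 0:
        verdict = "entirely above zero: significant, consistent only with benefit"
    elif upper < 0:
        verdict = "entirely below zero: significant, consistent only with harm"
    else:
        verdict = "crosses zero: not significant"
    return f"{verdict} (width {width:.2f})"


print(read_ci(3.0, 15.0))    # above zero, width 12.00
print(read_ci(-2.0, 14.0))   # crosses zero, width 16.00
print(read_ci(0.01, 40.0))   # above zero, but width 39.99
```

The last example is the cautionary one: the verdict is "significant", yet the width shows that the interval pins down very little.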
Read the forest plot
Each round shows a CI on a difference axis. Decide what it tells you.
What "95%" really means
Here is the precise statement: if you repeated the same study many times, about 95% of the resulting confidence intervals would contain the true value. The other 5% would miss it entirely, and you can't tell which kind you got.
Common error: "There is a 95% chance the true effect is in this interval." Wrong — the true effect is fixed; it's either in the interval or it isn't. The 95% refers to the procedure, not this particular interval.
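The "procedure, not this interval" idea can be checked by simulation. The sketch below is an illustration under assumptions: it draws normally distributed data with a known true mean (9 mmHg) and SD (30 mmHg), repeats a 100-patient study 2,000 times, and counts how often the interval captures the truth. The function name `coverage` and all parameter defaults are hypothetical.

```python
import random
import statistics


def coverage(true_mean: float = 9.0, sd: float = 30.0, n: int = 100,
             studies: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Normal-approximation interval: mean +/- 1.96 standard errors.
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / studies


print(f"Share of intervals containing the true value: {coverage():.3f}")
```

The printed share should land close to 0.95, with the exact figure depending on the seed. Each individual interval either contains 9 or it does not; the 95% describes the batch.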
In practice, "95% CI" is a useful shorthand for "a range that captures the estimate's uncertainty under this study design." For HTA purposes, the key questions are: Does it exclude zero? Is the entire range clinically meaningful? Is it narrow enough to be useful?
Why this matters for HTA
- Regulators such as the EMA and most HTA bodies, including NICE, require CIs, not just p-values, because the size of the effect determines cost-effectiveness, not just its direction.
- A CI that crosses the minimum clinically important difference (MCID) signals that the benefit may be too small to justify the price, even if p < 0.05.
- Economic models propagate the CI into cost-effectiveness uncertainty: a wide CI means a wide cost-per-QALY range, which means a less confident recommendation.
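The MCID check in the list above can be sketched as a simple rule. This is an illustration, not an HTA method: `assess_against_mcid` is a hypothetical helper, it assumes positive differences mean benefit, and the MCID of 5 mmHg in the examples is a made-up threshold.

```python
def assess_against_mcid(lower: float, upper: float, mcid: float) -> str:
    """Compare a CI for a benefit (positive = better) with a positive MCID."""
    if mcid <= 0:
        raise ValueError("this sketch assumes a positive MCID")
    if lower >= mcid:
        return "significant, and the whole CI is at or above the MCID"
    if lower > 0 and upper < mcid:
        return "significant, but the whole CI is below the MCID"
    if lower > 0:
        return "significant, but the CI crosses the MCID"
    return "not significant: the CI includes zero or lies below it"


HYPOTHETICAL_MCID = 5.0  # mmHg, assumed for illustration only

print(assess_against_mcid(3.0, 15.0, HYPOTHETICAL_MCID))   # crosses the MCID
print(assess_against_mcid(8.0, 12.0, HYPOTHETICAL_MCID))   # whole CI above the MCID
print(assess_against_mcid(0.1, 40.0, HYPOTHETICAL_MCID))   # crosses the MCID
```

With this assumed threshold, the [3, 15] interval from the worked example is statistically significant but does not rule out a benefit too small to matter.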
Confidence intervals, in one breath
- A 95% CI is the estimate ± about 2 standard errors: the range of true effects consistent with the data.
- Larger N → smaller SE → narrower CI → more precise answer.
- If the CI excludes zero, the result is statistically significant — but check the whole range, not just the threshold.
- The "95%" refers to the long-run reliability of the procedure, not the probability that this specific interval is correct.
- HTA needs both significance (CI excludes zero) and clinical size (CI entirely above the MCID).
"A drug's CI of [0.1, 40] mmHg is technically significant but scientifically uninformative: the data are too noisy to say anything useful."
Next, we'll see how power determines whether a trial is even capable of producing a tight-enough CI to be useful — and what happens when it isn't.