Module 3 · Confidence Intervals

The drug works — but by how much?

A trial reports: "Blood pressure fell by 9 mmHg." Should you be impressed? It depends entirely on the uncertainty around that 9.

Before we go further — what do you think a 95% confidence interval actually means?

From a single guess to an honest range

Every trial gives you a point estimate — one number. But that number came from a sample, and samples vary (sampling variation). The question is: how much could the true effect differ from what we measured?

The answer lies in the standard error (SE): the typical distance between a sample mean and the population mean. A 95% CI stretches roughly 2 standard errors either side of the estimate (1.96 under a normal approximation), capturing the range of values the data are still consistent with.

95% CI ≈ estimate ± 2 × SE

Larger SE → wider CI → more uncertainty. Larger N → smaller SE → tighter CI.
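
A minimal Python sketch of the rule of thumb above. The function name `ci95` and the example values (estimate 5, SE 2) are illustrative assumptions, not figures from a trial. The exact multiplier is taken from the standard normal distribution in Python's standard library.

```python
from statistics import NormalDist

# Exact two-sided 95% multiplier under a normal approximation (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

def ci95(estimate, se, multiplier=2.0):
    """Return (lower, upper) as estimate +/- multiplier * SE."""
    margin = multiplier * se
    return estimate - margin, estimate + margin

# Hypothetical numbers, for illustration only.
rule_of_thumb = ci95(5.0, 2.0)            # the "about 2" shortcut
exact = ci95(5.0, 2.0, multiplier=Z_95)   # the exact normal multiplier
print(rule_of_thumb)
print(tuple(round(x, 2) for x in exact))
```

The shortcut gives a slightly wider interval than the exact multiplier, which is why it is a safe approximation for quick reading.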

Build the interval yourself

A trial of 100 patients reports a mean fall in systolic BP of 9 mmHg. The standard deviation is 30 mmHg, so the standard error is 30 ÷ √100 = 3 mmHg.

SE = SD ÷ √N = 30 ÷ √100 = 30 ÷ 10 = 3 mmHg

Now calculate the 95% CI step by step:

Margin of error = 2 × SE = 2 × 3 = ?
Lower bound = 9 − 6 = ?
Upper bound = 9 + 6 = ?
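
To check your answers, the same steps can be sketched in Python. The variable names are illustrative; the numbers are the ones from the trial above, and the multiplier of 2 is this module's rounded shortcut rather than the exact 1.96.

```python
import math

n = 100          # patients in the trial
mean_fall = 9.0  # mean fall in systolic BP, mmHg
sd = 30.0        # standard deviation, mmHg

se = sd / math.sqrt(n)      # standard error
margin = 2 * se             # margin of error using the "about 2" rule
lower = mean_fall - margin  # lower bound of the 95% CI
upper = mean_fall + margin  # upper bound of the 95% CI

print(f"SE = {se} mmHg, margin = {margin} mmHg")
print(f"95% CI = [{lower}, {upper}] mmHg")
```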

See what sample size does to a CI

The estimated effect is held fixed at +6 mmHg. Drag the slider from small samples to large ones and watch the interval shrink around it.

Sample size: N = 25

95% CI: [-2.0, 14.0] mmHg

Width: 16.0 mmHg · Crosses zero — not significant

Try both ends — a tiny trial and a huge one. Notice when the CI clears zero.
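
What the slider does can be sketched in Python. The standard deviation of 20 mmHg is an assumption inferred from the display above (it reproduces [-2.0, 14.0] at N = 25); the function name and the list of sample sizes are illustrative.

```python
import math

ESTIMATE = 6.0     # effect held fixed, mmHg
ASSUMED_SD = 20.0  # assumption: chosen so that N = 25 gives [-2.0, 14.0]

def interval_for(n, estimate=ESTIMATE, sd=ASSUMED_SD):
    """Approximate 95% CI (estimate +/- 2 * SE) for a sample of size n."""
    se = sd / math.sqrt(n)
    return estimate - 2 * se, estimate + 2 * se

for n in (25, 100, 400, 1600):
    lower, upper = interval_for(n)
    verdict = "crosses zero" if lower <= 0 <= upper else "clears zero"
    print(f"N = {n:>4}: [{lower:5.1f}, {upper:5.1f}]  "
          f"width {upper - lower:4.1f}  {verdict}")
```

Because the SE scales with 1/√N, quadrupling the sample size only halves the width of the interval.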

CI and p-value — two sides of the same coin

The p-value asks: "Is this result compatible with no effect?" The CI asks: "What range of effects are the data compatible with?" The two are closely linked:

p-value             | 95% CI
p < 0.05            | CI excludes zero
p > 0.05            | CI includes zero
One number (p)      | Full range of plausible effects
Doesn't show size   | Shows size and uncertainty

Most regulators and HTA bodies now prefer CIs, because a drug that "barely reaches p < 0.05" with a CI of [0.1, 40] tells a very different story from one with a CI of [8, 12].
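
The link between the two can be sketched in Python under a normal approximation. The function names and the example estimate/SE pairs are hypothetical. The exact multiplier (about 1.96) is used here rather than the rounded 2, because the match with p = 0.05 holds only for the exact value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96

def two_sided_p(estimate, se):
    """Two-sided p-value for the null hypothesis of no effect (z-test)."""
    z = abs(estimate / se)
    return 2 * (1 - STD_NORMAL.cdf(z))

def ci_excludes_zero(estimate, se):
    """True when the 95% CI (estimate +/- 1.96 * SE) does not contain zero."""
    lower, upper = estimate - Z_95 * se, estimate + Z_95 * se
    return lower > 0 or upper < 0

# Hypothetical (estimate, SE) pairs in mmHg.
for estimate, se in [(9.0, 3.0), (6.0, 4.0), (-5.0, 2.0)]:
    p = two_sided_p(estimate, se)
    print(f"estimate {estimate:5.1f}, SE {se}: p = {p:.4f}, "
          f"CI excludes zero: {ci_excludes_zero(estimate, se)}")
```

In each case, p falls below 0.05 exactly when the interval excludes zero, which is the correspondence the comparison above describes.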

The zero line is the decision point

For a treatment vs control comparison, zero means "no difference." Whether the CI crosses zero is the key question:

CI entirely above zero — consistent only with benefit → statistically significant.
CI crosses zero — no-effect values are still plausible → not significant. Could be a real effect obscured by noise, or genuinely no effect.
CI entirely below zero — consistent only with harm → significant in the wrong direction.

But significance isn't everything. A CI of [0.1, 40] excludes zero, so it is technically significant, yet the lower bound is clinically negligible and the upper bound is implausibly large. An interval that wide tells you the data are noisy, not that the answer is reliable.
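
The three cases can be written as a small Python helper. The function name and labels are illustrative, and it assumes, as in the list above, that positive differences mean benefit.

```python
def read_ci(lower, upper):
    """Classify a CI on a treatment-minus-control difference axis.

    Assumes positive differences mean benefit and negative ones mean harm.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "significant benefit"
    if upper < 0:
        return "significant harm"
    return "not significant"

print(read_ci(8, 12))     # narrow and clear of zero
print(read_ci(0.1, 40))   # clears zero, but very wide
print(read_ci(-2, 14))    # crosses zero
print(read_ci(-12, -3))   # entirely below zero
```

Note that [8, 12] and [0.1, 40] receive the same label. The label alone is therefore not enough: the width of the interval still has to be read.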

Read the forest plot

Each round shows a CI on a difference axis. Decide what it tells you.

What "95%" really means

Here is the precise statement: if you repeated the same study many times, about 95% of the resulting confidence intervals would contain the true value. Run it 100 times and you would expect roughly 95 intervals to capture the truth and roughly 5 to miss it, and you can't tell which kind you got.

Common error: "There is a 95% chance the true effect is in this interval." Wrong — the true effect is fixed; it's either in the interval or it isn't. The 95% refers to the procedure, not this particular interval.
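
The "procedure, not this interval" idea can be checked with a small simulation. This is a sketch under stated assumptions: the data are drawn from a normal distribution, the "true" mean of 9 and SD of 30 are borrowed from the earlier worked example purely for illustration, and the function name, number of simulated studies, and seed are arbitrary choices.

```python
import math
import random
import statistics

def coverage(true_mean=9.0, sd=30.0, n=100, n_studies=2000, seed=1):
    """Fraction of simulated 95% CIs that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # One simulated study: n patients drawn from the assumed population.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        lower, upper = mean - 1.96 * se, mean + 1.96 * se
        if lower <= true_mean <= upper:
            hits += 1
    return hits / n_studies

print(f"Share of intervals containing the true value: {coverage():.3f}")
```

The share should land close to 0.95. Each individual interval either contains 9 or it does not; the 95% describes how often the recipe succeeds across repetitions.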

In practice, "95% CI" is a useful shorthand for "a range that captures the estimate's uncertainty under this study design." For HTA purposes, the key questions are: Does it exclude zero? Is the entire range clinically meaningful? Is it narrow enough to be useful?

Why this matters for HTA

NICE, the EMA, and most other HTA bodies and regulators require CIs, not just p-values, because the size of the effect, not just its direction, determines cost-effectiveness.
A CI that crosses the minimum clinically important difference (MCID) signals that the benefit may be too small to justify the price, even if p < 0.05.
Economic models propagate the CI into cost-effectiveness uncertainty: a wide CI means a wide cost-per-QALY range, which means a less confident recommendation.
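
The MCID check in the second point can be sketched in Python. The MCID of 5 mmHg is a hypothetical threshold chosen only for illustration, as are the function name and the example intervals; positive values are assumed to mean benefit.

```python
def hta_reading(lower, upper, mcid):
    """Compare a CI with zero and with an MCID (benefit = positive values)."""
    significant = lower > 0 or upper < 0
    if lower > mcid:
        size = "entire CI above the MCID"
    elif upper < mcid:
        size = "entire CI below the MCID"
    else:
        size = "CI crosses the MCID"
    return significant, size

HYPOTHETICAL_MCID = 5.0  # mmHg, illustrative only

for lower, upper in [(8, 12), (3, 15), (0.1, 40), (-2, 14)]:
    significant, size = hta_reading(lower, upper, HYPOTHETICAL_MCID)
    print(f"[{lower}, {upper}]: significant={significant}, {size}")
```

Only the first interval is both significant and entirely above the assumed MCID. The second and third are significant but still leave open a benefit smaller than the threshold.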

Confidence intervals, in one breath

A 95% CI is the estimate ± about 2 standard errors: the range of true effects consistent with the data.
Larger N → smaller SE → narrower CI → more precise answer.
If the CI excludes zero, the result is statistically significant — but check the whole range, not just the threshold.
The "95%" refers to the long-run reliability of the procedure, not the probability that this specific interval is correct.
HTA needs both significance (CI excludes zero) and clinical size (CI entirely above the MCID).

"A drug's CI of [0.1, 40] mmHg is technically significant but scientifically uninformative: the data are too noisy to say anything useful."

Next, we'll see how power determines whether a trial is even capable of producing a tight-enough CI to be useful — and what happens when it isn't.