Module 3 · Diagnostic Tests

The test came back positive. What are the odds it's right?

A company submits a new biomarker test for rare-disease screening. It reports 99% sensitivity and 99% specificity. An appraisal committee member asks: "If the test is positive, how likely is the patient to actually have the disease?"

The answer isn't on the label. It never is.

A test has 99% sensitivity and 99% specificity. After a positive result, how likely is the patient to have the disease?

Sensitivity and specificity are the test's intrinsic performance — they don't move with the patient. But what a positive result means absolutely does. That gap between test performance and clinical meaning is the whole story of this lesson.

Four cells that contain all of diagnostics

Every diagnostic statistic you will ever read comes from four numbers arranged in a 2×2 table — test result by disease status:

                   Test positive                       Test negative
Disease present    TP: true positive                   FN: false negative (missed)
Disease absent     FP: false positive (false alarm)    TN: true negative

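Four statistics come out of this table. Sensitivity and specificity are calculated along the rows (within each disease status) and describe the test. The positive and negative predictive values (PPV and NPV) are calculated down the columns (within each test result) and describe what the result means for a patient. The two pairs are related but not the same, and only the predictive values depend on how common the disease is.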

Sensitivity and specificity — calculate them

Sensitivity asks: of all the sick people, how many did the test catch? Specificity asks: of all the healthy people, how many did the test correctly clear?

Here is a worked dataset. Fill in both values:

Test results in 200 patients (100 diseased, 100 healthy):

  1. Of 100 truly diseased patients: 90 test positive (TP = 90), 10 test negative (FN = 10). Sensitivity = TP ÷ (TP + FN). Enter the percentage (no % sign).

  2. Of 100 truly healthy patients: 5 test positive (FP = 5), 95 test negative (TN = 95). Specificity = TN ÷ (TN + FP). Enter the percentage (no % sign).

Sensitivity 90%, specificity 95%. Two mnemonics worth remembering. SnOut: with a highly Sensitive test, a negative result rules disease Out (few missed cases). SpIn: with a highly Specific test, a positive result rules disease In (few false alarms).
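The two calculations above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lesson itself: the function names are hypothetical, and the counts are the ones from the worked dataset.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly diseased patients the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly healthy patients the test correctly clears."""
    return tn / (tn + fp)


# Counts from the worked dataset: 100 diseased, 100 healthy.
tp, fn = 90, 10
fp, tn = 5, 95

print(f"Sensitivity: {sensitivity(tp, fn):.0%}")
print(f"Specificity: {specificity(tn, fp):.0%}")
```

Each function uses only one row of the 2×2 table, which is why neither value changes when the mix of diseased and healthy patients changes.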

Moving the threshold changes both

For a test that reports a continuous value, sensitivity and specificity aren't fixed once and for all: they depend on where you draw the line between "positive" and "negative."

Below, two overlapping distributions: lower values for healthy people, higher values for diseased. The vertical line is the threshold. Move it left (more inclusive) or right (more exclusive) and watch the trade-off:

Interactive figure: test-value distributions for healthy and diseased people (x-axis: test value, 0 to 100; values above the threshold count as positive). At a threshold of 55, sensitivity is 84% and specificity is 84%.

Interactive figure: ROC curve (x-axis: 1 − specificity; y-axis: sensitivity), AUC ≈ 0.92.

This trade-off — sensitivity versus specificity — is exactly what the ROC curve visualises. Each point on the ROC is a different threshold choice. The area under the curve (AUC) summarises the test's overall ability to separate diseased from healthy, independent of any particular threshold.
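The threshold sweep can be approximated with a short Python sketch. The lesson does not say which distributions sit behind the figure, so the two normal distributions below (means 45 and 65, standard deviation 10) are assumptions, chosen only because they give roughly 84% sensitivity and 84% specificity at a threshold of 55. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumed distributions (not given in the lesson): chosen so that a
# threshold of 55 yields roughly 84% sensitivity and 84% specificity.
healthy = NormalDist(mu=45, sigma=10)
diseased = NormalDist(mu=65, sigma=10)


def sens_spec(threshold: float) -> tuple[float, float]:
    """Values above the threshold count as test-positive."""
    sens = 1 - diseased.cdf(threshold)  # diseased people above the line
    spec = healthy.cdf(threshold)       # healthy people below the line
    return sens, spec


def roc_auc(thresholds: list[float]) -> float:
    """Trapezoidal area under the ROC curve traced by the thresholds."""
    points = []
    for t in thresholds:
        sens, spec = sens_spec(t)
        points.append((1 - spec, sens))  # (false-positive rate, sensitivity)
    points.sort()  # order by increasing false-positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


sens, spec = sens_spec(55)
print(f"Threshold 55: sensitivity {sens:.0%}, specificity {spec:.0%}")
print(f"AUC: {roc_auc(list(range(0, 101))):.2f}")
```

Each threshold contributes one point to the curve, so sweeping the threshold from 0 to 100 traces the whole ROC; the area is then a single number that no longer refers to any one threshold.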

When does "positive" mean positive?

Sensitivity and specificity describe the test. But a clinician — and an HTA analyst — needs to answer a different question: given a positive result, how likely is the patient to be sick?

That's PPV (positive predictive value): the fraction of positives that are true positives.

Same test. Two very different settings.

Sens = 99%, spec = 99%. A population of 1 000 people.

Setting A — disease present in 50% (500 diseased):
TP = 495, FP = 5 → PPV = 495 / 500 = 99%

Setting B — disease present in 0.1% (1 diseased):
TP = 1 (approx), FP ≈ 10 → PPV = 1 / 11 ≈ 9%

The test didn't change. The population did. When the disease is rare, almost every positive result is a false alarm — even with a near-perfect test.

Sensitivity and specificity don't depend on prevalence. PPV does. This is the most important insight in diagnostic testing.
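The arithmetic behind Settings A and B is Bayes' rule applied to the test's two error rates. A minimal Python sketch, with a hypothetical function name and using expected proportions rather than rounded head counts:

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value: true positives as a share of all positives."""
    true_pos = sens * prevalence                # diseased and test-positive
    false_pos = (1 - spec) * (1 - prevalence)   # healthy but test-positive
    return true_pos / (true_pos + false_pos)


# Same test (99% sensitivity, 99% specificity), two prevalences.
for label, prevalence in [("Setting A", 0.50), ("Setting B", 0.001)]:
    print(f"{label}: prevalence {prevalence:.1%}, PPV {ppv(0.99, 0.99, prevalence):.0%}")
```

Prevalence enters both terms of the denominator, which is why PPV moves with the population while sensitivity and specificity stay put.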

The base-rate problem

Imagine a test for a rare genetic condition affecting 1 in 1 000 people (prevalence = 0.1%). The test is excellent: 99% sensitive, 99% specific.

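Run the numbers on 1 000 people: about 1 true positive and about 10 false positives, so a positive result is right only about 9% of the time. This isn't a test failure; it's arithmetic. The rarer the disease, the more the false-positive pool swamps the true-positive signal. For any specificity short of 100%, a sufficiently rare disease will push PPV toward zero, and raising sensitivity above 99% barely changes the picture.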
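One way to make "rare enough" concrete is to ask at what prevalence a positive result becomes a coin flip, that is, where expected true positives equal expected false positives. The sketch below derives that break-even prevalence from the PPV definition; the function name is hypothetical, and the specificities above 99% are illustrative values, not figures from the lesson.

```python
def break_even_prevalence(sens: float, spec: float) -> float:
    """Prevalence at which PPV is exactly 50%.

    PPV = 50% when sens * p == (1 - spec) * (1 - p); solving for p
    gives the expression below.
    """
    return (1 - spec) / (sens + (1 - spec))


# Illustrative specificities; sensitivity held at 99%.
for spec in (0.99, 0.999, 0.9999):
    p = break_even_prevalence(0.99, spec)
    print(f"Specificity {spec:.2%}: PPV falls below 50% once prevalence drops under {p:.3%}")
```

Tightening specificity pushes the break-even point lower, but for any specificity below 100% there is still a prevalence under which most positives are false alarms.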

Committees reviewing screening programmes ask about this explicitly. A cost-effective test in high-risk subgroups can become a wasteful cascade of anxiety and follow-up in general populations.

See the base-rate problem for yourself

Below, 1 000 people. The test is fixed at 99% sensitivity and 99% specificity throughout. Only the prevalence changes. Move the slider all the way down to 0.1% — and then all the way up to 50%:

Interactive figure: 1 000 people tested at 99% sensitivity and 99% specificity, with a prevalence slider. At 1% prevalence, PPV is 50%: of 20 positive tests, 10 are true cases (TP: 10, FP: 10, FN: 0, TN: 980).

At 0.1% prevalence: 1 teal square (TP) in a sea of 10 amber squares (FP). PPV ≈ 9%. At 50% prevalence: the teal nearly fills the left half. PPV ≈ 99%. The test is identical. This is why the context of use — the population being tested — is inseparable from what a positive result means.
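The counts behind each slider position can be reproduced with a small Python sketch. It works with expected (fractional) counts and rounds only for display, so it may differ by one or two people from a grid that must show whole squares. The function name and the list of prevalences are hypothetical choices for illustration.

```python
def expected_counts(sens: float, spec: float, prevalence: float,
                    population: int = 1000) -> dict[str, float]:
    """Expected 2x2 cell counts for a population at a given prevalence."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sens * diseased   # diseased and caught
    fn = diseased - tp     # diseased and missed
    tn = spec * healthy    # healthy and cleared
    fp = healthy - tn      # healthy and flagged
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Test fixed at 99% sensitivity and 99% specificity; only prevalence moves.
for prevalence in (0.001, 0.01, 0.25, 0.50):
    cells = expected_counts(0.99, 0.99, prevalence)
    share_true = cells["TP"] / (cells["TP"] + cells["FP"])
    summary = ", ".join(f"{name}: {count:.0f}" for name, count in cells.items())
    print(f"Prevalence {prevalence:.1%}: {summary}; PPV {share_true:.0%}")
```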

Likelihood ratios: the prevalence-free update

PPV varies with prevalence — inconvenient when you need to compare a test across settings. Likelihood ratios (LR) solve this. They express how much a result shifts the pre-test probability — and they don't depend on prevalence.

LR+ = sensitivity ÷ (1 − specificity)

LR+ answers: "How many times more likely is this positive result in someone with the disease than in someone without?"

LR− = (1 − sensitivity) ÷ specificity

LR− answers: "How many times more likely is this negative result in someone with the disease than in someone without?" (You want this small.)

Rules of thumb: LR+ > 10 substantially raises the probability; LR− < 0.1 substantially lowers it.

Now compute LR+ from the sens/spec values you calculated earlier (90% sensitivity, 95% specificity):

  1. LR+ = sensitivity ÷ (1 − specificity). Using 90% sensitivity and 95% specificity from above: what is LR+? Enter a whole number.

LR+ = 18 — a positive result is 18× more likely in someone with the disease than without. That is a genuinely informative test. An LR+ of 2 or 3 (barely informative) looks very different — and the threshold-vs-ROC plot you explored earlier shows why: as you move toward either extreme, LR changes dramatically.
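Likelihood ratios do their work on the odds scale: convert the pre-test probability to odds, multiply by the LR, and convert back. A minimal Python sketch follows. The function names are hypothetical, and the 10% pre-test probability is an assumed value for illustration, not a figure from the lesson.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")

# Assumed pre-test probability of 10%, for illustration only.
pre_test = 0.10
print(f"After a positive result: {post_test_probability(pre_test, lr_pos):.0%}")
print(f"After a negative result: {post_test_probability(pre_test, lr_neg):.1%}")
```

The same LR+ applies in every setting; only the pre-test probability fed into the update changes, which is how likelihood ratios stay prevalence-free while the post-test probability does not.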

Four claims — what's right?

For each scenario, choose the best response. All four questions unlock the next screen.


Why this matters for HTA

Diagnostic tests rarely arrive on a committee desk alone. They arrive embedded in a test–treat pathway: test, then decide, then treat (or not). HTA evaluates the full path, not the test in isolation.

"A 99% sensitive test in a 0.1% prevalence population does not have a 99% accuracy problem. It has a population problem. HTA committees are paid to know the difference."

Diagnostic tests, in one breath

"The test is the same; the population is the question. Specificity is for ruling in; sensitivity is for ruling out. And PPV is the number the patient actually lives with."

M3 is now complete. You've moved from raw variation (Module 3 opener) through p-values, confidence intervals, effect measures, survival analysis, hazard ratios, and now diagnostic accuracy. Module 4 will introduce systematic reviews — how to combine evidence across studies and detect when the literature itself misleads.