Overview

Hypothesis testing is the backbone of scientific research and data-driven decision making. It provides a formal framework to determine whether observed patterns in data are likely real or just random noise.

Core Idea

The core idea is to assume nothing interesting is happening (the Null Hypothesis) and then see if the data is so unusual that this assumption is likely false. It’s like a criminal trial: the defendant (Null Hypothesis) is presumed innocent until proven guilty (rejected) by sufficient evidence (data).

Formal Definition

A statistical test that uses sample data to decide between two competing hypotheses:

$$ H_0: \text{Null Hypothesis (Status Quo)} $$

$$ H_1: \text{Alternative Hypothesis (Claim)} $$
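The decision rule ties these together via the p-value; $\alpha$ denotes the significance level, chosen before looking at the data:

$$ \text{Reject } H_0 \iff p = P(\text{data at least this extreme} \mid H_0) \le \alpha $$

A small p-value says the observed data would be surprising if $H_0$ were true, which is the evidence used against it.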

Intuition

If you flip a coin 100 times and get 52 heads, you wouldn’t suspect it’s rigged. If you get 95 heads, you would. Hypothesis testing quantifies exactly where the line is between “plausible luck” and “suspiciously biased.”
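The coin example can be made precise with an exact two-sided binomial test. This is a minimal stdlib-only sketch (no SciPy); the helper name `binom_pvalue` is our own:

```python
from math import comb

def binom_pvalue(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of every
    outcome no more likely than the observed count k under H0."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    # Tiny tolerance guards against floating-point ties.
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-12))

print(binom_pvalue(52, 100))  # large p: 52 heads is plausible luck
print(binom_pvalue(95, 100))  # vanishingly small p: reject "fair coin"
```

With 52 heads the p-value is around 0.76, so there is no reason to doubt the coin; with 95 heads it is astronomically small, which is exactly where the "line" from the intuition above gets drawn.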

Examples

  • A/B Testing: Testing if a new website design ($H_1$) leads to more clicks than the old one ($H_0$).
  • Drug Trials: Testing if a new medicine ($H_1$) cures more patients than a placebo ($H_0$).
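The A/B-testing example above is commonly run as a two-proportion z-test. A minimal sketch, with made-up click counts purely for illustration:

```python
from math import sqrt, erfc

def two_prop_ztest(clicks_a, n_a, clicks_b, n_b):
    """One-sided pooled z-test for H1: click rate of B exceeds A."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)  # rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 0.5 * erfc(z / sqrt(2))  # upper-tail normal p-value

# Hypothetical data: old design A, 120/1000 clicks; new design B, 150/1000.
z, p = two_prop_ztest(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here the one-sided p-value comes out near 0.025, so at the conventional 0.05 level we would reject $H_0$ and conclude the new design likely performs better.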

Common Misconceptions

  • “Proving” the Null: You never “accept” or “prove” the null hypothesis; you only “fail to reject” it. Absence of evidence is not evidence of absence.
  • p = 0.05 is magic: The 0.05 threshold is an arbitrary convention, not a law of nature; a result at p = 0.049 is not meaningfully different from one at p = 0.051.
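The arbitrariness of 0.05 can be seen by simulation: when $H_0$ is true, whatever threshold $\alpha$ we pick becomes the false-positive rate by construction. A stdlib-only sketch using a normal approximation (the helper `pvalue_fair_coin` is our own):

```python
import random
from math import sqrt, erfc

random.seed(42)

def pvalue_fair_coin(heads, n):
    """Two-sided p-value for 'the coin is fair', via the normal
    approximation to the binomial (adequate for n = 100)."""
    z = abs(heads - n / 2) / sqrt(n / 4)
    return erfc(z / sqrt(2))

# Run many experiments in which H0 is TRUE (a genuinely fair coin)
# and count how often alpha = 0.05 falsely rejects it.
trials = 20_000
alpha = 0.05
false_positives = sum(
    pvalue_fair_coin(sum(random.random() < 0.5 for _ in range(100)), 100) < alpha
    for _ in range(trials)
)
print(false_positives / trials)  # roughly alpha; binomial discreteness nudges it
```

Rerunning with `alpha = 0.01` or `alpha = 0.10` shifts the rate accordingly, which is the point: the threshold is a dial we set, not a property of the world.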

Applications

Used everywhere from clinical trials and psychology experiments to manufacturing quality assurance and marketing analytics.

Criticism / Limitations

Frequentist hypothesis testing is often criticized on two grounds: its reliance on arbitrary thresholds invites p-hacking (tweaking analyses until p dips below 0.05), and the p-value answers the wrong question, giving the probability of the data assuming $H_0$ is true rather than the probability that a hypothesis is true given the data (which Bayesian methods address directly).

Further Reading

  • “The Lady Tasting Tea” by David Salsburg
  • ASA Statement on p-Values