Overview
Inferential Statistics is the magic of guessing the whole from the part. It allows us to poll 1,000 people and predict how 300 million will vote.
Core Idea
The core idea is generalization. We take a Sample (small group) and infer truths about the Population (everyone), with a calculated margin of error.
Formal Definition
- Null Hypothesis ($H_0$): The default assumption (e.g., “The drug does nothing”).
- Alternative Hypothesis ($H_1$): The interesting theory (e.g., “The drug works”).
- P-Value: The probability of seeing these results if the Null Hypothesis were true. If $p < 0.05$, we reject the Null.
Intuition
- The Jury: The Null Hypothesis is “Innocent until proven guilty.” We need enough evidence (low p-value) to convict.
- Confidence Interval: “We are 95% sure the true number is between X and Y.”
Examples
- Drug Trials: Does the group taking the pill get better faster than the Placebo group? We use a T-test to check if the difference is “Statistically Significant” or just luck.
- Election Polling: “Biden leads by 4 points $\pm$ 3%.”
Common Misconceptions
- Misconception: $p=0.05$ means there is a 95% chance the hypothesis is true.
- Correction: No. It means there is a 5% chance you’d see this data if the hypothesis were false. It’s about the data, not the theory.
- Misconception: Significant means Important.
- Correction: With a huge sample, a tiny difference (0.1%) can be “statistically significant,” but practically useless.
Related Concepts
- Sampling Bias: If your sample is bad (only calling landlines), your inference is trash.
- Bayesian Statistics: An alternative approach to inference.
Applications
- Science: The foundation of the scientific method.
- A/B Testing: Tech companies testing which button color gets more clicks.
Criticism and Limitations
- P-Hacking: Manipulating data until you find a p < 0.05. This has led to the Replication Crisis in psychology and medicine.
Further Reading
- The Signal and the Noise by Nate Silver
- Statistics Done Wrong by Alex Reinhart