Overview
Sampling is the art of tasting the soup. You don’t need to drink the whole pot to know if it’s salty; you just need a spoonful. But it has to be a representative spoonful.
Core Idea
The core idea is representativeness. The sample must look like the population. If you only survey rich people, you can’t predict the behavior of the poor.
Formal Definition
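- Simple Random Sampling: Drawing names from a hat. Every member has an equal chance of being chosen, and every possible group of the same size is equally likely. The gold standard.
- Stratified Sampling: Dividing the population into groups (strata) and sampling from each, usually in proportion to each group’s share of the population (e.g., if the population is 50% men and 50% women, ensuring the sample is too).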
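The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the function names, the 600/400 population, and the use of proportional allocation for the strata are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, key, n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[key(member)].append(member)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 women and 400 men, stored as (id, sex) pairs.
population = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]

srs = simple_random_sample(population, 100, seed=1)
strat = stratified_sample(population, key=lambda m: m[1], n=100, seed=1)
counts = {s: sum(1 for _, sex in strat if sex == s) for s in ("F", "M")}
print(len(srs), counts)
```

The stratified sample always contains 60 women and 40 men, while the split in the simple random sample varies from draw to draw around that ratio.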
Intuition
- The Soup: If you don’t stir the soup (randomize), you might just get a spoonful of broth and think there are no noodles.
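- Dewey Defeats Truman: The famous 1948 newspaper headline that was wrong largely because the pollsters relied on quota sampling, which let interviewers choose whom to ask, and stopped polling weeks before the election. The telephone story belongs to the 1936 Literary Digest poll, which drew its mailing list from telephone directories and car registrations (people who were richer and more Republican).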
Examples
- Selection Bias: Online polls. Only people who care enough to visit the site vote.
- Survivorship Bias: Analyzing only the planes that came back from war (and reinforcing the parts with holes), ignoring the ones that were shot down (and had holes in the engines). The holes on the returning planes mark the places where a plane can be hit and still make it home.
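The online-poll example can be made concrete with a small simulation. Everything here is hypothetical: the population size, the 40% true level of support, and the assumption that supporters are three times as likely to vote in the online poll as opponents. It is a sketch of how self-selection distorts an estimate, not a model of any real poll.

```python
import random

rng = random.Random(0)

# Hypothetical population of 200,000 people; exactly 40% support a proposal.
N = 200_000
population = [1] * 80_000 + [0] * 120_000
true_share = sum(population) / N

# Self-selected "online poll": supporters are assumed to respond at a 30% rate,
# opponents at a 10% rate.
online = [x for x in population if rng.random() < (0.30 if x else 0.10)]
online_share = sum(online) / len(online)

# Simple random sample of 1,000 people from the same population.
srs = rng.sample(population, 1000)
srs_share = sum(srs) / len(srs)

print(f"true share:        {true_share:.3f}")
print(f"online poll share: {online_share:.3f} (n={len(online)})")
print(f"random sample:     {srs_share:.3f} (n={len(srs)})")
```

Under these assumptions the online poll collects tens of thousands of responses yet lands near two thirds support, while the much smaller random sample stays close to the true 40%.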
Common Misconceptions
- Misconception: Bigger sample is always better.
- Correction: A large biased sample can be worse than a small random sample, because adding respondents shrinks random error but does nothing about bias. (2 million biased votes < 1,000 random votes).
- Misconception: You need to sample 10% of the population.
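- Correction: For a random sample, precision depends mainly on the absolute sample size, not on the percentage of the population sampled. A sample of 1,000 gives a margin of error of roughly ±3 percentage points at 95% confidence, which is usually enough for any population size (even millions).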
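The claim that population size barely matters can be checked with the standard margin-of-error formula for a proportion, including the finite population correction. This is a sketch under stated assumptions: a simple random sample, a worst-case proportion of 0.5, and a 95% confidence level (z = 1.96). The function name and the population sizes are chosen for illustration.

```python
import math

def margin_of_error(n, N=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    n: sample size; N: population size (None means effectively infinite);
    p: assumed true proportion (0.5 is the worst case); z: normal critical value.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        # Finite population correction; close to 1 when n is a small share of N.
        se *= math.sqrt((N - n) / (N - 1))
    return z * se

for N in (10_000, 1_000_000, 300_000_000):
    print(f"N={N:>11,}  n=1000  margin = +/-{margin_of_error(1000, N):.4f}")
```

With n fixed at 1,000, the margin stays at about 0.03 across all three population sizes; the correction only becomes noticeable when the sample is a sizeable fraction of the population.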
Related Concepts
- Inferential Statistics: Relies on sampling.
- Bias: Systematic error.
Applications
- Market Research: Focus groups.
- Quality Control: Testing 1 in 100 cars off the line.
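Testing 1 in 100 items is usually done as systematic sampling: pick a random starting point, then take every 100th unit. A minimal sketch follows; the function name and the serial-number format are hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick every k-th item, starting from a random offset in [0, k)."""
    start = random.Random(seed).randrange(k)
    return items[start::k]

# Hypothetical production run of 10,000 cars with made-up serial numbers.
cars = [f"car-{i:05d}" for i in range(10_000)]
inspected = systematic_sample(cars, k=100, seed=42)
print(len(inspected), inspected[:3])
```

The random start matters: always testing the first car of each batch could line up with a recurring pattern on the line, such as a shift change, and reintroduce bias.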
Criticism and Limitations
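- Non-Response Bias: People don’t answer their phones anymore, and those who do may differ systematically from those who don’t. Even a well-drawn random sample becomes biased when most of it never responds, so polling is getting harder.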
Further Reading
- Sampling: Design and Analysis by Sharon Lohr