Overview
Descriptive statistics summarizes data. It takes a messy spreadsheet of 10,000 numbers and reduces it to a few useful numbers (such as an average) or a picture (a graph).
Core Idea
The core idea is simplification. We trade detail for understanding. We lose the individual data points to see the “shape” of the data.
Formal Definition
- Central Tendency: Where is the middle? (Mean, Median, Mode).
- Dispersion: How spread out is it? (Range, Variance, Standard Deviation).
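As a minimal sketch of these measures, the snippet below uses Python's standard `statistics` module on a small made-up sample. The data values are assumptions chosen for illustration, and the population versions of variance and standard deviation are used (divide by n rather than n - 1).

```python
import statistics

# Hypothetical sample of eight observations (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: where is the middle?
mean = statistics.mean(data)      # sum of values divided by their count
median = statistics.median(data)  # middle of the sorted values
mode = statistics.mode(data)      # most frequent value

# Dispersion: how spread out is it?
data_range = max(data) - min(data)
variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(mean, median, mode)
print(data_range, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would usually be the more appropriate choice.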
Intuition
- Mean (Average): Add the values up and divide by how many there are. Sensitive to outliers (Bill Gates walks into a bar, and the average patron becomes a millionaire).
- Median: The middle number. Robust to outliers. (Better for income).
- Standard Deviation: Roughly the typical distance from the mean (strictly, the square root of the average squared distance). High SD = Spread out (Wild). Low SD = Clumped (Consistent).
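The intuitions above can be checked with a short sketch. All figures are invented for illustration: nine hypothetical bar patrons, plus a newcomer whose net worth is assumed to be a round 100 billion dollars.

```python
import statistics

# Hypothetical net worths, in dollars, of nine people in a bar.
patrons = [40_000, 55_000, 60_000, 72_000, 80_000,
           95_000, 110_000, 130_000, 150_000]

# Assumed round figure for the newcomer: 100 billion dollars.
with_billionaire = patrons + [100_000_000_000]

mean_before = statistics.mean(patrons)
median_before = statistics.median(patrons)
mean_after = statistics.mean(with_billionaire)
median_after = statistics.median(with_billionaire)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")

# Same mean, different spread: standard deviation tells the two apart.
consistent = [48, 50, 52]
wild = [10, 50, 90]
sd_consistent = statistics.pstdev(consistent)
sd_wild = statistics.pstdev(wild)
print(f"SD consistent: {sd_consistent:.2f}, SD wild: {sd_wild:.2f}")
```

The mean jumps by several orders of magnitude while the median barely moves, which is why the median is usually preferred for data such as income.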
Examples
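- Bell Curve (Normal Distribution): Many measurements (height, IQ scores) cluster symmetrically around the mean, with fewer and fewer values far from it.
- Skew: When data leans to one side (income is right-skewed; most people earn modest amounts, while a few earn far more and stretch the tail to the right).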
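Both examples can be illustrated with a brief sketch. It assumes the conventional IQ scaling (mean 100, standard deviation 15) and uses a made-up list of incomes; `NormalDist` is part of Python's standard `statistics` module (Python 3.8 and later).

```python
from statistics import NormalDist, mean, median

# IQ scores are conventionally scaled to mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Share of a normal distribution within one standard deviation of the mean.
within_one_sd = iq.cdf(115) - iq.cdf(85)
print(f"Share of scores between 85 and 115: {within_one_sd:.1%}")

# Hypothetical right-skewed incomes, in thousands (illustrative values only).
incomes = [22, 25, 28, 30, 33, 36, 40, 48, 75, 263]
income_mean = mean(incomes)
income_median = median(incomes)

# In right-skewed data the long tail pulls the mean above the median.
print(f"mean = {income_mean}, median = {income_median}")
```

A mean well above the median is a quick, informal sign of right skew; it is not a formal test.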
Common Misconceptions
- Misconception: Average is “Normal.”
- Correction: In a bimodal distribution (two humps, like shoe sizes for men and women), the “average” might be a size nobody wears.
- Misconception: Correlation implies Causation.
- Correction: Just because two lines go up together doesn’t mean one causes the other; a third factor, or plain coincidence, may explain both.
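Both corrections can be made concrete with a small sketch. The shoe sizes and the two rising series are invented for illustration, and `pearson_r` is a hypothetical helper written out by hand rather than a library call.

```python
import math
import statistics

# Hypothetical shoe sizes with two humps: one near 7, one near 11.
shoe_sizes = [6.5, 7, 7, 7, 7.5, 10.5, 11, 11, 11, 11.5]
average_size = statistics.mean(shoe_sizes)
nobody_wears_it = average_size not in shoe_sizes
print(f"average size = {average_size}, worn by nobody: {nobody_wears_it}")


def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two made-up series that both happen to rise over five periods.
series_a = [100, 120, 140, 160, 180]
series_b = [20, 26, 29, 36, 39]
r = pearson_r(series_a, series_b)
print(f"correlation = {r:.3f}")
```

A correlation close to 1 only says that the two series move together. Nothing in the calculation indicates whether one drives the other.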
Related Concepts
- Inferential Statistics: Using statistics from a sample to draw conclusions about the larger population it came from.
- Data Visualization: Presenting statistics as charts and graphs so patterns are easy to see.
Applications
- Sports: Batting averages.
- Business: Quarterly reports.
Criticism and Limitations
- Lying with Statistics: You can choose the measure (Mean vs. Median) that supports your argument.
Further Reading
- How to Lie with Statistics by Darrell Huff
- Naked Statistics by Charles Wheelan