Overview

Descriptive statistics summarizes data. It takes a messy spreadsheet of 10,000 numbers and reduces it to a few useful numbers (such as an average) or a picture (a graph).

Core Idea

The core idea is simplification. We trade detail for understanding. We lose the individual data points to see the “shape” of the data.

Formal Definition

  • Central Tendency: Where is the middle? (Mean, Median, Mode).
  • Dispersion: How spread out is it? (Range, Variance, Standard Deviation).
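The six measures named above can be computed directly from their definitions. The following is a minimal sketch on a small hypothetical dataset (the numbers are chosen only so the arithmetic is easy to follow), using the population versions of variance and standard deviation, which divide by n.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Central tendency: where is the middle?
mean = sum(data) / n
ordered = sorted(data)
mid = n // 2
# For an even count, the median is the average of the two middle values.
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
mode = Counter(data).most_common(1)[0][0]  # most frequent value

# Dispersion: how spread out is it? (population versions, dividing by n)
value_range = max(data) - min(data)
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)

# Cross-check against Python's standard library.
assert math.isclose(mean, statistics.mean(data))
assert math.isclose(median, statistics.median(data))
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))
```

When the data is a sample from a larger population, the variance is usually computed by dividing by n - 1 instead; Python's statistics.variance and statistics.stdev do that.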

Intuition

  • Mean (Average): Add up the values and divide by how many there are. Sensitive to outliers (Bill Gates walks into a bar and, on average, everyone in it becomes a millionaire).
  • Median: The middle value once the data is sorted. Robust to outliers, which usually makes it the better summary for income.
  • Standard Deviation: Roughly the typical distance from the mean; formally, the square root of the average squared distance from the mean. High SD = Spread out (Wild). Low SD = Clumped (Consistent).
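The bar example can be checked with a few lines of code. This is an illustrative sketch only: the patrons' net worths and the newcomer's fortune are made-up figures, and the two "consistent" and "wild" lists are hypothetical datasets built to share the same mean.

```python
import statistics

# Hypothetical net worths (in dollars) of ten bar patrons.
patrons = [40_000, 55_000, 60_000, 62_000, 70_000,
           75_000, 80_000, 90_000, 95_000, 110_000]

# A hypothetical billionaire walks in; the figure is illustrative only.
with_billionaire = patrons + [100_000_000_000]

mean_before = statistics.mean(patrons)
median_before = statistics.median(patrons)
mean_after = statistics.mean(with_billionaire)
median_after = statistics.median(with_billionaire)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")

# Two hypothetical datasets with the same mean but very different spread.
consistent = [49, 50, 50, 50, 51]
wild = [10, 30, 50, 70, 90]

sd_consistent = statistics.pstdev(consistent)
sd_wild = statistics.pstdev(wild)

print(f"both means are {statistics.mean(consistent)} and {statistics.mean(wild)}")
print(f"SD consistent: {sd_consistent:.2f}, SD wild: {sd_wild:.2f}")
```

The single extreme value drags the mean up by orders of magnitude while the median barely moves, and the two lists with identical means are told apart only by their standard deviations.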

Examples

  • Bell Curve (Normal Distribution): Many measurements (Height, IQ) cluster symmetrically around the mean, with values becoming rarer the further they are from it.
  • Skew: When the data has a long tail on one side. Income is right-skewed: most people earn modest amounts and a few earn vastly more, so the mean is pulled above the median.
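Both examples can be made concrete with a short sketch. It assumes IQ scores scaled to a mean of 100 and a standard deviation of 15 (a common convention, taken here as an assumption), and it uses a made-up list of incomes. The skewness helper is a hypothetical function written for this example; it computes the population skewness as the third central moment divided by the cube of the standard deviation.

```python
import statistics
from statistics import NormalDist

# Bell curve: assumed IQ scaling of mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)
share_within_one_sd = iq.cdf(115) - iq.cdf(85)
print(f"share of scores within one SD of the mean: {share_within_one_sd:.1%}")

# Skew: hypothetical annual incomes in thousands.
# Most values are modest; two very large ones form the right tail.
incomes = [22, 25, 28, 30, 32, 35, 38, 40, 45, 50, 250, 900]


def skewness(values):
    """Population skewness: third central moment divided by SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean income:   {mean_income:.1f}")
print(f"median income: {median_income:.1f}")
print(f"skewness:      {skewness(incomes):.2f}")
```

A positive skewness together with a mean well above the median is the numerical signature of a right-skewed distribution; for perfectly symmetric data the skewness is zero.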

Common Misconceptions

  • Misconception: Average is “Normal.”
    • Correction: In a bimodal distribution (two humps, like shoe sizes for men and women), the “average” can fall in the gap between the humps, at a size hardly anyone wears.
  • Misconception: Correlation implies Causation.
    • Correction: Just because two lines go up together doesn’t mean one causes the other. A third factor may drive both, or the match may be coincidence.
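The bimodal case is easy to reproduce. The sketch below uses hypothetical shoe sizes placed on a single scale, with one cluster around 7 and another around 12; the data is invented purely to illustrate the point.

```python
import statistics

# Hypothetical shoe sizes on one common scale: two separate humps.
sizes = [7, 7, 7, 7, 8, 8, 11, 11, 12, 12, 12, 12]

mean_size = statistics.mean(sizes)
modes = statistics.multimode(sizes)       # every value tied for most frequent
wearers_of_mean = sizes.count(mean_size)  # how many people wear the "average" size

print("mean size:", mean_size)
print("modes:", modes)
print("people wearing the mean size:", wearers_of_mean)
```

Here the mean lands between the two humps at a size that nobody in the sample wears, while the modes point at the sizes people actually buy.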

Applications

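  • Sports: Batting averages (hits divided by at-bats) compress a whole season into one number.
  • Business: Quarterly reports condense thousands of transactions into totals, averages, and growth rates.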

Criticism and Limitations

  • Lying with Statistics: You can choose the measure (Mean vs. Median) that supports your argument.

Further Reading

  • How to Lie with Statistics by Darrell Huff
  • Naked Statistics by Charles Wheelan