Overview
A Z-score (standard score) tells you how many standard deviations a data point lies above or below the mean of its distribution. It lets you compare apples to oranges by putting measurements from different scales onto a common, unitless metric.
Core Idea
It answers: “How weird is this value?”
- Z = 0: Exactly average.
- Z = +1: One standard deviation above average.
- Z = -2: Two standard deviations below average.
Formal Definition
$$ Z = \frac{x - \mu}{\sigma} $$
Where $x$ is the raw score, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
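As a quick sketch, the formula translates directly into a few lines of Python; the function name `z_score` and the example numbers are illustrative, not part of any standard library:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

print(z_score(80, 70, 10))  # 1.0 -> one standard deviation above the mean
```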
Intuition
If you scored 80 on a math test and 80 on a history test, which was better?
- Math: Mean=70, SD=10. $Z = (80-70)/10 = +1.0$.
- History: Mean=60, SD=5. $Z = (80-60)/5 = +4.0$.
You did vastly better in history: an 80 was four standard deviations above the mean (extremely rare), whereas in math it was only somewhat above average. The sketch below runs the same arithmetic.
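The comparison takes only a few lines of Python; the dictionary layout is just one convenient way to hold the numbers from the example above:

```python
scores = {
    "math":    {"x": 80, "mean": 70, "sd": 10},
    "history": {"x": 80, "mean": 60, "sd": 5},
}

for subject, s in scores.items():
    z = (s["x"] - s["mean"]) / s["sd"]
    print(f"{subject}: Z = {z:+.1f}")

# math: Z = +1.0
# history: Z = +4.0 -> the same raw score is far more exceptional in history
```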
Examples
- Standardized Testing: SAT and IQ scores are reported on fixed scales (IQ, for instance, is normed to a mean of 100 and a standard deviation of 15), so percentiles can be calculated directly from Z-scores.
- Outlier Detection: A common rule of thumb is to flag any data point with a Z-score above +3 or below -3 (i.e., $|Z| > 3$) as a potential outlier; see the sketch after this list.
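A small NumPy sketch of the $|Z| > 3$ rule; the synthetic data and the planted extreme value are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=50, scale=5, size=200), 90.0)  # plant one extreme value

z = (data - data.mean()) / data.std()  # Z-score of every point
outliers = data[np.abs(z) > 3]         # rule of thumb: flag |Z| > 3
print(outliers)                        # the planted 90.0 is flagged
```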
Common Misconceptions
- “Z-scores require a normal distribution”: You can calculate a Z-score for any distribution, but the probabilities usually attached to Z-scores (such as “roughly 95% of values fall within ±1.96 standard deviations”) hold only when the distribution is approximately normal. The simulation below illustrates the gap.
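A quick simulation makes this concrete; the exponential distribution here is just a convenient stand-in for “clearly non-normal” data, and the 68%-within-one-SD figure is the normal-theory benchmark:

```python
import numpy as np

rng = np.random.default_rng(1)

def share_within(sample, k):
    """Fraction of the sample whose |Z| is below k, using the sample's own mean/SD."""
    z = (sample - sample.mean()) / sample.std()
    return np.mean(np.abs(z) < k)

normal_data = rng.normal(size=100_000)
skewed_data = rng.exponential(size=100_000)   # strongly right-skewed

print(share_within(normal_data, 1.0))  # ~0.68, the familiar normal-curve figure
print(share_within(skewed_data, 1.0))  # ~0.86 -- Z-scores exist, but the normal rule breaks
```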
Related Concepts
- Normal Distribution
- Standard Deviation
- Percentiles
- Standardization
Applications
Used in data preprocessing for machine learning (feature standardization, often loosely called normalization), in comparing scores across different datasets, and in flagging outliers.
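A minimal NumPy sketch of the preprocessing step, assuming a feature matrix with rows as samples and columns as features on very different scales (libraries such as scikit-learn wrap the same arithmetic in `StandardScaler`):

```python
import numpy as np

# Illustrative features: age in years and income in dollars.
X = np.array([
    [25,  40_000],
    [32,  85_000],
    [47, 120_000],
    [51,  62_000],
], dtype=float)

# Column-wise Z-score standardization: each feature ends up with mean 0, SD 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```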
Criticism / Limitations
Z-scores depend on the mean and standard deviation, both of which are themselves sensitive to outliers; a single extreme value can inflate the standard deviation enough to mask the very point you want to flag, as the example below shows. In skewed distributions, Z-scores can also be misleading, since equal distances above and below the mean do not correspond to equal tail probabilities.
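A tiny example of that masking effect, using made-up numbers: the extreme point inflates the very standard deviation it is judged against, so its own Z-score stays below the usual cutoff.

```python
import numpy as np

with_outlier = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 100.0])

z_of_100 = (100.0 - with_outlier.mean()) / with_outlier.std()
print(z_of_100)  # ~2.24 -- below the |Z| > 3 threshold, even though 100 is clearly extreme
```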
Further Reading
- Statistics textbooks on standardization