Overview
A Z-score (standard score) tells you how many standard deviations a data point lies above or below the mean of its distribution. It lets you compare apples to oranges by putting measurements from different scales onto a common, unitless metric.
Core Idea
It answers: “How weird is this value?”
- Z = 0: Exactly average.
- Z = +1: One standard deviation above average.
- Z = -2: Two standard deviations below average.
Formal Definition
$$ Z = \frac{x - \mu}{\sigma} $$
Where $x$ is the raw score, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
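As a quick sketch, the formula translates directly into a few lines of Python; the function name `z_score` and the example numbers are illustrative, not part of any standard library:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

print(z_score(80, 70, 10))  # 1.0 -> one standard deviation above the mean
```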
Intuition
If you scored 80 on a math test and 80 on a history test, which was better?
- Math: Mean=70, SD=10. $Z = (80-70)/10 = +1.0$.
- History: Mean=60, SD=5. $Z = (80-60)/5 = +4.0$.
You did vastly better in history: an 80 was four standard deviations above the mean (extremely rare), whereas in math it was only somewhat above average. The sketch below runs the same arithmetic.
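The comparison takes only a few lines of Python; the dictionary layout is just one convenient way to hold the numbers from the example above:

```python
scores = {
    "math":    {"x": 80, "mean": 70, "sd": 10},
    "history": {"x": 80, "mean": 60, "sd": 5},
}

for subject, s in scores.items():
    z = (s["x"] - s["mean"]) / s["sd"]
    print(f"{subject}: Z = {z:+.1f}")

# math: Z = +1.0
# history: Z = +4.0 -> the same raw score is far more exceptional in history
```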
Examples
- Standardized Testing: SAT and IQ scores are reported on fixed scales (IQ, for instance, is normed to a mean of 100 and a standard deviation of 15), so percentiles can be calculated directly from Z-scores.
- Outlier Detection: A common rule of thumb is to flag any data point with a Z-score above +3 or below -3 (i.e., $|Z| > 3$) as a potential outlier; see the sketch after this list.
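A small NumPy sketch of the $|Z| > 3$ rule; the synthetic data and the planted extreme value are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=50, scale=5, size=200), 90.0)  # plant one extreme value

z = (data - data.mean()) / data.std()  # Z-score of every point
outliers = data[np.abs(z) > 3]         # rule of thumb: flag |Z| > 3
print(outliers)                        # the planted 90.0 is flagged
```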
Common Misconceptions
- “Z-scores require a normal distribution”: You can calculate a Z-score for any distribution, but the probabilities usually attached to Z-scores (such as “roughly 95% of values fall within ±1.96 standard deviations”) hold only when the distribution is approximately normal. The simulation below illustrates the gap.
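A quick simulation makes this concrete; the exponential distribution here is just a convenient stand-in for “clearly non-normal” data, and the 68%-within-one-SD figure is the normal-theory benchmark:

```python
import numpy as np

rng = np.random.default_rng(1)

def share_within(sample, k):
    """Fraction of the sample whose |Z| is below k, using the sample's own mean/SD."""
    z = (sample - sample.mean()) / sample.std()
    return np.mean(np.abs(z) < k)

normal_data = rng.normal(size=100_000)
skewed_data = rng.exponential(size=100_000)   # strongly right-skewed

print(share_within(normal_data, 1.0))  # ~0.68, the familiar normal-curve figure
print(share_within(skewed_data, 1.0))  # ~0.86 -- Z-scores exist, but the normal rule breaks
```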
Related Concepts
- Normal Distribution
- Standard Deviation
- Percentiles
- Standardization
Applications
Used in data preprocessing for machine learning (feature standardization, often loosely called normalization), in comparing scores across different datasets, and in flagging outliers.
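A minimal NumPy sketch of the preprocessing step, assuming a feature matrix with rows as samples and columns as features on very different scales (libraries such as scikit-learn wrap the same arithmetic in `StandardScaler`):

```python
import numpy as np

# Illustrative features: age in years and income in dollars.
X = np.array([
    [25,  40_000],
    [32,  85_000],
    [47, 120_000],
    [51,  62_000],
], dtype=float)

# Column-wise Z-score standardization: each feature ends up with mean 0, SD 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```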
Criticism / Limitations
Z-scores depend on the mean and standard deviation, both of which are themselves sensitive to outliers; a single extreme value can inflate the standard deviation enough to mask the very point you want to flag, as the example below shows. In skewed distributions, Z-scores can also be misleading, since equal distances above and below the mean do not correspond to equal tail probabilities.
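A tiny example of that masking effect, using made-up numbers: the extreme point inflates the very standard deviation it is judged against, so its own Z-score stays below the usual cutoff.

```python
import numpy as np

with_outlier = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 100.0])

z_of_100 = (100.0 - with_outlier.mean()) / with_outlier.std()
print(z_of_100)  # ~2.24 -- below the |Z| > 3 threshold, even though 100 is clearly extreme
```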
Further Reading
- Statistics textbooks on standardization