Overview

This is the theorem that started it all: Shannon’s Source Coding Theorem. It says: “You can compress data down to its entropy, but no further.”

Core Idea

Asymptotic Equipartition Property (AEP): For a long i.i.d. sequence of $n$ random variables, the outcome will almost certainly belong to a “Typical Set” of only about $2^{nH(X)}$ sequences, each with probability about $2^{-nH(X)}$. We only need to assign codes to the Typical Set, which takes about $nH(X)$ bits, i.e. $H(X)$ bits per symbol. The “Atypical” outcomes are so rare we can ignore them (or use long codes).
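
A minimal empirical sketch of the AEP, assuming an i.i.d. Bernoulli source; the bias of 0.2, the sample size, and the names below are illustrative, not from the text:

```python
import math
import random

p = 0.2                                               # illustrative bias: P(X = 1)
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy, ~0.722 bits

n = 100_000
rng = random.Random(0)
seq = [1 if rng.random() < p else 0 for _ in range(n)]

# Per-symbol "surprise" of the realized sequence: -(1/n) log2 p(x_1..x_n)
log_prob = sum(math.log2(p if x else 1 - p) for x in seq)
print(f"H(X)                    = {H:.4f} bits/symbol")
print(f"-(1/n) log2 p(sequence) = {-log_prob / n:.4f} bits/symbol")
# The AEP says these two numbers agree for large n: the drawn
# sequence is overwhelmingly likely to be "typical".
```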

Formal Definition

$$ H(X) \le L < H(X) + 1 $$ No uniquely decodable code can have an expected length $L$ (bits per symbol) below the entropy $H(X)$, and an optimal symbol code (e.g. a Huffman code) comes within one bit of it. Encoding blocks of $n$ symbols at a time shrinks that overhead to $1/n$ bit per symbol, so the achievable rate approaches $H(X)$.
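
As a sanity check on the bound, here is a sketch that builds a Huffman code for a made-up four-symbol distribution and compares its expected length to the entropy; the probabilities are illustrative only:

```python
import heapq
import math

# Made-up source distribution (illustrative, not from the text)
probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
H = -sum(p * math.log2(p) for p in probs.values())

# Huffman's algorithm: repeatedly merge the two least probable nodes,
# prepending a bit to every codeword inside each merged subtree.
heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
heapq.heapify(heap)
tiebreak = len(heap)
while len(heap) > 1:
    p1, _, left = heapq.heappop(heap)
    p2, _, right = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in left.items()}
    merged.update({s: "1" + w for s, w in right.items()})
    heapq.heappush(heap, (p1 + p2, tiebreak, merged))
    tiebreak += 1
code = heap[0][2]                      # e.g. {'a': '0', 'b': '10', ...}

L = sum(probs[s] * len(w) for s, w in code.items())
print(f"H(X) = {H:.4f} bits, expected code length L = {L:.4f} bits")
assert H <= L < H + 1                  # the source coding bound
```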

Intuition

If you are packing a suitcase:

  • Entropy: The volume of your clothes.
  • Compression: Vacuum packing.
  • Theorem: You can vacuum pack the air out, but you can’t make the clothes disappear. There is a minimum size.

Examples

  • English Text: Entropy is ~1.5 bits per letter once context is taken into account, while plain text is stored at 8 bits (one byte) per character. So English can in principle be compressed by roughly 80% (1 − 1.5/8 ≈ 81%).
  • Coin Flips: If the coin is fair (H = 1 bit per flip), you can’t compress the flips at all. If it’s biased (H < 1), you can; the sketch below makes this concrete.
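
A minimal experiment, assuming flips are packed 8 per byte and using zlib as an off-the-shelf compressor; the seed, sample size, and bias of 0.9 are made up for illustration, and zlib won’t reach the entropy limit exactly, but the fair/biased gap is clear:

```python
import random
import zlib

def packed_flips(p, n, seed=0):
    """n Bernoulli(p) coin flips, packed 8 flips per byte."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n // 8):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (1 if rng.random() < p else 0)
        out.append(byte)
    return bytes(out)

for p in (0.5, 0.9):
    data = packed_flips(p, 100_000)
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"P(heads) = {p}: compressed to {ratio:.0%} of original size")
# Fair flips (H = 1 bit/flip) stay near 100%; biased flips (H ≈ 0.47) shrink.
```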

Common Misconceptions

  • “I found a compression algorithm that compresses every file.” Impossible, by a counting (pigeonhole) argument: there are strictly fewer files shorter than $n$ bits than files of exactly $n$ bits, so no lossless scheme can map every length-$n$ file to a distinct shorter one.
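
Concretely, counting binary files of each length:

$$ \underbrace{2^0 + 2^1 + \cdots + 2^{n-1}}_{\text{files shorter than } n \text{ bits}} = 2^n - 1 \;<\; 2^n = \#\{\text{files of exactly } n \text{ bits}\} $$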

Related Concepts

  • Rate-Distortion Theory: The analogous limit for lossy compression. (How much quality must you give up for a given size?)
  • Kraft’s Inequality: A constraint on codeword lengths, $\sum_i 2^{-\ell_i} \le 1$, satisfied by every uniquely decodable code and achievable by a prefix code; a worked instance follows.
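
For instance, the prefix code {0, 10, 110, 111} meets Kraft’s inequality with equality:

$$ 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{8} = 1 $$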

Applications

  • File Formats: ZIP, GZIP, 7z.
  • Communication: Optimizing bandwidth usage.

Criticism / Limitations

The achievability half is asymptotic: the entropy rate is guaranteed only in the limit of long block lengths. For short messages, per-symbol overhead means you might not get close to the entropy limit.

Further Reading

  • Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27, 379–423 and 623–656.