Overview

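Big Data is often described as the industrial revolution of information. A widely quoted (and hard to verify) estimate from Eric Schmidt in 2010 holds that we now create as much data every two days as humanity did from the dawn of civilization up to 2003. The shift changes how science, business, and government operate.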

Core Idea

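The core idea is n=all: instead of drawing a sample, we analyze the entire population. Proponents such as Mayer-Schönberger and Cukier argue that at this scale we often do not need to ask why (causation); knowing what (correlation) is enough to act on.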
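
The contrast between sampling and n=all can be sketched as follows. This is a minimal illustration on synthetic data, not a real workload: the "customer spend" population, its distribution, and the helper name compare_sample_to_full are all assumptions made for the example.

```python
import random
import statistics

def compare_sample_to_full(population, sample_size, seed=0):
    """Return (full_mean, sample_mean) for a numeric population."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # simple random sample
    return statistics.fmean(population), statistics.fmean(sample)

# Hypothetical data: 100,000 synthetic "customers" with a spend value.
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 15.0) for _ in range(100_000)]

full_mean, sample_mean = compare_sample_to_full(population, sample_size=1_000)
print(f"n=all mean:     {full_mean:.2f}")
print(f"n=1,000 mean:   {sample_mean:.2f}")
print(f"sampling error: {abs(full_mean - sample_mean):.2f}")
```

For a simple average, a random sample of 1,000 already lands close to the full-population value. The case for n=all is stronger when the question concerns small subgroups or rare events, where a sample may contain too few cases to say anything.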

Formal Definition

Commonly defined by the 3 Vs (attributed to analyst Doug Laney, 2001):

  1. Volume: Huge amounts of data (terabytes to petabytes and beyond).
  2. Velocity: Data arrives continuously and is often processed in real time.
  3. Variety: Messy, unstructured data (text, video, GPS traces), not just spreadsheets.

Intuition

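  • Moneyball: The Oakland Athletics used statistics to find undervalued baseball players.
  • Target: The retailer reportedly inferred from a teenager’s shopping habits (such as unscented lotion and vitamin supplements) that she was pregnant before her father knew. The story is a widely repeated anecdote, and its details are disputed.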

Examples

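  • Google Flu Trends: Estimated flu activity from search queries. It later overestimated flu prevalence substantially (notably in the 2012-2013 season) as search behavior and Google’s own search features changed, and it was discontinued in 2015.
  • Recommendation Engines: Netflix predicting what you want to watch from viewing histories.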
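
The Google Flu Trends failure mode can be sketched with a toy model. This is not Google’s method: the data is synthetic, the single-predictor linear fit and the assumed 50% rise in searches per case are illustrative assumptions, and fit_line and mean_error are hypothetical helper names.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def mean_error(a, b, searches, cases):
    """Average of (predicted cases - true cases); positive means overestimation."""
    return sum((a + b * s) - c for s, c in zip(searches, cases)) / len(cases)

rng = random.Random(0)

# Synthetic training period: about 2 searches per true case, plus noise.
cases_train = [rng.uniform(100, 1000) for _ in range(200)]
searches_train = [2.0 * c + rng.gauss(0, 50) for c in cases_train]
a, b = fit_line(searches_train, cases_train)

# Later period (assumed): people search 50% more per case, e.g. after media coverage.
cases_later = [rng.uniform(100, 1000) for _ in range(200)]
searches_later = [3.0 * c + rng.gauss(0, 50) for c in cases_later]

train_err = mean_error(a, b, searches_train, cases_train)
later_err = mean_error(a, b, searches_later, cases_later)
print(f"mean error, training period: {train_err:.1f}")
print(f"mean error, later period:    {later_err:.1f}")
```

The model itself never changes; what drifts is the relationship between searches and illness. A model fitted once and never recalibrated keeps applying the old ratio and systematically overestimates.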

Common Misconceptions

  • Misconception: More data = Better decisions.
    • Correction: More data can mean more noise. Without a good model, it’s just “Big Garbage.”
  • Misconception: Data is objective.
    • Correction: Data is collected and labeled by people, and it reflects their choices and biases. Models trained on biased data can reproduce racial or gender discrimination (algorithmic bias).
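
The first misconception can be made concrete with a small experiment. The sketch below, on purely random synthetic data, shows that the more unrelated variables you screen, the stronger the best-looking correlation becomes, even though none of them carries any signal. The function names and the sizes used (30 rows, up to 1,000 features) are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_spurious_correlation(n_features, n_rows=30, seed=0):
    """Largest |r| between a random target and n_features random, unrelated features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best

for k in (10, 100, 1000):
    print(f"{k:>5} random features -> strongest |r| = {strongest_spurious_correlation(k):.2f}")
```

Every feature here is noise by construction, so any strong correlation the search turns up is an artifact of looking at many candidates. This is why a large dataset without a model or a prior hypothesis can mislead rather than inform.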

Applications

  • Smart Cities: Optimizing traffic flow.
  • Genomics: Personalized medicine based on your DNA.

Criticism and Limitations

  • Surveillance Capitalism: Shoshana Zuboff’s critique that human experience is treated as free raw material, mined for behavioral data, and sold for profit.

Further Reading

  • Big Data by Viktor Mayer-Schönberger and Kenneth Cukier
  • Weapons of Math Destruction by Cathy O’Neil