Overview
To a computer, a photo is just a grid of numbers (pixels). It doesn’t know that the blob of pixels in the middle is a cat. Computer Vision (CV) is the science of giving computers eyes. It allows self-driving cars to see pedestrians and Facebook to tag your face.
Core Idea
The core idea is Pattern Recognition. Using Convolutional Neural Networks (CNNs) to scan the image for features.
- Layer 1 sees edges (lines).
- Layer 2 sees shapes (circles).
- Layer 3 sees objects (eyes).
- Layer 4 sees faces.
Formal Definition
The field that deals with how computers can gain high-level understanding from digital images or videos.
Intuition
- Human: You see a dog instantly.
- Computer: It sees a matrix of Red, Green, and Blue values. It has to do millions of calculations to realize that “Pointy Ears” + “Fur Texture” + “Snout” = Dog.
Examples
- Self-Driving Cars: Tesla Autopilot. It uses cameras to track lane lines, read stop signs, and avoid hitting kids.
- Medical Imaging: AI is now better than human doctors at spotting tumors in X-rays. It never gets tired or distracted.
- Deepfakes: Using CV to swap faces in a video. You can make the President say anything. Scary stuff.
Common Misconceptions
- It’s solved: It’s still easily fooled. If you put a sticker on a stop sign, a self-driving car might think it’s a speed limit sign and crash (Adversarial Attack).
Related Concepts
- OCR (Optical Character Recognition): Reading text from images (scanning a PDF).
- Biometrics: Unlocking your phone with your face (FaceID).
Applications
- Retail: Amazon Go stores. Cameras track what you pick off the shelf and charge you automatically. No checkout lines.
- Agriculture: Drones flying over fields, counting every single plant and spotting weeds.
Criticism / Limitations
- Surveillance: Facial recognition is ending privacy. Governments can track everyone, everywhere. (Banned in some cities like San Francisco).
Further Reading
- Szeliski, Richard. Computer Vision: Algorithms and Applications.
- Mitchell, Melanie. Artificial Intelligence: A Guide for Thinking Humans.