Overview
Computers speak math (010101). Humans speak messy, ambiguous, slang-filled language (“Yeah, right”). Natural Language Processing (NLP) is the bridge. It teaches computers to read, write, and understand human speech. It is the brain behind Siri, Google Translate, and ChatGPT.
Core Idea
The core idea is Vectorization. Turning words into numbers.
- Old NLP: Rules. “If word is ‘cat’, then noun.” (Failed because language has too many exceptions).
- New NLP (Deep Learning): Statistics. “The word ‘cat’ appears near ‘meow’ 90% of the time.” The computer learns the meaning of a word by its context.
Formal Definition
The interaction between computers and human (natural) languages. Key Tasks: Translation, Sentiment Analysis, Speech Recognition.
Intuition
- Word Embeddings: Imagine a 3D map. The word “King” is at coordinate (5, 3, 9). The word “Queen” is at (5, 3, 1). They are close together. The math equation: King - Man + Woman = Queen. The computer “understands” gender by doing math on the coordinates.
Examples
- ChatGPT (LLM): A Large Language Model. It is basically a super-advanced “Autocorrect.” It predicts the next word in a sentence based on reading the entire internet.
- Spam Filters: The original NLP. “If email contains ‘Viagra’ and ‘Prince’, put in trash.”
- Sentiment Analysis: Wall Street uses it to read millions of tweets. If everyone is tweeting “I love Apple,” the algorithm buys Apple stock automatically.
Common Misconceptions
- It understands like a human: No, it is a “Stochastic Parrot.” It knows which words go together, but it doesn’t have a model of the world. It doesn’t know what a cat is; it just knows how the word “cat” is used.
Related Concepts
- Turing Test: Can a machine fool a human into thinking it is human? (ChatGPT has basically passed this).
- Tokenization: Chopping text into pieces (tokens) for the computer to digest.
Applications
- Translation: Breaking the language barrier. You can talk to anyone, anywhere.
- Accessibility: Automatic captions for the deaf. Screen readers for the blind.
Criticism / Limitations
- Bias: If you train the AI on the internet, it learns the internet’s racism and sexism. “Garbage in, garbage out.”
- Hallucination: LLMs often confidently state facts that are completely wrong.
Further Reading
- Jurafsky, Dan. Speech and Language Processing.
- Christian, Brian. The Alignment Problem.