Overview
Teaching computers to speak human: from the early days of rule-based systems to the modern era of deep learning and large language models (LLMs).
Core Idea
Statistical vs. Symbolic:
- Symbolic (Old School): Hand-coding grammar rules (“if Noun + Verb…”). Good for precision, bad at ambiguity.
- Statistical (Modern): Feeding the computer billions of words and letting it learn the patterns as probabilities. “The cat sat on the…” (mat: 90%, hat: 5%). A toy version is sketched below.
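A minimal sketch of the statistical idea, assuming nothing more than a made-up three-sentence corpus: count which word follows which, then turn the counts into probabilities. The names here (`corpus`, `next_word_probs`) are illustrative, not from any library.

```python
from collections import Counter, defaultdict

# Invented toy corpus; a real model would see billions of words.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the hat . "
    "the cat sat on the mat ."
).split()

# Count how often each word follows each other word (bigrams).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Next-word probabilities learned purely from counts."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))
# {'cat': 0.5, 'mat': 0.33..., 'hat': 0.17...}
```

No grammar rule was written anywhere; the “knowledge” is entirely in the counts, which is the whole statistical bet.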
Formal Definition
Word Embeddings (Vectors): Representing words as points in a multi-dimensional space. “King” - “Man” + “Woman” ≈ “Queen” (the nearest vector to the result is the one for “Queen”). This lets computers capture meaning and analogy as geometry.
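A hedged sketch of that vector arithmetic with tiny hand-picked 3-dimensional vectors. Real embeddings are learned from text and have hundreds of dimensions; every number below is invented purely for illustration.

```python
import numpy as np

# Invented 3-d "embeddings"; real ones are learned, not hand-written.
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "mat":   np.array([0.1, 0.2, 0.3]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means "pointing the same way".
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman lands nearest to queen.
# (Real systems usually exclude the query words from the candidates.)
target = vec["king"] - vec["man"] + vec["woman"]
print(max(vec, key=lambda w: cosine(vec[w], target)))  # queen
```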
Intuition
- Machine Translation: Google Translate. Early versions translated word-for-word (terrible); modern versions use neural networks to translate whole sentences (fluent).
- Speech Recognition: Siri/Alexa. Turning sound waves into text.
Examples
- LLMs (GPT, Gemini): Predicting the next word, the same idea as the bigram sketch above at vastly larger scale. It turns out that if you do this well enough, you get reasoning, coding, and poetry.
- Sentiment Analysis: Reading tweets to see whether people are happy or angry about a movie (a toy version is sketched below).
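A bare-bones lexicon approach to sentiment analysis, assuming invented word lists; real systems use trained classifiers and handle negation, sarcasm, and punctuation, which this does not.

```python
# Invented word lists; real lexicons contain thousands of entries.
POSITIVE = {"great", "happy", "loved", "fun"}
NEGATIVE = {"boring", "angry", "hated", "awful"}

def sentiment(text):
    # Naive tokenization; real tokenizers also strip punctuation.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Loved the movie, great fun"))  # positive
print(sentiment("Boring plot, I hated it"))     # negative
```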
Common Misconceptions
- “The AI understands.” (It manipulates symbols and probabilities; whether that amounts to “understanding” is a philosophical debate.)
- “It’s solved.” (Models still struggle with sarcasm and deep context, and they hallucinate, confidently stating false facts.)
Related Concepts
- Turing Test: Can a machine fool a human?
- Corpus Linguistics: Analyzing huge databases of text to see how language is actually used.
Applications
- Search: Google.
- Accessibility: Screen readers for the blind.
- Customer Service: Chatbots.
Criticism / Limitations
Bias. If the training data is biased (e.g. sexist or racist text), the model learns and reproduces that bias; a minimal illustration follows.
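Here is the mechanism, using deliberately skewed, invented counts: a model that only learns co-occurrence statistics cannot tell “how the corpus talks” apart from “how the world should be described”.

```python
from collections import Counter

# Invented, deliberately skewed counts standing in for biased text.
pronoun_after_doctor = Counter({"he": 90, "she": 10})

total = sum(pronoun_after_doctor.values())
for pronoun, count in pronoun_after_doctor.items():
    print(f"P({pronoun} | doctor) = {count / total:.2f}")
# P(he | doctor) = 0.90 -- the corpus skew is now the model's skew
```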
Further Reading
- Jurafsky & Martin, Speech and Language Processing
- Manning & Schütze, Foundations of Statistical Natural Language Processing