Overview
How do you train a dog? You give it a treat when it sits. You scold it when it pees on the rug. Reinforcement Learning (RL) is training computers the same way. You don’t tell the computer how to play Mario; you just tell it “Go Right = Good, Die = Bad” and let it figure it out.
Core Idea
The core idea is Trial and Error. The agent tries random things. Most fail. Some succeed. It remembers what worked and does it more often. Over millions of tries, it becomes superhuman.
Formal Definition
An area of machine learning concerned with how software agents ought to take actions in an environment to maximize some notion of cumulative reward. Components: Agent, Environment, Action, Reward.
Intuition
- Supervised Learning: A teacher showing you the answers. “This is a cat. This is a dog.”
- Reinforcement Learning: Learning to ride a bike. No one can explain the physics to you. You just have to try, fall, and try again until your brain “clicks.”
Examples
- AlphaGo: The AI that beat the world champion at Go. It played millions of games against itself, discovering strategies that humans had never thought of in 2,000 years.
- Robotics: Teaching a robot hand to solve a Rubik’s Cube. It drops it a lot at first, but eventually learns the friction and physics.
- Video Games: OpenAI Five beat the world champions at Dota 2.
Common Misconceptions
- It’s easy: It is notoriously unstable. Often the agent learns to “hack” the reward function. (e.g., instead of finishing the race, it drives in circles collecting power-ups because that gives more points). This is the Alignment Problem.
Related Concepts
- Q-Learning: A math formula for calculating the value of an action.
- Exploration vs. Exploitation: The dilemma. Should I go to my favorite restaurant (Exploit), or try a new one that might be better but might be worse (Explore)?
Applications
- Stock Trading: Algorithms that learn to trade by maximizing profit (reward) and minimizing risk.
Criticism / Limitations
- Sample Inefficiency: It takes an AI millions of hours of practice to learn what a human learns in 10 minutes. It requires massive computing power.
Further Reading
- Sutton, Richard and Barto, Andrew. Reinforcement Learning: An Introduction. (The bible of RL).