Reinforcement Learning

Overview

How do you train a dog? You give it a treat when it sits. You scold it when it pees on the rug. Reinforcement Learning (RL) is training computers the same way. You don’t tell the computer how to play Mario; you just tell it “Go Right = Good, Die = Bad” and let it figure it out.

Core Idea

The core idea is Trial and Error. The agent tries random things. Most fail. Some succeed. It remembers what worked and does it more often. Over millions of tries, it becomes superhuman.

Formal Definition

An area of machine learning concerned with how software agents ought to take actions in an environment to maximize some notion of cumulative reward. Components: Agent, Environment, Action, Reward.

Intuition

Supervised Learning: A teacher showing you the answers. “This is a cat. This is a dog.”
Reinforcement Learning: Learning to ride a bike. No one can explain the physics to you. You just have to try, fall, and try again until your brain “clicks.”

Examples

AlphaGo: The AI that beat the world champion at Go. It played millions of games against itself, discovering strategies that humans had never thought of in 2,000 years.
Robotics: Teaching a robot hand to solve a Rubik’s Cube. It drops it a lot at first, but eventually learns the friction and physics.
Video Games: OpenAI Five beat the world champions at Dota 2.

Common Misconceptions

It’s easy: It is notoriously unstable. Often the agent learns to “hack” the reward function. (e.g., instead of finishing the race, it drives in circles collecting power-ups because that gives more points). This is the Alignment Problem.

Q-Learning: A math formula for calculating the value of an action.
Exploration vs. Exploitation: The dilemma. Should I go to my favorite restaurant (Exploit), or try a new one that might be better but might be worse (Explore)?

Applications

Stock Trading: Algorithms that learn to trade by maximizing profit (reward) and minimizing risk.

Criticism / Limitations

Sample Inefficiency: It takes an AI millions of hours of practice to learn what a human learns in 10 minutes. It requires massive computing power.

Overview#

Core Idea#

Formal Definition#

Intuition#

Examples#

Common Misconceptions#

Related Concepts#

Applications#

Criticism / Limitations#

Further Reading#