In October 2015, AlphaGo, an AI-powered system, beat Mr. Fan Hui, the reigning three-time European Champion, 5-0. It was the first time an AI had defeated a human in a game as sophisticated as Go.

This article, after a short introduction to the game of Go and AlphaGo's achievements, briefly describes in technical terms the mechanism of Reinforcement Learning (RL) that was applied, among other techniques, in AlphaGo, and lists practical examples of RL in industry and business.

The game of Go

Go is an abstract strategy board game for two players invented in ancient China more than 2,500 years ago. Its aim is to use "stones" (black or white) to surround more territory than the opponent.

The rules of the game are very simple. However, the number of possible games of Go exceeds the total number of atoms in the visible universe (about 10^80 atoms vs. 10^761 possible games of Go). Instinct, intuition and feel play an important role in the game, and because of its intellectual beauty, it has captured the attention of civilizations for centuries.

Go board

Was it just a one-time event?

Definitely not! After the victory over Mr. Fan Hui, AlphaGo played Mr. Lee Sedol (winner of 18 world titles and recognized as the greatest Go player of the past decade) in Seoul, South Korea. The result? 4 to 1 for the machine.

The result shocked professional associations worldwide, and AlphaGo was granted an honorary 9-dan ranking (the highest possible certification), a title previously reserved for the world's top Go champions.

Lee Sedol during the battle with AlphaGo

What about chess?

Chess has many similarities to Go. Both stem from ancient times and are played by two people taking turns. Also, there is no random element involved.

But, unlike in Go, the chess champion (Garry Kasparov) was defeated more than two decades ago (in 1997) by a software program (Deep Blue) carefully designed in collaboration between software engineers and chess grandmasters. So why did it take almost 20 more years to beat a Go world champion?

This is because of a profound difference in complexity between the two games. In Go, a player has many more possible moves to choose from on every turn (about 250 in Go vs. 35 in chess).
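
To get a feel for what this difference means, here is a rough back-of-the-envelope sketch. It estimates the number of possible move sequences as b^d, where b is the number of moves available per turn and d is the length of a game. The branching factors come from the text above; the game lengths (80 moves for chess, 150 for Go) are commonly quoted approximations and are assumptions here, and the function name is made up for illustration. Other ways of counting give different totals, so treat the result as an order-of-magnitude illustration only.

```python
import math


def log10_game_tree_size(branching_factor: int, depth: int) -> float:
    """Return log10(b ** d), i.e. the order of magnitude of the game tree."""
    return depth * math.log10(branching_factor)


# Branching factors are from the article; the depths are assumed typical game lengths.
chess = log10_game_tree_size(branching_factor=35, depth=80)
go = log10_game_tree_size(branching_factor=250, depth=150)

print(f"chess: roughly 10^{chess:.0f} move sequences")
print(f"go:    roughly 10^{go:.0f} move sequences")
print(f"gap:   about {go - chess:.0f} orders of magnitude")
```

Even under these crude assumptions the gap is hundreds of orders of magnitude, which is why simply searching through possible moves, however fast, does not carry over from chess to Go.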

So, how exactly were these difficulties and complexities tackled?

Deep Reinforcement Learning to the rescue

Have you ever taught a dog a new trick? What does that process look like? You reward him if he performs the trick correctly, and you may punish him if he doesn't.

Say, every time the dog messes up the living room, you take away some of his favorite food, and every time he returns with a stick that you've thrown, you give him a sugar cube.

What will the dog eventually learn?

The agent (the dog) will behave in a way that maximizes the amount of his favorite food you give him as feedback. His biological "neural network" will work out the correct patterns by itself. It's a very powerful mechanism, and it's ingrained in nearly every living creature.

That's a high-level, real-world analogy for reinforcement learning, which constantly relies on feedback from the environment.
The machine can be rewarded or punished based on the state its actions have led to, or based on how quickly it is able to reach the desired state from its current position. The greater the distance (be it a physical distance, a span of time or just an abstract distance), the more the reward can be discounted by a certain factor.
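
The discounting idea can be sketched in a few lines. In this minimal illustration the discount factor gamma = 0.9, the function name and the two reward sequences are all assumptions chosen for the example; they are not taken from AlphaGo or any specific system. A reward received t steps in the future is weighted by gamma ** t, so the same reward is worth less the longer it takes to reach.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    # Walk backwards so each earlier step discounts everything after it once more.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episodes: the same final reward, reached after 3 or 6 steps.
fast_path = [0, 0, 1]
slow_path = [0, 0, 0, 0, 0, 1]

print(discounted_return(fast_path))  # weight 0.9 ** 2
print(discounted_return(slow_path))  # weight 0.9 ** 5
```

Because the shorter path yields the higher discounted return, an agent that maximizes this quantity is pushed towards reaching the goal quickly.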

One of the recent, spectacular achievements of RL is a system that OpenAI built in two weeks and that beats world-class players at Dota 2. The system was not taught the rules of Dota, nor does it contain any hardcoded game-specific rules. It started by randomly traversing the map or standing still and slowly improved, baby-stepping its way forward. In other words: it played more games than any human could ever play.
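
The idea of starting from random behaviour and improving purely from rewards can be illustrated on a toy problem. The sketch below is not OpenAI's method; it uses tabular Q-learning, a classic and much simpler RL algorithm, on a hypothetical six-cell corridor where the only reward sits in the last cell. The environment, the hyperparameters and all names are assumptions made up for this example.

```python
import random

N_STATES = 6                 # corridor cells 0..5; cell 5 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Move one cell; reaching the last cell pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice; ties are broken at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # All estimates start at zero, so the first episodes are a random walk.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap keeps every episode finite
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[action])
            # Q-learning update: move the estimate towards reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

The agent is never told that walking right is the goal. It only sees the reward at the end of the corridor, and the value estimates propagate backwards from there until the greedy choice in every cell points towards the goal.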

Where else is Reinforcement Learning used?

There are many areas where Reinforcement Learning is applied. Just to name a few:

  • In robotics - to efficiently find a combination of electrical signals to steer robotic arms (to perform an action) or legs (to walk)
  • In logic games - apart from Go: Poker, Backgammon, Othello/Reversi, Checkers, Solitaire
  • In manufacturing - robots for package transportation or for assembling specific parts of cars
  • In the military - among others, for logistics and to provide automatic assistance for humans in analyzing the environment before taking action
  • In inventory management - to reduce transit time and space utilization, or to optimize dispatching rules
  • In power systems - to predict and minimize transmission losses
  • In the financial sector - for instance, in trading systems to generate alpha, or as an assistant that saves traders and analysts time
  • In aviation - for steering autonomous helicopters

Reinforcement Learning has found multiple uses in business.

Finally, it was successfully used to play Atari games (e.g. Pac-Man, Tetris). Interestingly enough, it quickly acquired the necessary skills and outperformed human experts on three of the games described in the experiment.

What other materials can I read to learn the basics of Reinforcement Learning?

Assuming you are already familiar with Machine Learning and at least the basics of Deep Learning, we strongly recommend:

  • Reinforcement Learning: An Introduction - a book by Richard S. Sutton and Andrew G. Barto
  • Neuro-Dynamic Programming - a book by Dimitri P. Bertsekas and John N. Tsitsiklis
