How does AlphaGo work? Power of Reinforcement Learning

November 3, 2017

In October 2015, AlphaGo, an AI-powered system, beat Mr. Fan Hui, the reigning 3-times European Champion of the complex boardgame Go, by 5 points to 0. It was the first time an AI conquered a human in such a sophisticated game.

The game of Go

Go is an abstract strategy board game for two players invented in ancient China more than 2,500 years ago. Its aim is to use "stones" (black or white) to surround more territory than the opponent.

The rules of the game are very simple. However, the variations of the placements of the stones outnumber the total number of atoms in the visible universe (1080 atoms vs 10761 possible games of Go). Instinct, intuition and feel play an important role in the game, and because of its intellectual beauty, it has captured the attention of civilizations since centuries.

The fact that so many combinations are possible makes the fact that AlphaGo managed to beat the reigning champion of go truly remarkable, and sheds light on the potential opportunities of artificial intelligence in the future.

alphago and reinforcement learning

Was it just an one-time event?

This win was not a fluke or one-time thing. After the victory with Mr. Fan Hui, AlphaGo played with Mr. Lee Sedol, who is the winner of 18 world titles and recognized as the greatest Go player of the decade, in Seoul, South Korea. The result? 4 to 1 for the machine.

These landslide victories shocked professional associations worldwide, and AlphaGo was granted an honorary 9-dan ranking (the highest possible certification) - a title previously reserved only for the world's top Go Champions.

alphago game

What about chess?

Chess has many similarities to Go. Both stem from ancient times and are played by two people taking turns. Also, in both games, there is no random element involved.

But - unlike Go - Chess has been easy for AI machines to master. Champion (Garry Kasparov) was defeated by a software program called Deep Blue more than two decades ago in 1977. The program was carefully designed with collaboration between software engineers and chess grandmasters, and they created an extremely efficient machine.On the other hand, it took programmers around 20 years to create intelligence that could defeat the Go world champion.

This is because of a profound difference in complexity between these games. In Go, during every turn, a player has a far larger quantity of moves to chose from (about 250 in Go vs. 35 in chess). You can imagine the intelligence that a machine needs to have in order to master a game like Go. Let's discuss how they did it.

Deep Reinforcement Learning to the rescue

Have you ever trained a dog to teach him a new trick? What does that process could look like? Most likely, you give him a reward if the trick is performed correctly, and punish him or ignore him if it's not.

For example, if every time a dog messes up a living room, you take away some of his favorite food, or every time he returns with a stick that you've thrown out, you give him a sugar cube, what will the dog eventually learn?

The dog, or the agent, will behave in a way to maximize the amount of his favorite food you give him as a feedback. His biological "neural network" will invent the correct patterns itself so that he can achieve the maximum positive result. This type of learning is a very powerful tool, and it's ingrained in nearly every biological unit.

The above example is a high-level real-world example of reinforcement learning. It constantly relies on feedback from the environment. The machine can be rewarded or punished based on the current state of the actions it performed, or based on how fast it's able to reach the desired state from a current position. The longer the distance (be it a physical distance, a fraction of time or just an abstract distance), the reward can be discounted by a certain factor.

One of the recent, spectacular achievements related to RL was when OpenAI built a RL system in 2 weeks that beats world-class players in Dota 2. The system was not taught the rules of Dota, nor does it contain any hardcoded game-based rules. It started by randomly traversing a map or standing still and slowly improved itself, baby-stepping its way. In other words: it played more games than any human was ever capable of playing, and it learned this way.

Where else is Reinforcement Learning used?

There are many areas where Reinforcement Learning is applied. Just to name a few:

  • In robotics - to efficiently find a combination of electrical signals to steer robotic arms (to perform an action) or legs (to walk)
  • Logic games - apart from Go: Poker, Back-Gammon, Othello/Reversi, Checkers, Solitaire
  • In manufacturing - robots for package transportation or for assembling specific parts of cars
  • In military - among others for logistics and to provide automatic assistance for humans in analyzing the environment before actions
  • In inventory management - to reduce transit time and space utilization, or to optimize dispatching rules
  • Power systems - to predict and minimize transmission losses
  • In the financial sector - for instance in the trading systems to generate alpha or to serve as an assistant to allow traders and analysts to save time
  • Steering an autonomous helicopter

Reinforcement Learning has found multiple uses in business. Practical Machine Learning is another type of AI learning that has become useful in business. For more information, check out our article? Read our article!

Finally - it was successfully used to play Atari games (e.g. PacMan, Tetris). It quickly gained the skills and outperformed human experts on three of the games described in the experiment.

What other materials can I read to learn the basis of Reinforcement Learning?

Assuming you are already familiar with Machine Learning and at least the basics of Deep Learning, we strongly recommend:

  • Reinforcement Learning: An Introduction - a book by Richard S. Sutton and Andrew G. Barto
  • Neuro-Dynamic Programming by Dimitri P. Bertsekas and John Tsitsiklis

What's hot in Deep Learning right now? Beat the learning curve and read the 2017 Review of GAN Architectures.

[email protected]
linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram