How Does AlphaGo Work? Power of Reinforcement Learning
Nov 3, 2017
Note: This article was originally published on November 3, 2017 and has been migrated from our previous blog. Some details — tools, libraries, benchmarks, industry context — may be outdated. For our latest perspective, see our recent posts.
In October 2015, AlphaGo, an AI-powered system, beat Mr. Fan Hui, the reigning three-time European Champion of the complex board game Go, by 5 games to 0. It was the first time an AI defeated a professional human player in such a sophisticated game.
Go is an abstract strategy board game for two players invented in ancient China more than 2,500 years ago. Its aim is to use “stones” (black or white) to surround more territory than the opponent.
The rules of the game are very simple. However, the possible placements of the stones outnumber the total number of atoms in the visible universe (10^80 atoms vs. 10^761 possible games of Go). Instinct, intuition and feel play an important role in the game, and because of its intellectual beauty, it has captured the attention of civilizations for centuries.
The fact that so many combinations are possible makes AlphaGo's victory over the reigning champion of Go truly remarkable, and sheds light on the potential of artificial intelligence in the future.

This win was not a fluke or a one-time thing. After the victory over Mr. Fan Hui, AlphaGo played against Mr. Lee Sedol, winner of 18 world titles and recognized as the greatest Go player of the decade, in Seoul, South Korea. The result? 4 to 1 for the machine.
These landslide victories shocked professional associations worldwide, and AlphaGo was granted an honorary 9-dan ranking (the highest possible certification) – a title previously reserved only for the world’s top Go Champions.

Chess has many similarities to Go. Both stem from ancient times and are played by two people taking turns. Also, in both games, there is no random element involved.
But, unlike Go, chess has been comparatively easy for AI machines to master. World Chess Champion Garry Kasparov was defeated by a software program called Deep Blue more than two decades ago, in 1997. The program was carefully designed in collaboration between software engineers and chess grandmasters, and together they created an extremely efficient machine. On the other hand, it took programmers around another 20 years to create an intelligence that could defeat the Go world champion.
This is because of a profound difference in complexity between the two games. In Go, during every turn, a player has a far larger number of moves to choose from (about 250 in Go vs. 35 in chess). You can imagine the intelligence a machine needs in order to master a game like Go. Let's discuss how they did it.
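To get a feel for that gap in scale, here is a rough back-of-the-envelope sketch (our own illustration, using the approximate branching factors above and typical game lengths; exact figures vary by source):

```python
import math

# Assumed rough values (not exact): chess ~35 legal moves per turn over
# ~80 half-moves; Go ~250 legal moves per turn over ~150 moves.
def search_space_magnitude(branching: int, depth: int) -> float:
    """log10 of the naive game-tree size branching**depth."""
    return depth * math.log10(branching)

print(f"chess ~ 10^{search_space_magnitude(35, 80):.0f} paths in the game tree")
print(f"Go    ~ 10^{search_space_magnitude(250, 150):.0f} paths in the game tree")
```

Even these naive estimates hint at why the brute-force search that worked for chess breaks down for Go.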
Have you ever trained a dog to perform a new trick? What does that process look like? Most likely, you give him a reward if the trick is performed correctly, and punish him or ignore him if it's not.
For example, if every time the dog messes up the living room you take away some of his favorite food, and every time he returns with a stick you've thrown you give him a sugar cube, what will the dog eventually learn?
The dog, or the agent, will behave in a way that maximizes the amount of his favorite food you give him as feedback. His biological “neural network” will discover the correct patterns itself so that he can achieve the maximum positive result. This type of learning is a very powerful tool, and it's ingrained in nearly every biological organism.
The above example is a high-level, real-world illustration of reinforcement learning. It constantly relies on feedback from the environment. The machine can be rewarded or punished based on the current state of the actions it performed, or based on how fast it's able to reach the desired state from the current position. The longer the distance (be it a physical distance, a span of time or just an abstract distance), the more the reward is discounted by a certain factor.
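As a minimal sketch of that discounting idea (our own example, not AlphaGo's code), the discounted return sums the rewards an agent receives, each weighted by a factor gamma raised to the number of steps until the reward arrives:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of step-wise rewards, each discounted by gamma**step."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total

# The same +1 reward is worth less the later it arrives:
print(discounted_return([1.0]))                 # 1.0 (immediate reward)
print(discounted_return([0.0, 0.0, 0.0, 1.0]))  # 0.9**3 ≈ 0.729 (three steps away)
```

This is why an agent trained on discounted rewards prefers reaching the goal quickly over dawdling.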
One of the recent, spectacular achievements related to RL came when OpenAI built, in two weeks, an RL system that beat world-class players in Dota 2. The system was not taught the rules of Dota, nor does it contain any hardcoded game-based rules. It started by randomly traversing the map or standing still, and slowly improved itself, baby-stepping its way up. In other words: it played more games than any human could ever play, and it learned that way.
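That trial-and-error loop can be illustrated with tabular Q-learning, one of the classic RL algorithms, on a toy problem (our own sketch, of course vastly simpler than OpenAI's actual system):

```python
import random

# Toy illustration: Q-learning on a 1-D corridor of 6 cells. The agent
# starts in cell 0 and receives a reward of +1 only for reaching the
# rightmost cell. It begins by wandering randomly and gradually learns
# to walk right.
random.seed(0)

N_STATES = 6
ACTIONS = (-1, +1)                       # step left / step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9                  # learning rate, discount factor

def pick_action(state):
    # Greedy choice; ties are broken randomly, so the untrained agent
    # (all Q-values zero) explores by moving at random.
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(200):
    state = 0
    while state < N_STATES - 1:
        action = pick_action(state)
        next_state = min(N_STATES - 1, max(0, state + action))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# The learned greedy policy walks right from every cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1, 1]
```

Note that the agent is never told "walk right"; the reward signal alone, propagated backwards through the Q-values, produces that behavior.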
There are many areas where Reinforcement Learning is applied. To name just a couple: it has found multiple practical uses in business.
Finally, it was successfully used to play Atari games (e.g. Pac-Man, Tetris). The agent quickly gained the necessary skills and outperformed human experts on three of the games described in the experiment.
Assuming you are already familiar with Machine Learning and at least the basics of Deep Learning, we strongly recommend:
What’s hot in Deep Learning right now? Beat the learning curve and read the 2017 Review of GAN Architectures.