Mastering the game of Go with deep neural networks and tree search
Venue
Nature, vol. 529 (2016), pp. 484-489
Publication Year
2016
Authors
David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
Abstract
The game of Go has long been viewed as the most challenging of classic games for
artificial intelligence owing to its enormous search space and the difficulty of
evaluating board positions and moves. Here we introduce a new approach to computer
Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to
select moves. These deep neural networks are trained by a novel combination of
supervised learning from human expert games, and reinforcement learning from games
of self-play. Without any lookahead search, the neural networks play Go at the
level of state-of-the-art Monte Carlo tree search programs that simulate thousands
of random games of self-play. We also introduce a new search algorithm that
combines Monte Carlo simulation with value and policy networks. Using this search
algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go
programs, and defeated the human European Go champion by 5 games to 0. This is the
first time that a computer program has defeated a human professional player in the
full-sized game of Go, a feat previously thought to be at least a decade away.
