Learning to Evaluate Go Positions via Temporal Difference Methods
N. N. Schraudolph, P. Dayan, and T. J. Sejnowski. Learning to Evaluate Go Positions via Temporal Difference Methods. In N. Baba and L. C. Jain, editors, Computational Intelligence in Games, Studies in Fuzziness and Soft Computing, pp. 77–98, Springer Verlag, Berlin, 2001.
Earlier version
Abstract
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on neural network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone. A network with fewer than 500 weights learned within 3000 games of 9×9 Go a position evaluation function superior to that of a commercial Go program.
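For readers unfamiliar with TD learning, the core update the abstract refers to can be sketched in a few lines. This is a minimal illustration with a plain linear value function over a flattened 9×9 board, not the authors' network architecture; the feature encoding, step size, and variable names are assumptions made for the example.

```python
import numpy as np

# Minimal TD(0) sketch (not the paper's network): a linear evaluation
# function V(s) = w . s over a flattened 9x9 board, nudged toward the
# one-step TD target r + gamma * V(s'). All parameters are illustrative.

rng = np.random.default_rng(0)
n = 9 * 9                             # one feature per board point
w = rng.normal(0.0, 0.1, size=n)      # small random evaluation weights
alpha, gamma = 0.01, 1.0              # step size; episodic Go, so gamma = 1

def value(board):
    """Evaluate a position; board is a length-81 vector in {-1, 0, +1}."""
    return float(w @ board)

# One transition from position s to successor s_next with reward r
# (r would be the game outcome on the terminal move, 0 otherwise).
s = rng.choice([-1.0, 0.0, 1.0], size=n)
s_next = s.copy()
s_next[rng.integers(n)] = 1.0         # pretend Black placed one stone
r = 0.0                               # non-terminal transition

td_error = r + gamma * value(s_next) - value(s)
w += alpha * td_error * s             # gradient of the linear V w.r.t. w is s
```

In self-play training, this update would be applied along every move of a game, with the final score supplying the reinforcement signal at the end.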
BibTeX Entry
@incollection{SchDaySej01,
  author    = {Nicol N. Schraudolph and Peter Dayan and Terrence J. Sejnowski},
  title     = {\href{http://nic.schraudolph.org/pubs/SchDaySej01.pdf}{Learning to Evaluate Go Positions via Temporal Difference Methods}},
  chapter   = 4,
  pages     = {77--98},
  editor    = {Norio Baba and Lakhmi C. Jain},
  booktitle = {Computational Intelligence in Games},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
  series    = {Studies in Fuzziness and Soft Computing},
  volume    = 62,
  year      = 2001,
  b2h_type  = {Book Chapters},
  b2h_topic = {Reinforcement Learning},
  b2h_note  = {<a href="b2hd-SchDaySej94.html">Earlier version</a>},
  abstract  = {The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on neural network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone. A network with less than 500 weights learned within 3\,000 games of 9x9 Go a position evaluation function superior to that of a commercial Go program.}
}