AlphaZero Parallel Gomoku AI


AlphaZero-Gomoku-MPI

Link

GitHub: AlphaZero-Gomoku-MPI

Overview

This repo is based on junxiaosong/AlphaZero_Gomoku, to which I am sincerely grateful.

I do these things:

- Implement an asynchronous self-play training pipeline in parallel, following AlphaGo Zero's approach
- Write a root-parallel MCTS (a move is chosen by voting in an ensemble way; see the sketch below)
- Use a ResNet structure to train the model, and provide a transfer-learning API to train a larger-board model based on a small-board model (a pre-training-like way, in order to save time)
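For intuition, here is a minimal sketch of the root-parallel voting idea under my own assumptions (the helper `run_local_mcts` and the data layout are hypothetical, not the repo's actual code): each MPI process searches independently from the same root, the root visit counts are summed across processes, and the move with the most combined visits is played.

```python
# Minimal sketch of root-parallel MCTS voting (hypothetical, not the repo's code).
# Each rank searches independently; root visit counts are summed and the move
# with the largest combined count wins the "vote".
import numpy as np
from mpi4py import MPI

def run_local_mcts(board_state, n_playout=400):
    """Placeholder: run an independent MCTS and return root visit counts,
    one entry per legal move."""
    rng = np.random.default_rng()
    return rng.integers(0, n_playout, size=board_state["n_moves"])

comm = MPI.COMM_WORLD
board_state = {"n_moves": 121}          # e.g. an empty 11x11 board

local_visits = run_local_mcts(board_state)              # this rank's tree
total_visits = np.zeros_like(local_visits)
comm.Allreduce(local_visits, total_visits, op=MPI.SUM)  # ensemble vote

best_move = int(np.argmax(total_visits))
if comm.Get_rank() == 0:
    print("voted move:", best_move)
```

Run with something like `mpiexec -np 3 python vote_sketch.py`; every rank agrees on the same voted move because the summed counts are identical on all ranks.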

Strength

- The current model is trained on an 11x11 board and uses 400 playouts per move at test time
- The model always wins against me, no matter whether it plays black or white
- Against Gomocup AIs it ranks around 20th-30th in some rough tests
- When I play white I can't beat the AI; when I play black I end up with a tie or a loss most of the time

References

- Mastering the game of Go without human knowledge
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- Parallel Monte-Carlo Tree Search

Blog

- DeepMind blog
- mpi4py blog -- author: 自可乐

Installation Dependencies

- Python 3
- tensorflow >= 1.8.0
- tensorlayer >= 1.8.5
- mpi4py (parallel training and play)
- pygame (GUI)

How to Install

tensorflow/tensorlayer/pygame install:

conda install tensorflow
conda install tensorlayer
conda install pygame

mpi4py install: click here

mpi4py on Windows: click here
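As a quick sanity check that mpi4py and the MPI runtime work together (assuming mpiexec is on your PATH), you can print each process's rank:

```
mpiexec -np 2 python -c "from mpi4py import MPI; print('rank', MPI.COMM_WORLD.Get_rank(), 'of', MPI.COMM_WORLD.Get_size())"
```

You should see one line per process, e.g. `rank 0 of 2` and `rank 1 of 2`, in any order.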

How to Run

Play with AI:

python human_play.py

Play with parallel AI (-np: set the number of processes; take care of OOM!):

mpiexec -np 3 python -u human_play_mpi.py

Train from scratch:

python train.py

Train in parallel:

mpiexec -np 43 python -u train_mpi.py
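For intuition about what a parallel launch like the one above typically does, here is a hedged sketch of a common rank-split pattern (not the repo's actual train_mpi.py; `self_play_one_game` and `train_on` are placeholders): rank 0 acts as the learner while the other ranks generate self-play data.

```python
# Sketch of a rank-split self-play/training loop (hypothetical, not train_mpi.py).
# Launch with: mpiexec -np 4 python -u this_script.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def self_play_one_game():
    """Placeholder: play one self-play game, return a list of (state, pi, z) samples."""
    return [("fake_state", "fake_pi", 0)]

def train_on(batch):
    """Placeholder: one optimizer step on the gathered samples."""
    pass

for iteration in range(10):
    # Worker ranks produce games; the learner (rank 0) produces none.
    local_data = self_play_one_game() if rank != 0 else []

    # Collect all workers' games on rank 0.
    all_data = comm.gather(local_data, root=0)

    if rank == 0:
        batch = [sample for games in all_data for sample in games]
        train_on(batch)

    # Stand-in for broadcasting updated weights back to the workers.
    weights_version = comm.bcast(iteration if rank == 0 else None, root=0)
```

Raising `-np` adds more self-play workers, which is why memory use (OOM) is the thing to watch.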

Algorithm

It"s almost no difference between AlphaGo Zero except APV-MCTS.A PPT can be found in dir demo/slides

Details

Most settings are the same as in AlphaGo Zero; details follow:

- Network structure
  - The current model uses 19 residual blocks; more blocks mean more accurate predictions but also slower speed
  - The number of filters in the convolutional layers is shown in the picture below
- Feature planes (see the sketch after this section)
  - In the AlphaGo Zero paper there are 17 feature planes: 8 for the current player's stones, 8 for the opponent's stones, and a final plane representing the colour to play
  - Here I only use 4 for each player; this can easily be changed in game_board.py
- Dirichlet noise
  - I add Dirichlet noise at every node, which differs from the paper, where noise is only added at the root node. I guess AlphaGo Zero discards the whole tree after each move and rebuilds a new one, while here I keep the subtree under the chosen action, which is a little different
  - The weights between the prior probabilities and the noise are unchanged here (0.75/0.25), though I think 0.8/0.2 or even 0.9/0.1 might be better, because noise is added at every node
- Parameters in detail

I try to keep the original parameters from the AlphaGo Zero paper so as to test their generalization. Besides that, I also take training time and computer configuration into consideration.

| Parameters setting | Gomoku | AlphaGo Zero |
| --- | --- | --- |
| MPI num | 43 | - |
| c_puct | 5 | 5 |
| n_playout | 400 | 1600 |
| blocks | 19 | 19/39 |
| buffer size | 500,000 (data) | 500,000 (games) |
| batch_size | 512 | 2048 |
| lr | 0.001 | annealed |
| optimizer | Adam | SGD with momentum |
| dirichlet noise | 0.3 | 0.03 |
| weight of noise | 0.25 | 0.25 |
| first n move | 12 | 30 |
- Training details
  - I trained the model for about 100,000 games, which took roughly 800 hours
  - Computer configuration: 2 CPUs and 2 GTX 1080 Ti GPUs
  - The computation gap with DeepMind is easy to see; people with richer hardware can take on some of the future work
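To make the feature-plane idea above concrete, here is a minimal sketch of one possible binary-plane encoding for an 11x11 position. Treat it as an illustration only; the exact planes and their order in game_board.py may differ.

```python
# Illustrative feature-plane encoding for an 11x11 board (not game_board.py's exact layout).
import numpy as np

BOARD_SIZE = 11

def encode_state(own_stones, opp_stones, last_move, current_is_black):
    """Stack 4 binary planes: own stones, opponent stones, last move, colour to play."""
    planes = np.zeros((4, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    for r, c in own_stones:
        planes[0, r, c] = 1.0
    for r, c in opp_stones:
        planes[1, r, c] = 1.0
    if last_move is not None:
        planes[2, last_move[0], last_move[1]] = 1.0
    if current_is_black:
        planes[3, :, :] = 1.0          # constant plane marking the colour to play
    return planes

# Example: black to play, one stone each, white just moved at (5, 6)
x = encode_state(own_stones=[(5, 5)], opp_stones=[(5, 6)], last_move=(5, 6), current_is_black=True)
print(x.shape)  # (4, 11, 11)
```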

Some Tips

- Network
  - Zero-padding the input: Sometimes when playing against the AI, it is unaware of the risk at the edge of the board even when I already have three or four in a row. Zero-padding the input data can mitigate the problem
  - Put the network on the GPU: If the network is shallow, it does not matter whether CPU or GPU is used; otherwise it is faster to use the GPU during self-play
- Dirichlet noise
  - Add noise in the node (see the sketch after this list): In junxiaosong/AlphaZero_Gomoku, noise is added outside the tree, seemingly like DQN's epsilon-greedy way. That is fine when I test on 6x6 and 8x8 boards, but on 11x11 some problems occur. After a long training run on 11x11, the black player always plays its first stone in the centre with policy probability equal to 1. It is very rational for black to play there, but as a result the white player never sees any kifu where the first stone is played elsewhere. So when I play black against the AI and place a stone somewhere other than the centre, the AI becomes very weak because it has never seen such a position at all. Adding noise at each node can mitigate the problem
  - Smaller weight on the noise: As said before, I think 0.8/0.2 or even 0.9/0.1 may be a better split between the prior probabilities and the noise, because noise is added at every node
- Randomness
  - Dihedral reflection or rotation (see the sketch after this list): When using the network to output probabilities/value, it is better to do as the paper says: the leaf node $s_L$ is added to a queue for neural network evaluation, $(d_i(p), v) = f_\theta(d_i(s_L))$, where $d_i$ is a dihedral reflection or rotation selected uniformly at random from $i \in [1..8]$
  - Add randomness when testing: I also apply a random dihedral reflection or rotation when playing against the AI, so as to avoid playing the same game every time
- Tradeoffs
  - Network depth: If the network is too shallow, the loss increases; if it is too deep, training and testing are slow. (My network is still a little slow to play against; I think maybe 9 blocks would be enough)
  - Buffer size: If the buffer is small, it is easy for the network to fit, but performance cannot be guaranteed because it learns from only a little data. If it is too large, much longer training time and a deeper network are needed
  - Playout number: If it is small, a self-play game finishes quickly, but the kifu quality cannot be guaranteed. Conversely, more playouts produce better kifu but take longer
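Two of these tips can be sketched in a few lines, assuming the 0.25 noise weight and 0.3 Dirichlet alpha from the parameter table; this is an illustration, not the repo's exact implementation.

```python
# Sketch of two tips above (illustrative only, not the repo's exact code):
# (1) mixing Dirichlet noise into a node's prior probabilities, and
# (2) applying a random dihedral reflection/rotation before network evaluation.
import numpy as np

rng = np.random.default_rng()

def mix_dirichlet_noise(priors, epsilon=0.25, alpha=0.3):
    """Blend prior probabilities with Dirichlet noise: (1 - eps) * p + eps * noise."""
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise

def random_dihedral(planes):
    """Apply one of the 8 board symmetries (4 rotations x optional flip) to a
    (channels, H, W) tensor; also return the index so outputs can be mapped back."""
    i = int(rng.integers(8))
    out = np.rot90(planes, k=i % 4, axes=(1, 2))
    if i >= 4:
        out = np.flip(out, axis=2)
    return out, i

# Example usage
priors = np.full(121, 1.0 / 121)                 # uniform priors on an 11x11 board
noisy = mix_dirichlet_noise(priors)
planes = np.zeros((4, 11, 11), dtype=np.float32)
transformed, idx = random_dihedral(planes)
print(noisy.sum(), transformed.shape, idx)       # ~1.0 (4, 11, 11) 0..7
```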

Future Work to Try

- Continue to train (a larger board) and increase the playout number
- Try some other parameters for better performance
- Alter the network structure
- Alter the feature planes
- Implement APV-MCTS
- Train on standard/renju rules
