
AlphaZero Gomoku with Renju Rule

An implementation of the AlphaZero algorithm for Gomoku (also called Omok or Gobang) under the Renju rule, which restricts Black with forbidden moves to limit the first player's advantage. I trained a model that plays on a 9 × 9 board using 3000 self-play games generated over 5 days.
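As a quick illustration of what a forbidden move is, the sketch below checks one Renju restriction, the overline (Black may not make a line of six or more stones). It assumes a plain 2D-list board with 1 marking Black stones and is only a minimal sketch, not the repository's renju_rule.py, which also has to handle the double-three and double-four restrictions.

```python
def makes_overline(board, row, col):
    """Return True if placing a Black stone at (row, col) would create a
    line of six or more Black stones (forbidden for Black under Renju)."""
    size = len(board)
    board[row][col] = 1  # tentatively place the stone
    try:
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):  # the four line directions
            count = 1  # the stone just placed
            for sign in (1, -1):  # walk both ways along the direction
                r, c = row + sign * dr, col + sign * dc
                while 0 <= r < size and 0 <= c < size and board[r][c] == 1:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 6:
                return True
        return False
    finally:
        board[row][col] = 0  # undo the tentative placement
```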


Example Games Between AIs

  • Each move with 10 MCTS rollouts:

  • Each move with 200 MCTS rollouts:

Play

Run the following script from the root directory:

python play.py  

You can also play this game against the AI on Google Colab (Open in Colab).

Files

| File | Description |
| --- | --- |
| game.py | Defines the game board and the game flow. |
| player.py | Defines the players (black and white). |
| renju_rule.py | Defines the Renju rule. |
| alphazero_net.py | Defines the policy and value network for the AlphaZero model. |
| alphazero_agent.py | Defines an agent that combines the policy and value networks with MCTS. |
| encoder.py | Encodes a game board into feature channels. |
| experience.py | Defines the data structure for self-play experiences and data augmentation. |
| self_play.py | Generates AI self-play experiences. |
| train.py | Trains a model using self-play experiences. |
| compare_performance.py | Simulates games between two agents to determine the stronger one. |
| bot_v_bot.py | Runs a game between two AIs. |
| play.py | Runs a game between a human and the AI. |
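Together, these files suggest the usual AlphaZero-style loop: generate self-play games, train on the collected experiences, and keep whichever model plays stronger. The outline below is only a sketch of that loop; the callables stand in for what self_play.py, train.py, and compare_performance.py provide and are not the repository's actual API.

```python
def training_loop(best_agent, generate_games, train_on, is_stronger,
                  iterations, games_per_iter):
    """Hypothetical outline of the self-play / train / evaluate cycle."""
    for _ in range(iterations):
        experiences = generate_games(best_agent, games_per_iter)  # self_play.py
        candidate = train_on(best_agent, experiences)             # train.py
        if is_stronger(candidate, best_agent):                    # compare_performance.py
            best_agent = candidate                                # promote the new model
    return best_agent
```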



Challenges in Training

After 2000 self-play games, the agent showed proficiency in offense but struggled with defense. To address this, I ran an additional 1000 self-play games and applied three measures.

  1. Encouraging Defensive Moves: If the agent identified a move that leads to the opponent's victory, it was prompted to explore that move more, helping it learn defensive strategies.

  2. Adjusting Move Selection Probability: During the tree search, I recorded the number of predicted losses for each move and subtracted this value from that move's total visit count when saving the probability distribution over possible moves (see the sketch after this list). This adjustment emphasized proper defensive moves during training while reducing the probabilities of less effective moves.

  3. Extending Search for Low-Confidence Moves: After the Monte Carlo tree search, if the confidence of the selected best move was below a threshold, additional search was performed.
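Here is a minimal sketch of the visit-count adjustment in the second measure, assuming the counts are kept as NumPy arrays indexed by move; the function and variable names are illustrative and are not the repository's.

```python
import numpy as np

def adjusted_policy(visit_counts, predicted_losses):
    """Turn MCTS visit counts into a policy target, discounting moves whose
    rollouts were predicted to end in a loss."""
    adjusted = np.maximum(visit_counts - predicted_losses, 0)
    total = adjusted.sum()
    if total == 0:
        # Fall back to raw visit counts if every move was cancelled out.
        adjusted, total = visit_counts, visit_counts.sum()
    return adjusted / total
```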

Although these measures significantly improved the agent's defense, there were cases where it focused more on defense than offense. To solve this, I extended the first measure to winning scenarios as well. These modifications quickly improved the agent's offensive-defensive balance without requiring a large number of additional self-play games or extensive training time.

You can find the detailed algorithm in the select_move() method within the alphazero_agent.py file.
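For the third measure, the extra search can be pictured roughly as below. The MCTS interface, the use of the best move's visit share as the confidence value, and the threshold are all assumptions for illustration; they do not reflect the actual select_move() code.

```python
def select_move(mcts, game_state, base_rollouts=200, extra_rollouts=100,
                confidence_threshold=0.4):
    """Hypothetical move selection with extra search on low confidence."""
    mcts.run(game_state, base_rollouts)                    # normal search
    best_move, confidence = mcts.best_move_and_confidence()
    if confidence < confidence_threshold:                  # not confident enough:
        mcts.run(game_state, extra_rollouts)               # search some more
        best_move, confidence = mcts.best_move_and_confidence()
    return best_move
```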
