Tic-tac-toe (American English), noughts and crosses (British English), or Xs and Os is a paper-and-pencil game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game. (Wikipedia)
In practice, tabular Q-learning is the better way to train a tic-tac-toe agent: the state space is small and tabular methods are stable. This repo, however, sets out to build a deep Q-learning agent for the game.
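As a concrete (but hypothetical) picture of what such an agent's value network can look like, here is a minimal Keras sketch. The flattened 9-cell board encoding and the layer sizes are illustrative assumptions, not necessarily what this repo uses:

```python
from keras.models import Sequential
from keras.layers import Dense

def build_q_network(hidden_layers_size=(32, 32)):
    """Map a flattened 3x3 board (9 cells) to Q-values for the 9 possible moves."""
    model = Sequential()
    model.add(Dense(hidden_layers_size[0], activation='relu', input_shape=(9,)))
    for units in hidden_layers_size[1:]:
        model.add(Dense(units, activation='relu'))
    model.add(Dense(9, activation='linear'))  # one Q-value per board cell
    model.compile(optimizer='adam', loss='mse')
    return model
```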
Required packages:
- numpy
- keras

You can also install them with `pip install -r requirements.txt`.
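For reference, the matching `requirements.txt` would simply list those two packages (version pins, if any, are up to you):

```
numpy
keras
```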
Before training the model, open `train_rl.py` to adjust the model config; the available parameters are documented below. Then simply run `python train_rl.py` to start training.
- `player_name`: str, feel free to make a cool name :)
- `batch_size`: int, batch size
- `learning_rate`: float, learning rate
- `ini_epsilon`: float, initial epsilon for epsilon-greedy exploration
- `epsilon_decay`: float, after every episode the current epsilon is multiplied by this factor (see the training-loop sketch after this list)
- `epsilon_min`: float, minimum epsilon
- `gamma`: float, reward discount factor
- `hidden_layers_size`: list of int, only relevant if `load_trained_model_path` is None
- `is_double_dqns`: bool, whether to use double DQN (see the double-DQN sketch after this list)
- `optimizer`: anything that is a Keras optimizer, e.g. `keras.optimizers.Adam(lr=0.0001)`
- `loss`: anything that is a Keras loss
- `load_trained_model_path`: str or None
- `is_train`: bool, set it to True when training
- `p2_player_type`: str, 'random' or 'q_player'
- `p2_load_trained_model_path`: if `p2_player_type = 'random'`, set it to None; otherwise set it to a str (saved model path)
- `episode`: int, number of training episodes
- `memory_size`: int, replay memory size
- `episode_switch_q_target`: int, every this many episodes, copy the q_value model's parameters to the q_target model (see the training-loop sketch after this list)
- `is_special_sample`: bool, if True, bias batch sampling toward terminal states (see the sampling sketch after this list)
- `save_model_path`: str, path to save the final trained model
- `win_reward`: float
- `lose_reward`: float
- `draw_reward`: float
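Putting the parameters together, a config might look like the following sketch. The names are the ones documented above; every value is an illustrative placeholder, and the exact layout inside `train_rl.py` is an assumption:

```python
from keras.optimizers import Adam

config = {
    'player_name': 'deep_q_hero',          # any cool name works
    'batch_size': 32,
    'learning_rate': 0.0001,
    'ini_epsilon': 1.0,
    'epsilon_decay': 0.995,
    'epsilon_min': 0.05,
    'gamma': 0.95,
    'hidden_layers_size': [32, 32],        # ignored if a trained model is loaded
    'is_double_dqns': True,
    'optimizer': Adam(lr=0.0001),
    'loss': 'mse',
    'load_trained_model_path': None,
    'is_train': True,
    'p2_player_type': 'random',            # or 'q_player'
    'p2_load_trained_model_path': None,    # set a model path if p2 is 'q_player'
    'episode': 100000,
    'memory_size': 4096,
    'episode_switch_q_target': 1000,
    'is_special_sample': True,
    'save_model_path': 'trained_model.h5',
    'win_reward': 1.0,
    'lose_reward': -1.0,
    'draw_reward': 0.5,
}
```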
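Here is how the epsilon schedule and the target-network switch usually fit into a training loop, as a hypothetical fragment (the real loop in `train_rl.py` may differ in details):

```python
# q_value_model and q_target_model are assumed to be two networks with
# identical architecture, e.g. built by a helper like build_q_network above.
epsilon = config['ini_epsilon']
for ep in range(config['episode']):
    # ... play one episode with epsilon-greedy moves, push transitions
    # into the replay memory, and fit q_value_model on a sampled batch ...

    # decay exploration after every episode, but never below the floor
    epsilon = max(config['epsilon_min'], epsilon * config['epsilon_decay'])

    # every episode_switch_q_target episodes, sync the target network
    if (ep + 1) % config['episode_switch_q_target'] == 0:
        q_target_model.set_weights(q_value_model.get_weights())
```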
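When `is_double_dqns` is True, the standard double-DQN recipe is to let the online (q_value) network pick the greedy next action while the target (q_target) network evaluates it, which reduces Q-value overestimation. A self-contained sketch of that target computation (the function name and batching details are assumptions):

```python
import numpy as np

def double_dqn_targets(q_value_model, q_target_model,
                       rewards, next_states, dones, gamma):
    """Compute double-DQN training targets for a batch of transitions.

    `dones` is 1.0 for terminal transitions, 0.0 otherwise, so terminal
    states contribute only their immediate reward.
    """
    best_actions = np.argmax(q_value_model.predict(next_states), axis=1)
    next_q = q_target_model.predict(next_states)
    chosen_q = next_q[np.arange(len(next_states)), best_actions]
    return rewards + gamma * chosen_q * (1.0 - dones)
```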
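Finally, one way `is_special_sample` could bias batches toward terminal states, where the reward signal actually lives; the 50/50 split and the transition layout are assumptions for illustration:

```python
import random

def sample_batch(memory, batch_size, is_special_sample):
    """Sample a training batch, optionally biased toward terminal transitions.

    Each transition is assumed to be a tuple whose last element is the
    done flag; the real replay-memory format in this repo may differ.
    """
    if not is_special_sample:
        return random.sample(memory, batch_size)
    terminals = [t for t in memory if t[-1]]
    others = [t for t in memory if not t[-1]]
    n_term = min(len(terminals), batch_size // 2)  # assumed 50/50 target split
    batch = random.sample(terminals, n_term)
    batch += random.sample(others, batch_size - n_term)
    random.shuffle(batch)
    return batch
```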