Please use the hyperparameters from this README. With other hyperparameters, things might not work (it's RL, after all)!
Original repository - Link
This is a PyTorch implementation of:
- Advantage Actor-Critic (A2C), a synchronous, deterministic version of A3C
- Proximal Policy Optimization (PPO)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)
- Generative Adversarial Imitation Learning (GAIL)
Requirements:
- Python 3 (it might work with Python 2, but I didn't test it)
- PyTorch
- OpenAI baselines
In order to install the requirements, run:
# PyTorch
conda install pytorch torchvision -c soumith
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
# Other requirements
pip install -r requirements.txt
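Optionally, you can sanity-check the installation with a quick import test:
# Optional: verify that PyTorch and baselines import cleanly
python -c "import torch, baselines; print(torch.__version__)"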
In order to visualize the results, use visualize.ipynb.
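If you prefer a standalone script over the notebook, here is a minimal sketch that plots a moving average of episode rewards, assuming training wrote OpenAI baselines Monitor logs (*.monitor.csv) into a log directory; the logs/ path below is illustrative:
# Minimal sketch: plot a 100-episode moving average of episode rewards
# from baselines Monitor logs. The "logs/" directory is a hypothetical example.
import glob
import pandas as pd
import matplotlib.pyplot as plt

frames = []
for path in glob.glob("logs/*.monitor.csv"):
    # Each Monitor file starts with one '#'-prefixed JSON header line,
    # followed by CSV rows with columns r (reward), l (length), t (time).
    frames.append(pd.read_csv(path, skiprows=1))
df = pd.concat(frames).sort_values("t").reset_index(drop=True)
df["r"].rolling(window=100, min_periods=1).mean().plot()
plt.xlabel("episode")
plt.ylabel("episode reward (100-episode moving average)")
plt.show()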
To train a PNN model from scratch:
python main.py --env-name "PongNoFrameskip-v4" --use-pnn --use-gae --num-processes 8 --num-steps 128 --num-mini-batch 4 --use-linear-lr-decay
To add a new column on top of previously trained ones, set --n-columns and point --pnn-paths at the model saved by an earlier run:
python main.py --env-name "PongNoFrameskip-v4" --use-pnn --n-columns 2 --pnn-paths "path_to_trained_model_from_previous_runs" --use-gae --num-processes 8 --num-steps 128 --num-mini-batch 4 --use-linear-lr-decay
This also works with MiniGrid environments: pass the environment's name (e.g. 'MiniGrid-xyz') as the --env-name argument.
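For example, assuming gym-minigrid is installed and its environments are registered (the environment ID below is just an illustration; the other flags mirror the Pong commands above):
python main.py --env-name "MiniGrid-Empty-8x8-v0" --use-pnn --use-gae --num-processes 8 --num-steps 128 --num-mini-batch 4 --use-linear-lr-decay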