This repository contains solutions for the OpenAI Gym environments using different Deep Reinforcement Learning algorithms.
Objective: For the default OpenAI Gym environments, the goal is to reach a certain average reward threshold over a consecutive number of trials (episodes), as listed here. For environments other than those provided by OpenAI Gym, the goal reward defaults to 0 and the number of trials to 1.
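The per-environment threshold can be read directly from the Gym registry; a minimal sketch, assuming the classic `gym` package installed via `environment.yml`:

```python
import gym

# Look up the registered goal threshold and episode cap for a default Gym environment.
spec = gym.spec("CartPole-v1")
print(spec.reward_threshold)   # 475.0 for CartPole-v1
print(spec.max_episode_steps)  # 500 for CartPole-v1
```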
- Install Conda
- Clone this repository (let's say to `${SRC_DIR}`)
- Create and activate the conda environment with the following commands (a quick sanity check is sketched below):

  ```bash
  cd ${SRC_DIR}
  conda env create -f environment.yml
  conda activate openai_gym
  ```
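To verify the environment resolved correctly, a minimal check (assuming `environment.yml` pins `gym` and TensorFlow, which the saved `policy.tf` models imply):

```python
# Quick sanity check that the openai_gym conda environment is active and usable.
import gym
import tensorflow as tf

print("gym:", gym.__version__)
print("tensorflow:", tf.__version__)
```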
- OpenAI Gym
  - Classic control
  - Box2D (except `CarRacing-v0`, which consumes all memory since its state space consists of 96x96 RGB images)
  - Mujoco (needs activation as described here)
  - Robotics
- PyBullet (see the import sketch after this list)
- Highway-env
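For the non-Gym families, the corresponding package has to be imported before `gym.make` can resolve the environment id (this is what the `include` config parameter described below automates). A minimal sketch, assuming `pybullet_envs` is installed via `environment.yml`:

```python
import gym
import pybullet_envs  # importing this registers the PyBullet environment ids with gym
# import highway_env  # likewise for Highway-env ids such as "highway-v0"

env = gym.make("HopperBulletEnv-v0")
obs = env.reset()
print(env.observation_space.shape, env.action_space.shape)
env.close()
```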
The following model-free Deep RL algorithms are available:

| Off-Policy | On-Policy |
|---|---|
| DQN | SARSA |
| DDQN | REINFORCE |
| DDPG | A2C |
| SAC | |
- Create a YAML config file (let's say `sarsa_cartpole.yaml`)
```yaml
CartPole-v1:
  env_name: CartPole-v1
  epochs: 1000
  render: False
  record_interval: 10
  summary_dir: summaries/classic_control
  algo:
    name: sarsa
    kwargs:
      clip_norm: 5.0
      num_gradient_steps: 2
      gamma_kwargs:
        type: ConstantScheduler
        value: 0.9
      lr_kwargs:
        type: ConstantScheduler
        value: 0.0003
  policy:
    name: greedy_epsilon
    kwargs:
      eps_kwargs:
        type: ExpScheduler   # y = exp(-decay_rate*t) where t = epoch
        decay_rate: 0.01
        update_step: 20
        clip_range: [0.001, 0.6]
```
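As a rough illustration of how the epsilon schedule above behaves, here is a sketch based only on the comment in the config; the repository's actual `ExpScheduler` may differ, and the `update_step` interpretation below is an assumption:

```python
import math

def eps_at_epoch(epoch, decay_rate=0.01, update_step=20, clip_range=(0.001, 0.6)):
    # y = exp(-decay_rate * t) with t = epoch; assumed to be refreshed every
    # `update_step` epochs and clipped to clip_range.
    t = (epoch // update_step) * update_step
    eps = math.exp(-decay_rate * t)
    low, high = clip_range
    return min(max(eps, low), high)

for epoch in (0, 100, 500, 1000):
    print(epoch, round(eps_at_epoch(epoch), 4))  # 0.6 -> 0.3679 -> 0.0067 -> 0.001
```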
- The valid parameters for the YAML config file are as follows (a quick way to sanity-check a config is sketched after this list):
  - `env_name`: (str) Name of the OpenAI Gym / PyBullet environment.
  - `epochs`: (int) Number of training epochs. Defaults to 1000.
  - `render`: (bool) When set to True, renders each epoch on the display. Defaults to False.
  - `record_interval`: (int) Interval (in epochs) at which to record and save the given epoch as an mp4 video. Defaults to 10. For some environments, recording videos also renders the recorded epochs.
  - `load_model`: (str) To resume training, assign a path to the directory with a pretrained model saved as a checkpoint. Defaults to None.
  - `include`: (list[str]) List of additional Gym environment modules that need to be imported to load an environment. For instance, for the PyBullet environments, `include=[pybullet_envs]` will import `pybullet_envs` before loading the environment. Defaults to `None`.
  - `algo`: Algorithm configuration (as in the example above).
  - `policy`: Policy configuration (as in the example above).
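If you want to sanity-check a config before launching a run, it can be loaded with PyYAML (a hypothetical check; `train.py` does its own parsing):

```python
import yaml

with open("sarsa_cartpole.yaml") as f:
    config = yaml.safe_load(f)

# Each top-level key is an experiment name whose value holds the parameters above.
for experiment, params in config.items():
    print(experiment, "-> env_name:", params.get("env_name"),
          "| algo:", params.get("algo", {}).get("name"),
          "| policy:", params.get("policy", {}).get("name"))
```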
- Enter the following command:

  ```bash
  python train.py sarsa_cartpole.yaml
  ```

  The above command will train the agent on the `CartPole-v1` environment using the SARSA algorithm with the Greedy Epsilon policy.
- Track the summary in real time with TensorBoard using the following command:

  ```bash
  tensorboard --host localhost --logdir ${summary_dir}
  ```
The respective summary directory contains the following files and directories:

- model: TensorBoard summary and the best trained model (as checkpoints).
- videos: recorded videos, if the `--record_interval` argument was passed while training or testing the model.
- policy.tf: best trained policy model.
- goal_info.yaml: YAML file with the given goal information: number of goals achieved, epoch and reward values for the first goal, and the maximum reward.
- config.yaml: YAML file with the parameters used to train the given model.
To test the agent previously trained on `CartPole-v1` using the SARSA algorithm, enter the following command:

```bash
python test.py --env_name CartPole-v1 --load_model ${path/to/policy.tf} --epochs 10
```

Here `${path/to/policy.tf}` is the path to the `policy.tf` model located in the summary directory of the previous experiment.
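If you want to load the saved policy manually instead of going through `test.py`, something like the following may work. This is only a sketch: it assumes `policy.tf` is a Keras/TensorFlow SavedModel that maps a batch of observations to action scores, and that the pre-0.26 `gym` step API is installed; both are assumptions.

```python
import gym
import numpy as np
import tensorflow as tf

# Hypothetical manual rollout; test.py is the supported way to evaluate a trained agent.
policy = tf.keras.models.load_model("path/to/policy.tf")  # replace with your experiment's path
env = gym.make("CartPole-v1")

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    scores = policy(obs[None, :].astype(np.float32))  # assumed: observations in, action scores out
    obs, reward, done, _ = env.step(int(np.argmax(scores)))
    total_reward += reward
print("episode reward:", total_reward)
```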