Adapted from Ha and Schmidhuber, "World Models", 2018. Refer to the original project page.
We extend the model-based World Models RL algorithm with the following modifications:
- Use a vanilla autoencoder for the vision (V) model. The original authors use a variational autoencoder to impose a Gaussian distribution over the sampled state latents. We hypothesise that removing this Gaussian constraint may work better, since the state latents are no longer forced to follow a fixed distribution (see the loss sketch after this list).
- Use the PEPG (Parameter Exploring Policy Gradients) evolutionary algorithm for the controller parameter search. A particular weakness of CMA-ES is that it discards the majority of the solutions in each generation and keeps only the top n% of them; weak solutions may also carry information that helps convergence, and PEPG uses every sampled solution, weighted by fitness (see the sketch after the controller commands below). Refer to this blog for a concise explanation.
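The difference between the two vision objectives comes down to a single KL term. Below is a minimal sketch in PyTorch, not the code from trainae.py or trainvae.py; the batch shapes, the 32-dimensional latent, and the summed-MSE reconstruction term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative placeholders: a batch of 8 reconstructed 64x64 RGB frames and their targets,
# plus the Gaussian posterior parameters a VAE encoder would emit for a 32-d latent.
recon = torch.rand(8, 3, 64, 64)
target = torch.rand(8, 3, 64, 64)
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)

# Vanilla AE: reconstruction error only; the latent z = encoder(x) is left unconstrained.
ae_loss = F.mse_loss(recon, target, reduction="sum")

# VAE: the same reconstruction term plus a KL penalty pulling the posterior
# N(mu, exp(logvar)) towards the standard normal prior N(0, I).
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
vae_loss = F.mse_loss(recon, target, reduction="sum") + kl
```

Dropping the KL term leaves the latent space unconstrained, which is exactly the hypothesis tested by the vanilla-autoencoder variant.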
To train the variational autoencoder and the vanilla autoencoder, run the trainvae.py and trainae.py scripts respectively:
```bash
python trainvae.py --log_dir <directory>
python trainae.py --log_dir <directory>
```
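For orientation, a vanilla convolutional autoencoder of the kind trainae.py trains might look roughly like the sketch below; the ConvAE class, its layer sizes, and the training-step details are assumptions for illustration rather than the repository's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAE(nn.Module):
    """Illustrative 64x64 RGB autoencoder; shapes follow the World Models convention."""
    def __init__(self, latent_size=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),     # 64 -> 31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),    # 31 -> 14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),   # 14 -> 6
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),  # 6 -> 2
            nn.Flatten(),
            nn.Linear(256 * 2 * 2, latent_size),          # unconstrained latent, no mu/logvar heads
        )
        self.dec_in = nn.Linear(latent_size, 1024)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(1024, 128, 5, stride=2), nn.ReLU(),  # 1 -> 5
            nn.ConvTranspose2d(128, 64, 5, stride=2), nn.ReLU(),    # 5 -> 13
            nn.ConvTranspose2d(64, 32, 6, stride=2), nn.ReLU(),     # 13 -> 30
            nn.ConvTranspose2d(32, 3, 6, stride=2), nn.Sigmoid(),   # 30 -> 64
        )

    def forward(self, x):
        z = self.enc(x)
        return self.dec(self.dec_in(z).view(-1, 1024, 1, 1)), z

# One illustrative training step on a random stand-in batch of rollout frames.
model = ConvAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(8, 3, 64, 64)
recon, _ = model(frames)
loss = F.mse_loss(recon, frames, reduction="sum")   # reconstruction only, no KL term
optimizer.zero_grad()
loss.backward()
optimizer.step()
```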
To train the MDN-RNN memory (M) model on VAE latents or on AE latents, run trainmdrnn.py or trainmdrnn_ae.py respectively:
```bash
python trainmdrnn.py --log_dir <directory>
python trainmdrnn_ae.py --log_dir <directory>
```
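The M model is a mixture-density RNN: given the current latent z_t and the action a_t, it predicts a Gaussian mixture over the next latent z_{t+1} and is trained by negative log-likelihood. The sketch below illustrates that loss; the MDNRNN class, its layer sizes, and the 5-component mixture are illustrative assumptions, not the repository's trainmdrnn.py code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class MDNRNN(nn.Module):
    """Illustrative MDN-RNN: predicts a per-dimension Gaussian mixture over the next latent."""
    def __init__(self, latent_size=32, action_size=3, hidden_size=256, n_gaussians=5):
        super().__init__()
        self.n_gaussians, self.latent_size = n_gaussians, latent_size
        self.rnn = nn.LSTM(latent_size + action_size, hidden_size, batch_first=True)
        # For every latent dimension: mixture log-weights, means, and log-stds of each component.
        self.head = nn.Linear(hidden_size, 3 * n_gaussians * latent_size)

    def forward(self, z, a):
        out, _ = self.rnn(torch.cat([z, a], dim=-1))
        logpi, mu, logsigma = self.head(out).chunk(3, dim=-1)
        shape = out.shape[:2] + (self.n_gaussians, self.latent_size)
        logpi = F.log_softmax(logpi.reshape(shape), dim=-2)   # normalise mixture weights
        return logpi, mu.reshape(shape), logsigma.reshape(shape)

def mdn_loss(logpi, mu, logsigma, z_next):
    """Negative log-likelihood of the observed next latent under the predicted mixture."""
    log_prob = Normal(mu, logsigma.exp()).log_prob(z_next.unsqueeze(-2))
    return -torch.logsumexp(logpi + log_prob, dim=-2).mean()

# One illustrative loss computation on random stand-in sequences (batch=4, seq_len=10).
model = MDNRNN()
z, a, z_next = torch.rand(4, 10, 32), torch.rand(4, 10, 3), torch.rand(4, 10, 32)
loss = mdn_loss(*model(z, a), z_next)
loss.backward()
```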
To train the controller network with CMA-ES or PEPG on VAE or AE forward passes, run the corresponding script from traincontroller_cmaes_ae.py, traincontroller_cmaes_vae.py, traincontroller_pepg_ae.py and traincontroller_pepg_vae.py:
```bash
python traincontroller_cmaes_ae.py --log_dir <directory> --n-samples <no. of samples> --pop-size <no. of threads> --target-return <expected cumulative reward> --display
python traincontroller_cmaes_vae.py --log_dir <directory> --n-samples <no. of samples> --pop-size <no. of threads> --target-return <expected cumulative reward> --display
python traincontroller_pepg_ae.py --log_dir <directory> --n-samples <no. of samples> --pop-size <no. of threads> --target-return <expected cumulative reward> --display
python traincontroller_pepg_vae.py --log_dir <directory> --n-samples <no. of samples> --pop-size <no. of threads> --target-return <expected cumulative reward> --display
```
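The controller itself is a small linear policy acting on the latent z and the MDN-RNN hidden state h, and PEPG searches its parameters by gradient ascent on the parameters of a Gaussian sampling distribution, using every population member rather than only an elite fraction. The sketch below illustrates that update with a dummy fitness function; the controller and evaluate helpers, the hyperparameters, and the mirrored-sampling details are assumptions for illustration, not the traincontroller_pepg_*.py code.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_size, hidden_size, action_size = 32, 256, 3
n_params = (latent_size + hidden_size + 1) * action_size   # weights + biases of a linear controller

def controller(params, z, h):
    """Linear policy: action = tanh(W [z; h] + b)."""
    W = params[:-action_size].reshape(action_size, latent_size + hidden_size)
    b = params[-action_size:]
    return np.tanh(W @ np.concatenate([z, h]) + b)

def evaluate(params):
    """Stand-in fitness. The real scripts instead return the cumulative CarRacing reward of a
    rollout that feeds AE/VAE latents z and MDN-RNN hidden states h through the controller."""
    z = rng.standard_normal(latent_size)
    h = rng.standard_normal(hidden_size)
    return -np.sum((controller(params, z, h) - 0.5) ** 2)

mu, sigma = np.zeros(n_params), np.full(n_params, 0.1)
lr_mu, lr_sigma, pop_size = 0.1, 0.02, 16

for generation in range(100):
    noise = rng.standard_normal((pop_size // 2, n_params))
    eps = np.concatenate([noise, -noise]) * sigma            # mirrored (symmetric) perturbations
    fitness = np.array([evaluate(mu + e) for e in eps])
    adv = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    # Every sample contributes, weighted by its normalised fitness; CMA-ES would instead
    # rank the population and recombine only the best fraction.
    mu += lr_mu * (adv @ eps) / pop_size                      # move the search mean towards good samples
    grad_sigma = (adv @ ((eps ** 2 - sigma ** 2) / sigma)) / pop_size
    sigma = np.maximum(sigma + lr_sigma * grad_sigma, 1e-3)   # adapt per-parameter exploration noise
```

The key contrast with CMA-ES is visible in the last three lines: no candidate is discarded, and the fitness of every perturbation, good or bad, nudges both the mean and the per-parameter exploration noise.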
Plots: VAE training loss, AE training loss, cumulative reward with VAE latents, cumulative reward with AE latents.
We evaluate each configuration by the cumulative reward obtained over the test trajectories.
Encoder Model/Parameter Search Method | Covariance Matrix Adaptation Evolution Strategy (CMA-ES) | Parameter-Exploring Policy Gradients (PEPG) |
---|---|---|
Variational Autoencoder | 74.67 +/- 10.12 | 60.94 +/- 6.17 |
Vanilla Autoencoder | 47.34 +/- 6.37 | 20.36 +/- 3.80 |
Note: Code adapted from here