OPOLO: Off-Policy Learning from Observations

Research code to accompany the paper: Off-Policy Imitation Learning from Observations.

Supported Algorithms:


Installation:

All code is built on the Stable Baselines framework.

Prerequisites
  • Python (>=3.5), CMake, and OpenMPI.
    • Install these prerequisites by following this guideline.
  • MuJoCo:
    • Follow the official installation instructions.

Install using pip:

cd opolo
pip install -e .

Training OPOLO:

  • Example: run OPOLO on the HalfCheetah-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo  --task td3-opolo-idm-decay-reg --n-episodes 4 --log-dir your/absolute/log/path  --n-timesteps -1  
  • The --task value must contain the strings idm, decay, and reg:
    • idm: use an inverse action model.
    • reg: use forward KL divergence as a regularizer.
    • decay: reduce the effect of the regularization over time.
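The tag requirement above can be checked before launching a long run. The following is a minimal sketch, not part of the repository: the helper names missing_tags and check_opolo_task are hypothetical, and it assumes the requirement is a plain substring test, which is how the bullet above reads.

```python
# Substrings that the README says an OPOLO task tag must contain.
REQUIRED_TAGS = ("idm", "decay", "reg")


def missing_tags(task):
    """Return the required substrings that do not appear in the task tag."""
    return [tag for tag in REQUIRED_TAGS if tag not in task]


def check_opolo_task(task):
    """Raise ValueError if the task tag lacks any required substring.

    Hypothetical helper: this is an assumption about how the tag is
    interpreted, not the check that train_agent.py performs.
    """
    missing = missing_tags(task)
    if missing:
        raise ValueError(
            "task %r is missing required tags: %s" % (task, ", ".join(missing))
        )
    return task
```

For example, check_opolo_task("td3-opolo-idm-decay-reg") returns the tag unchanged, while check_opolo_task("td3-opolo-idm") raises because decay and reg are absent.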

Training Other Baselines:

  • Run DAC on the Hopper-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Hopper-v2 --seed 3 --algo td3dac --log-dir your/absolute/log/path --task td3-dac --n-timesteps -1  --n-episodes 4 
  • Run DACfO on the Walker2d-v2 task, using 10 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Walker2d-v2 --seed 1 --algo td3dacfo --log-dir your/absolute/log/path --task td3-dacfo --n-timesteps -1 --n-episodes 10
  • Run BCO on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo td3bco --log-dir your/absolute/log/path --task td3-bco --n-timesteps -1 --n-episodes 4
  • Run GAIL on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo trpogail --log-dir your/absolute/log/path --task trpo-gail --n-timesteps -1 --n-episodes 4
  • Run GAIfO on the Swimmer-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 3 --algo trpogaifo --log-dir your/absolute/log/path --task trpo-gaifo --n-timesteps -1 --n-episodes 4
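Each baseline command above pairs an --algo value with a matching --task value. As a convenience, the pairs can be kept in one table and the command line assembled from it. This is a sketch under assumptions: BASELINE_TASKS and build_command are hypothetical names, and the pairs are simply copied from the example commands in this section.

```python
import shlex

# (--algo, --task) pairs copied from the example commands in this README.
BASELINE_TASKS = {
    "td3dac": "td3-dac",
    "td3dacfo": "td3-dacfo",
    "td3bco": "td3-bco",
    "trpogail": "trpo-gail",
    "trpogaifo": "trpo-gaifo",
}


def build_command(algo, env, seed, n_episodes, log_dir):
    """Assemble a train_agent.py command line for one baseline.

    Hypothetical helper: it only formats the flags shown in the README
    and does not run or validate anything.
    """
    args = [
        "python", "train_agent.py",
        "--env", env,
        "--seed", str(seed),
        "--algo", algo,
        "--task", BASELINE_TASKS[algo],
        "--n-episodes", str(n_episodes),
        "--log-dir", log_dir,
        "--n-timesteps", "-1",
    ]
    # Quote each argument so paths with spaces survive a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

Calling build_command("td3dac", "Hopper-v2", 3, 4, "your/absolute/log/path") produces a command equivalent to the DAC example above, to be run from opolo-code/opolo-baselines/run.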

Evaluating Models

  • Assuming you have trained OPOLO on HalfCheetah-v2 with the command above, using task = td3-opolo-idm-decay-reg, you can evaluate the model with:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo --log-dir your/absolute/log/path --task eval-td3-opolo-idm-decay-reg  --n-timesteps -1 --n-episodes 4 
  • The command is the same as for training, except for the --task flag: task = eval- + {task-used-for-training}.
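The naming rule for evaluation tasks can be written as a one-line transformation. A minimal sketch, assuming only the eval- prefix convention stated above; the function name eval_task is hypothetical.

```python
EVAL_PREFIX = "eval-"


def eval_task(train_task):
    """Return the evaluation task tag for a given training task tag.

    Hypothetical helper: prepends "eval-" unless the prefix is already there.
    """
    if train_task.startswith(EVAL_PREFIX):
        return train_task
    return EVAL_PREFIX + train_task
```

For the OPOLO run above, eval_task("td3-opolo-idm-decay-reg") gives eval-td3-opolo-idm-decay-reg, which is the --task value used in the evaluation command.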

Reminders:

  • Expert trajectories can be found at:
opolo-code/opolo-baselines/expert_logs
  • Hyper-parameter settings can be found at:
opolo-code/opolo-baselines/hyperparams/