Self-Adaptive Imitation Learning (SAIL)

Code implementation for:

Please note that:

Installation:

Please also check the stable-baselines page for installing the prerequisite packages: https://github.com/hill-a/stable-baselines

    cd stable_baselines
    pip install -e .
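
As a quick sanity check that the editable install worked and the MuJoCo environments used below are available (a minimal sketch, assuming mujoco-py is already set up):

    import gym
    import stable_baselines

    # Confirm the package imports and a MuJoCo task can be created.
    print(stable_baselines.__version__)
    env = gym.make("HalfCheetah-v2")
    print(env.observation_space, env.action_space)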
How to run:
  • Run SAIL on environment HalfCheetah-v2 with seed = 3, using 1 teacher trajectory:

    cd stable_baselines/run
    python train_sail.py --env HalfCheetah-v2 --seed 3 --algo sail --log-dir your/log/dir/ --task gail-lfd-adaptive-dynamic --n-timesteps -1 --n-episodes 1  
    • Results will be written to your/log/dir/gail-lfd-adaptive/sail/HalfCheetah-v2/rank3/.
    • Note that the --task option must contain the substrings gail, lfd, adaptive, and dynamic (see the sketch after this list):
      • lfd: learn from teacher demonstrations.
      • gail: use adversarial training.
      • adaptive: replace the teacher buffer with better student trajectories.
      • dynamic: turn off mixture sampling once the student has surpassed the teacher (see details in the paper).
  • Run DAC on environment Hopper-v2 with seed = 5, using 4 teacher trajectories:

      cd stable_baselines/run
      python train_sail.py --env Hopper-v2 --seed 5 --algo dac --log-dir your/log/dir/ --task dac-gail --n-timesteps -1  --n-episodes 4  
  • Run GAIL on Swimmer-v2 with seed = 2, using 1 teacher trajectory:

      cd stable_baselines/run
      python train_sail.py --env Swimmer-v2 --seed 2 --algo trpo --log-dir your/log/dir/ --task trpo-gail --n-timesteps -1 --n-episodes 1  
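
The sketch below is a minimal, hypothetical illustration of how the --task substrings and the adaptive buffer update described above fit together; all names are illustrative assumptions, not the repository's actual code:

    # Illustrative sketch only -- not the repository's implementation.
    task = "gail-lfd-adaptive-dynamic"
    use_gail     = "gail" in task      # adversarial (GAIL-style) training
    use_lfd      = "lfd" in task       # learn from teacher demonstrations
    use_adaptive = "adaptive" in task  # allow teacher-buffer replacement
    use_dynamic  = "dynamic" in task   # stop mixture sampling once the student surpasses the teacher

    def maybe_replace_teacher_trajectory(teacher_buffer, student_traj):
        """Hypothetical 'adaptive' step: if the student's episode return beats
        the worst teacher trajectory, the student trajectory takes its place."""
        worst = min(range(len(teacher_buffer)),
                    key=lambda i: teacher_buffer[i]["return"])
        if student_traj["return"] > teacher_buffer[worst]["return"]:
            teacher_buffer[worst] = student_traj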
Reminders:
  • Teacher demonstrations are saved at SAIL-code/teacher_dataset/. For each environment, we collected 1, 4, and 10 trajectories from a sub-optimal teacher (see the loading sketch after this list).
  • Hyper-parameters can be found at SAIL-code/stable-baselines/hyperparams/.
    • You may need to tune those parameters to fit your machine.
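
If the demonstration files follow the .npz convention of stable-baselines' ExpertDataset (an assumption; check the actual file names and format), they can be inspected like this:

    import numpy as np

    # File name is a hypothetical example.
    data = np.load("SAIL-code/teacher_dataset/HalfCheetah-v2.npz")
    print(data.files)               # expected keys such as 'obs', 'actions', 'episode_returns'
    print(data["episode_returns"])  # one return per teacher trajectory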
