- The proposed algorithm (SAIL), as well as the baseline approaches (DAC and GAIL), are implemented on top of the stable-baselines framework: https://stable-baselines.readthedocs.io/en/master/
- Please also check the stable-baselines page for instructions on installing the prerequisite packages: https://github.com/hill-a/stable-baselines
cd stable_baselines
pip install -e .
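After installing, it can be handy to confirm the prerequisites are importable before launching a run. The helper below is a hypothetical convenience script (not part of the repo); the package names listed are the usual stable-baselines dependencies and may differ on your setup:

```python
# Hypothetical sanity check, not part of SAIL-code: report which of the
# prerequisite packages can be imported in the current environment.
import importlib.util

def check_packages(names):
    """Return {package_name: bool} indicating importability."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

if __name__ == "__main__":
    for pkg, ok in check_packages(["gym", "tensorflow", "stable_baselines"]).items():
        print(f"{pkg}: {'found' if ok else 'MISSING -- see install notes above'}")
```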
- Run SAIL on environment HalfCheetah-v2, with seed 3, using 1 teacher trajectory:
cd stable_baselines/run
python train_sail.py --env HalfCheetah-v2 --seed 3 --algo sail --log-dir your/log/dir/ --task gail-lfd-adaptive-dynamic --n-timesteps -1 --n-episodes 1
- Results will be written to your/log/dir/gail-lfd-adaptive/sail/HalfCheetah-v2/rank3/.
- Note that the task option must contain the substrings gail, lfd, adaptive, and dynamic.
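The substring requirement on the task option can be expressed as a one-line check. This is a minimal sketch mirroring the rule stated above, not code from the repo:

```python
# Sketch of the SAIL --task validity rule described above (not repo code):
# the task string must contain all four substrings.
REQUIRED_SUBSTRINGS = ("gail", "lfd", "adaptive", "dynamic")

def is_valid_sail_task(task):
    """True iff every required substring appears in the task string."""
    return all(sub in task for sub in REQUIRED_SUBSTRINGS)

# is_valid_sail_task("gail-lfd-adaptive-dynamic")  -> True
# is_valid_sail_task("dac-gail")                   -> False (missing lfd, adaptive, dynamic)
```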
- Run DAC on environment Hopper-v2, with seed 5, using 4 teacher trajectories:
cd stable_baselines/run
python train_sail.py --env Hopper-v2 --seed 5 --algo dac --log-dir your/log/dir/ --task dac-gail --n-timesteps -1 --n-episodes 4
- Run GAIL on Swimmer-v2, with seed 2, using 1 teacher trajectory:
cd stable_baselines/run
python train_sail.py --env Swimmer-v2 --seed 2 --algo trpo --log-dir your/log/dir/ --task trpo-gail --n-timesteps -1 --n-episodes 1
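When sweeping over seeds or environments, retyping the commands above gets error-prone. The following is a hypothetical helper (not part of the repo) that assembles the `train_sail.py` argument list shown in the examples; run it from `stable_baselines/run`:

```python
# Hypothetical helper, not part of SAIL-code: build the train_sail.py
# argument list used in the README examples, e.g. for a seed sweep.
def build_cmd(env, seed, algo, task, n_episodes, log_dir="your/log/dir/"):
    return [
        "python", "train_sail.py",
        "--env", env,
        "--seed", str(seed),
        "--algo", algo,
        "--log-dir", log_dir,
        "--task", task,
        "--n-timesteps", "-1",
        "--n-episodes", str(n_episodes),
    ]

if __name__ == "__main__":
    # Sweep SAIL over three seeds; print the commands (or pass each list
    # to subprocess.run(cmd, check=True) to actually launch training).
    for seed in (0, 1, 2):
        print(" ".join(build_cmd("HalfCheetah-v2", seed, "sail",
                                 "gail-lfd-adaptive-dynamic", 1)))
```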
- Teacher demonstrations are saved at SAIL-code/teacher_dataset/. For each environment, we collected 1, 4, and 10 trajectories from a sub-optimal teacher.
- Hyper-parameters can be found at SAIL-code/stable-baselines/hyperparams/. You may need to tune them to fit your machine.
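To inspect the teacher demonstrations before training, a quick summary of array shapes is often enough. The sketch below assumes an `.npz` file with `observations`/`actions` arrays; that layout is an assumption, not the repo's documented format, so adapt the keys to the actual files in SAIL-code/teacher_dataset/:

```python
# Sketch for inspecting a saved teacher trajectory. The on-disk format
# (an .npz with "observations" and "actions" arrays) is an ASSUMPTION,
# not SAIL-code's documented format -- adapt the keys to the real files.
import numpy as np

def summarize_trajectories(path):
    """Return {array_name: shape} for every array stored in the file."""
    data = np.load(path)
    return {key: data[key].shape for key in data.files}

if __name__ == "__main__":
    # Synthetic file standing in for a real trajectory (HalfCheetah-v2
    # has 17-dim observations and 6-dim actions):
    np.savez("demo.npz",
             observations=np.zeros((1000, 17)),
             actions=np.zeros((1000, 6)))
    print(summarize_trajectories("demo.npz"))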