This GAIL implementation is closely tied to the PPO algorithm:
- Expert trajectories are generated by a PPO pre-trained model;
- GAIL learns its policy using the PPO algorithm (a minimal sketch of how the discriminator reward plugs into PPO follows this list).
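As a rough illustration of that coupling, the discriminator scores (state, action) pairs and its output is turned into the reward that PPO maximizes in place of the environment reward. The sketch below assumes a PyTorch-style setup; the class and function names are illustrative, not this repository's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores (state, action) pairs: high logits mean 'looks like expert data'."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Raw logit D(s, a); sigmoid of it is the probability of "expert".
        return self.net(torch.cat([obs, act], dim=-1))

def gail_reward(disc, obs, act):
    """Surrogate reward handed to PPO instead of the environment reward.

    Implements -log(1 - sigmoid(logit)), computed via softplus for stability.
    """
    with torch.no_grad():
        return F.softplus(disc(obs, act)).squeeze(-1)
```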
To run the code:
- Generate expert trajectories with expert_trajectory_collector.py (you need a model pre-trained with a specific RL algorithm first);
- Fill in a custom config file for GAIL; a template is provided in config/config.yml (an illustrative loading sketch follows this list);
- Train GAIL from main.py.
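For orientation, the snippet below shows what filling in and loading such a config might look like. The keys are assumptions made for illustration; the actual template in config/config.yml is authoritative:

```python
import yaml

# Illustrative contents only; the real template lives in config/config.yml
# and its actual keys may differ.
example_cfg = """
env: BipedalWalker-v3
expert_trajectories: data/expert_bipedalwalker.npz   # output of expert_trajectory_collector.py
ppo:
  learning_rate: 3.0e-4
  clip_ratio: 0.2
discriminator:
  learning_rate: 3.0e-4
  updates_per_iteration: 5
"""

cfg = yaml.safe_load(example_cfg)
print(cfg["env"], cfg["ppo"]["clip_ratio"])  # BipedalWalker-v3 0.2
```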
Run the algorithm on BipedalWalker-v3 for continuous control.
Expert trajectories are collected by running PPO and saved in .npz format; GAIL then uses PPO for policy optimization.
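A hedged sketch of that collection step is shown below, using the classic gym API. The expert_policy callable and the output file name are placeholders; expert_trajectory_collector.py is the actual script:

```python
import gym
import numpy as np

def collect_expert_trajectories(expert_policy, n_episodes=50,
                                out_path="expert_bipedalwalker.npz"):
    """Roll out a pre-trained policy and store (state, action) pairs as .npz."""
    env = gym.make("BipedalWalker-v3")
    states, actions = [], []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            act = expert_policy(obs)              # pre-trained PPO actor (placeholder)
            states.append(obs)
            actions.append(act)
            obs, _, done, _ = env.step(act)
    np.savez(out_path, states=np.array(states), actions=np.array(actions))

# GAIL's discriminator later loads the expert data:
# data = np.load("expert_bipedalwalker.npz")
# expert_states, expert_actions = data["states"], data["actions"]
```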
The performance (average reward) curve looks like this:
You may notice that GAIL does not reach PPO's performance; as an imitation learning method, however, GAIL performs well.