Add TEPPO documentation #2161

Open: wants to merge 9 commits into base: master
docs/index.md (1 addition, 0 deletions)
@@ -62,6 +62,7 @@ and how to implement new MDPs and new algorithms.
RL2 <user/algo_rl2>
SAC <user/algo_sac>
TD3 <user/algo_td3>
TEPPO <user/algo_teppo>
TRPO <user/algo_trpo>
REINFORCE <user/algo_vpg>


docs/user/algo_teppo.md (76 additions, 0 deletions)
@@ -0,0 +1,76 @@
# Proximal Policy Optimization with Task Embedding (TEPPO)


```eval_rst
.. list-table::
   :header-rows: 0
   :stub-columns: 1
   :widths: auto

   * - **Paper**
     - Learning an Embedding Space for Transferable Robot Skills :cite:`hausman2018learning`
   * - **Framework(s)**
     - .. figure:: ./images/tf.png
          :scale: 20%
          :class: no-scaled-link

          TensorFlow
   * - **API Reference**
     - `garage.tf.algos.TEPPO <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TEPPO>`_
   * - **Code**
     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
   * - **Examples**
     - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
```

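Proximal Policy Optimization (PPO) is a family of policy gradient methods that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO (PPO with Task Embedding) extends PPO to multi-task learning: a task encoder maps each task to a distribution over a latent skill embedding, the policy is conditioned on both the observation and a sampled embedding, and an inference network learns to recover the embedding from the resulting trajectory. Because all tasks share one embedding space, skills learned on one task are intended to transfer to others.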
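In the cited paper, the training objective is entropy-regularized, which can be read as adding bonus terms to the environment reward. The sketch below is a minimal, hypothetical illustration of that idea and is not garage's implementation. It assumes the per-step reward is augmented with the inference network's log-likelihood of the embedding (scaled by `inference_ce_coeff`) and the policy entropy (scaled by `policy_ent_coeff`), while the task encoder's entropy (scaled by `encoder_ent_coeff`) enters the objective as a separate term. The helpers `augment_rewards`, `diagonal_gaussian_entropy`, and `encoder_entropy_bonus` are invented for this example.

```python
import math


def augment_rewards(rewards, inference_log_likelihoods, policy_entropies,
                    inference_ce_coeff=1e-3, policy_ent_coeff=1e-3):
    """Hypothetical helper: add TEPPO-style bonus terms to per-step rewards.

    Assumed form (illustrative, not garage's code):
        r_hat[t] = r[t] + inference_ce_coeff * log q(z | trajectory up to t)
                        + policy_ent_coeff * H[pi(. | s[t], z)]
    """
    if not len(rewards) == len(inference_log_likelihoods) == len(policy_entropies):
        raise ValueError('inputs must have one entry per time step')
    return [
        r + inference_ce_coeff * ll + policy_ent_coeff * h
        for r, ll, h in zip(rewards, inference_log_likelihoods, policy_entropies)
    ]


def diagonal_gaussian_entropy(log_stds):
    """Entropy of a diagonal Gaussian, such as a task encoder's latent distribution."""
    # Each dimension contributes log(sigma) + 0.5 * log(2 * pi * e).
    return sum(log_std + 0.5 * math.log(2.0 * math.pi * math.e)
               for log_std in log_stds)


def encoder_entropy_bonus(encoder_log_stds, encoder_ent_coeff=1e-3):
    """Assumed separate objective term rewarding a spread-out task embedding."""
    return encoder_ent_coeff * diagonal_gaussian_entropy(encoder_log_stds)
```

For example, `augment_rewards([1.0, 0.0], [-2.0, -1.0], [0.5, 0.5])` nudges each reward by a small amount proportional to the default coefficients, so the bonuses shape learning without dominating the task reward.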

## Default Parameters

```py
discount=0.99,
gae_lambda=0.98,
lr_clip_range=0.01,
max_kl_step=0.01,
policy_ent_coeff=1e-3,
encoder_ent_coeff=1e-3,
inference_ce_coeff=1e-3
```
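To show what `discount`, `gae_lambda`, and `lr_clip_range` control, here is a minimal pure-Python sketch of generalized advantage estimation over one trajectory and of PPO's clipped surrogate for a single sample. It assumes `lr_clip_range` is the clip range applied to the likelihood ratio between the new and old policies. The helpers `gae_advantages` and `clipped_surrogate` are hypothetical and are not garage functions.

```python
def gae_advantages(rewards, values, discount=0.99, gae_lambda=0.98):
    """Generalized advantage estimation for one finite trajectory.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`.
    The last entry is the bootstrap value of the final state
    (use 0.0 if the episode terminated there).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError('values must have exactly one more entry than rewards')
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards: A_t = delta_t + discount * gae_lambda * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + discount * values[t + 1] - values[t]
        running = delta + discount * gae_lambda * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, lr_clip_range=0.01):
    """PPO clipped objective for one sample, where ratio = pi_new / pi_old."""
    clipped_ratio = min(max(ratio, 1.0 - lr_clip_range), 1.0 + lr_clip_range)
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, `clipped_surrogate(1.5, 2.0)` evaluates to about 2.02 rather than 3.0: with `lr_clip_range=0.01`, the objective gives no extra credit for pushing the ratio outside [0.99, 1.01], which keeps each policy update small.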

## Examples

### te_ppo_metaworld_mt1_push

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
```

### te_ppo_metaworld_mt10

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
```

### te_ppo_metaworld_mt50

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
```

### te_ppo_point

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_point.py
```

## References

```eval_rst
.. bibliography:: references.bib
   :style: unsrt
   :filter: docname in docnames
```

----

*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
docs/user/references.bib (9 additions, 0 deletions)
@@ -83,6 +83,15 @@ @article{yu2019metaworld
journal={arXiv:1910.10897},
}

@inproceedings{hausman2018learning,
title={Learning an Embedding Space for Transferable Robot Skills},
author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
booktitle={International Conference on Learning Representations},
year={2018},
journal={},
url={https://openreview.net/forum?id=rk07ZXZRb},
}

@article{lillicrap2015continuous,
title={Continuous control with deep reinforcement learning},
author={Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan},