Experiments on Mania's dataset #28

Open · wants to merge 18 commits into base: main
Conversation

dtch1997 (Owner) commented May 25, 2023

Summary of changes (a config sketch illustrating these options follows the list):

  • Make URDF configurable, add Mania's URDF
  • Add discriminator temperature config option
  • Change sigma to be learnable by default (I may end up reverting this)
  • Increase default kP, kD to match simulation
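
For concreteness, here is a minimal sketch of what the configurable surface described above might look like. All field names and default values are hypothetical and not taken from the repo.

```python
# Hypothetical config sketch; field names and defaults are illustrative only
# and do not reflect the repo's actual config schema.
from dataclasses import dataclass


@dataclass
class AMPTrainConfig:
    # URDF is now configurable, so Mania's robot description can be swapped in.
    urdf_path: str = "assets/default.urdf"
    # Temperature applied to the discriminator logit when computing the style reward.
    disc_temperature: float = 1.0
    # Whether the policy's Gaussian sigma is a learnable parameter (new default).
    learn_sigma: bool = True
    # PD gains raised to match the values used in simulation.
    kp: float = 100.0
    kd: float = 2.0
```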

dtch1997 (Owner, Author) commented May 26, 2023

According to VIPER (https://arxiv.org/abs/2305.14343), simply training a likelihood-based model instead of an adversarial one may improve performance.
What that would look like for our repo: instead of predicting a binary logit for real/fake, predict a Gaussian distribution and maximize its likelihood on dataset examples.
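
Concretely, that would mean replacing the binary discriminator with a density model trained only on demonstration transitions and using its log-likelihood as the style reward. A minimal PyTorch sketch, assuming a next-observation density model; the class and method names are hypothetical and not taken from the repo or from VIPER:

```python
# Sketch of the likelihood-based alternative: fit a Gaussian density model to
# demonstration transitions and use its log-likelihood as the style reward.
# All names here are hypothetical, not the repo's actual modules.
import torch
import torch.nn as nn


class GaussianRewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, obs_dim)
        self.log_std_head = nn.Linear(hidden_dim, obs_dim)

    def log_prob(self, obs: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # Model p(next_obs | obs) as a diagonal Gaussian.
        h = self.backbone(obs)
        mean = self.mean_head(h)
        std = self.log_std_head(h).clamp(-5.0, 2.0).exp()
        dist = torch.distributions.Normal(mean, std)
        return dist.log_prob(next_obs).sum(dim=-1)

    def training_loss(self, demo_obs: torch.Tensor, demo_next_obs: torch.Tensor) -> torch.Tensor:
        # Maximize likelihood on dataset examples: no adversarial game, no fake batch.
        return -self.log_prob(demo_obs, demo_next_obs).mean()

    def reward(self, obs: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # Use the (detached) log-likelihood of the agent's transition as its style reward.
        return self.log_prob(obs, next_obs).detach()
```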

dtch1997 (Owner, Author) commented May 28, 2023

The results of the experiment are actually somewhat confusing. Refer to this run: the demo logit is high (about +5) while the agent logit is low (about -15); we would expect the agent to receive essentially no reward from this, and yet the average reward is quite good.

Edit: I think I understand why now: the rewards/frame metric is actually reporting the task reward, not the AMP reward, so the two don't line up.

In this light, the reason the policy isn't learning may be that the reward provided by the discriminator is 'too sparse', so we should actually decrease the temperature.
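
To make the sparsity point concrete, here is a rough sketch of how a temperature on the discriminator logit reshapes an AMP-style reward. The reward formula and the direction in which the temperature is applied (dividing vs. multiplying the logit) are assumptions rather than the repo's actual convention, so only the qualitative saturation effect matters here:

```python
# Assumed AMP-style reward r = -log(1 - sigmoid(logit / T)); the repo's actual
# convention may differ (e.g. multiplying by T instead of dividing), in which
# case "decrease the temperature" is the direction that flattens the sigmoid.
import torch


def amp_style_reward(logit: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    prob = torch.sigmoid(logit / temperature)
    # Clamp to avoid log(0) when the discriminator is very confident.
    return -torch.log((1.0 - prob).clamp(min=1e-6))


# With an agent logit around -15, the untempered reward is ~3e-7, i.e. essentially
# zero everywhere -- the "too sparse" regime described above. Rescaling the logit
# so the sigmoid is flatter gives the agent a usable reward signal even when the
# discriminator cleanly separates it from the demos.
agent_logit = torch.tensor([-15.0])
for temperature in (1.0, 5.0, 15.0):
    print(temperature, amp_style_reward(agent_logit, temperature=temperature).item())
```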
