Experiments on Mania's dataset #28
base: main
Conversation
According to VIPER, simply training a likelihood-based model instead of an adversarial one may improve performance: https://arxiv.org/abs/2305.14343
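A minimal sketch of that idea, under assumptions (the `video_model.log_prob` interface and names below are hypothetical, not this repo's API): rather than scoring agent frames with an adversarial discriminator, a frozen likelihood-based model trained on demos rewards the agent with the log-likelihood it assigns to each next observation.

```python
import torch

def viper_style_reward(video_model, obs_seq):
    """Reward transitions by the log-likelihood a frozen, demo-trained
    likelihood model assigns to the agent's next observation.

    `video_model.log_prob(context, next_obs)` is a hypothetical interface
    returning log p(next_obs | context) per batch element.
    """
    with torch.no_grad():
        context, next_obs = obs_seq[:, :-1], obs_seq[:, -1]
        log_p = video_model.log_prob(context, next_obs)  # shape: (batch,)
    return log_p  # higher likelihood under the demo model -> higher reward
```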
The results of the experiment are actually somewhat confusing. Refer to this run: the demo logit is high (about +5) while the agent logit is low (about -15); we would expect the agent to receive essentially no reward from this, and yet the average reward is quite good. Edit: I think I understand why now. In this light, the reason the policy isn't learning may be that the reward provided by the discriminator is 'too sparse'. So we should actually decrease the temperature.
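For reference, a small sketch of how a logit-to-reward mapping with a temperature behaves (the exact reward formula used in this repo is an assumption; this uses the common GAIL-style choice r = sigmoid(logit / temperature)). With an agent logit around -15 the sigmoid saturates near 0, so the reward signal is essentially flat, which is the sparsity concern above; the temperature controls how sharply the logits are squashed.

```python
import torch

def discriminator_reward(logit, temperature=1.0):
    """Map a discriminator logit to a reward in (0, 1).

    Assumed formula: r = sigmoid(logit / temperature).
    """
    return torch.sigmoid(logit / temperature)

# With the logits from the run above, the reward is nearly binary:
demo_logit, agent_logit = torch.tensor(5.0), torch.tensor(-15.0)
print(discriminator_reward(demo_logit))   # ~0.993
print(discriminator_reward(agent_logit))  # ~3e-7 -> essentially no signal
```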
Summary of changes:
- Set `sigma` to be learnable by default (I may end up reverting this); see the sketch below.
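A sketch of what a learnable `sigma` might look like (module and parameter names are assumptions, not the repo's actual code): the standard deviation becomes a trainable parameter optimized jointly with the rest of the model instead of a fixed constant.

```python
import math
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Gaussian head with an optionally learnable log-sigma.

    Illustrative only: names and defaults are assumptions about the change.
    """

    def __init__(self, action_dim: int, init_sigma: float = 1.0, learnable_sigma: bool = True):
        super().__init__()
        log_sigma = torch.full((action_dim,), math.log(init_sigma))
        # When learnable, sigma is updated by the optimizer along with the mean.
        self.log_sigma = nn.Parameter(log_sigma, requires_grad=learnable_sigma)

    def forward(self, mean: torch.Tensor) -> torch.distributions.Normal:
        sigma = self.log_sigma.exp().expand_as(mean)
        return torch.distributions.Normal(mean, sigma)
```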