Hi!
I have a question about how the intrinsic rewards are calculated. Why do you use sum(1) instead of mean(1)?
random-network-distillation-pytorch/agents.py
Line 76 in e383fb9
That computes the sum over the 512 output neurons, which is different from computing the mean over those outputs.
The original TensorFlow release uses reduce_mean, so I'm a little confused. https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241
I hope you can clear this up. Thank you in advance.
Have you figured this out since? I am confused here as well: this is different from calculating the MSE. I also wonder why the sum is divided by 2 here rather than by n, as it would be for the MSE.
Thanks in advance.
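Could it be that this difference does not matter because we are using reward_rms to normalize the intrinsic rewards? With a fixed number of outputs, sum(1) / 2 and mean(1) differ only by a constant factor, and dividing by a standard-deviation estimate should cancel such a factor.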
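To make the normalization argument concrete, here is a minimal sketch in plain Python (standard library only, so it runs without PyTorch). It rests on several assumptions that are not taken from the repository: the batch size, the random stand-ins for the target and predictor outputs, and above all the `normalize` helper, which divides by the standard deviation of a single batch. The actual reward_rms keeps running statistics, so this only illustrates the scaling argument and does not reproduce the training code.

```python
import math
import random
import statistics

random.seed(0)

N_FEATURES = 512  # number of output neurons mentioned in the question
BATCH = 64        # hypothetical batch size, chosen only for illustration

# Hypothetical stand-ins for the target and predictor network outputs.
target = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(BATCH)]
predict = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(BATCH)]


def squared_errors(t_row, p_row):
    """Element-wise squared difference between two feature vectors."""
    return [(t - p) ** 2 for t, p in zip(t_row, p_row)]


# Variant A: sum over the features, divided by 2 (the sum(1) / 2 form).
reward_sum = [sum(squared_errors(t, p)) / 2 for t, p in zip(target, predict)]

# Variant B: mean over the features (the reduce_mean / mean(1) form).
reward_mean = [statistics.fmean(squared_errors(t, p))
               for t, p in zip(target, predict)]


def normalize(rewards):
    """Simplified stand-in for reward_rms: divide by the batch std only."""
    std = statistics.pstdev(rewards)
    return [r / std for r in rewards]


# Per-sample ratio between the two variants: N_FEATURES / 2 for every sample.
ratio = [a / b for a, b in zip(reward_sum, reward_mean)]

# After dividing by the standard deviation, the constant factor cancels.
norm_sum = normalize(reward_sum)
norm_mean = normalize(reward_mean)

print(ratio[0])
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(norm_sum, norm_mean)))
```

Under these assumptions the two variants differ by the same factor (N_FEATURES / 2) for every sample, and the normalized rewards coincide up to floating-point error. This only concerns the scale of the intrinsic reward. If the same expression were also used as a training loss, the factor would scale the gradients as well, and that case is not covered here.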