The reward setting for the MuJoCo tasks is confusing. Once the agent is at least a fixed distance from the starting point (position 0), where the distance is 2 or 20 (denoted d below), it receives a reward of 1 at every step. With this setup proposed by the authors, the task does not feel like a sparse-reward problem. It also seems to encourage the agent merely to get outside the circle of radius d: once outside, it collects a reward at every step even if it stands still, whereas the original dense reward encourages the agent to keep moving farther. So this modification changes the intent of the original task. Finally, I tried changing the reward so that the agent receives a reward of 1 for every additional distance d traveled, but I found that this approach did not work.
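To make the comparison concrete, here is a minimal sketch of the three reward schemes as I understand them. It is not the authors' code: the function names, the 1-D position `x`, and the toy trajectories are all assumptions for illustration, and `progress_reward` is only a simplified stand-in for the original dense reward (the real MuJoCo rewards include additional terms).

```python
import math

def threshold_reward(x, d):
    # Hypothetical version of the authors' setup as described above:
    # reward 1 on every step where the agent is at least d from the start (x = 0).
    return 1.0 if abs(x) >= d else 0.0

def progress_reward(x_prev, x_curr):
    # Simplified stand-in for a dense reward: displacement made on this step.
    return x_curr - x_prev

def milestone_reward(x_curr, d, milestones_paid):
    # My attempted variant: reward 1 each time a new multiple of d is reached.
    # milestones_paid tracks how many multiples were already rewarded, so
    # stepping back and forth across a boundary is not rewarded twice.
    reached = math.floor(abs(x_curr) / d)
    reward = float(max(0, reached - milestones_paid))
    return reward, max(reached, milestones_paid)

def episode_returns(xs, d):
    # Sum each reward over a trajectory of positions xs (xs[0] is the start).
    totals = {"threshold": 0.0, "progress": 0.0, "milestone": 0.0}
    paid = 0
    for x_prev, x_curr in zip(xs, xs[1:]):
        totals["threshold"] += threshold_reward(x_curr, d)
        totals["progress"] += progress_reward(x_prev, x_curr)
        r, paid = milestone_reward(x_curr, d, paid)
        totals["milestone"] += r
    return totals

# Toy trajectories (assumed): one agent stops just past d, the other keeps going.
walk_then_stop = [0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
keep_walking = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(episode_returns(walk_then_stop, d=2.0))
print(episode_returns(keep_walking, d=2.0))
```

With d = 2 in this toy setting, the threshold reward gives both trajectories the same return of 5, so standing still outside the circle is as good as continuing to walk, while the progress and milestone rewards both favor the agent that keeps moving.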
If anyone else has looked into this, I would appreciate help clearing up these doubts. Thank you very much!