breakable-bottles observation space correction #93
…"bottles_delivered" subspace is a Discrete(3), instead of Discrete(2), and updates num_observations and the docs accordingly. On the step when the goal state is reached, the agent has delivered 2 bottles, and an observation with that info is returned, so the previously stated observation space was insufficient. Signed-off-by: Scott Johnson <[email protected]>
Hey @scott-j-johnson, thanks a lot for the PR 🤲 !
I have re-read the paper description of the environment (https://www.sciencedirect.com/science/article/pii/S0952197621000336). From there, it seems to me that the current implementation is correct: bottles delivered is 0 or 1, because the episode terminates when we reach 2, hence this state is never encountered.
Maybe I'm missing something though?
Hi @ffelten, thank you for the response! The episode does end when two bottles are delivered, but this end state (with two bottles delivered) is also returned by the environment on that same step (i.e., terminated=True, bottles_delivered=2). The implementation in MO-Gymnasium is absolutely consistent with the paper, but I think the paper itself is inaccurate here. I spoke with Peter Vamplew briefly about it and I think he agreed that it is an oversight. However, I was assuming that the observation space of an environment should be accurate for terminal states too. I may be incorrect in that assumption. If it isn't necessary for terminal states to adhere to the observation space, then please ignore this PR, and I apologise for not knowing that. For more context, I originally noticed this issue because calling gymnasium.spaces.utils.flatten on that terminal state caused a crash (because the state is outside the space). Thanks again for your response!
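To illustrate why the flatten call crashes, here is a minimal sketch built on a hypothetical MiniDiscrete class (a simplified stand-in, not Gymnasium's actual Discrete implementation). It assumes, as Gymnasium does for Discrete spaces, that flattening produces a one-hot vector of length n, so the value 2 has no slot in a Discrete(2) encoding.

```python
from __future__ import annotations


class MiniDiscrete:
    """Hypothetical, simplified stand-in for a Discrete space (not Gymnasium's class)."""

    def __init__(self, n: int, start: int = 0) -> None:
        self.n = n          # number of allowed values
        self.start = start  # smallest allowed value

    def contains(self, x: int) -> bool:
        # Valid values are start, start + 1, ..., start + n - 1.
        return self.start <= x < self.start + self.n

    def flatten(self, x: int) -> list[int]:
        # One-hot encode x as a vector of length n. A value at or past
        # start + n has no slot in the vector, so the assignment raises IndexError.
        onehot = [0] * self.n
        onehot[x - self.start] = 1
        return onehot


terminal_delivered = 2  # bottles_delivered on the goal-reaching step

old_space = MiniDiscrete(2)  # allows 0 or 1 only
new_space = MiniDiscrete(3)  # allows 0, 1 or 2

print(old_space.contains(terminal_delivered))  # False
print(new_space.flatten(terminal_delivered))   # [0, 0, 1]

try:
    old_space.flatten(terminal_delivered)
except IndexError as err:
    print("flatten failed:", err)
```

Widening the subspace gives the terminal value its own one-hot slot instead of relying on every caller to skip terminal observations.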
Hmmm, this is a nice corner case :). TBH, I don't know the Farama policy in this regard, i.e., should terminal states be in the state space? @pseudo-rnd-thoughts probably has a better idea than me on this. To me, it shouldn't matter, as the learning algorithm should never even consider that state. But there might be corner cases with wrappers and that kind of thing that do trigger exceptions.
Hey, yes, the observation space should contain all states, both non-terminal and terminal.
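The corner case can be reproduced with a toy loop, assuming (hypothetically) that each step delivers one bottle and the episode terminates once two are delivered; this is not the real breakable-bottles dynamics. The point is that the observation returned together with terminated=True still reports bottles_delivered=2, so only a Discrete(3) subspace covers every observation the environment emits. The helper names run_toy_episode and out_of_space are hypothetical.

```python
from __future__ import annotations


def run_toy_episode(bottles_needed: int = 2) -> list[tuple[int, bool]]:
    """Return (bottles_delivered, terminated) for every observation of one episode."""
    bottles_delivered = 0
    terminated = False
    trajectory = [(bottles_delivered, terminated)]  # observation from reset()
    while not terminated:
        bottles_delivered += 1  # pretend every step delivers one bottle
        terminated = bottles_delivered >= bottles_needed
        # The terminal observation is returned alongside terminated=True.
        trajectory.append((bottles_delivered, terminated))
    return trajectory


def out_of_space(trajectory: list[tuple[int, bool]], n: int) -> list[int]:
    """Values of bottles_delivered that fall outside a Discrete(n) space (start=0)."""
    return [value for value, _ in trajectory if not 0 <= value < n]


episode = run_toy_episode()
print(episode)                     # [(0, False), (1, False), (2, True)]
print(out_of_space(episode, n=2))  # [2]: the terminal observation violates Discrete(2)
print(out_of_space(episode, n=3))  # []: Discrete(3) covers every observation
```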
Hi @scott-j-johnson, thanks for spotting this! I agree with the change and that this is an oversight in the paper.
Can you confirm that this fixes your bug too? :-)

Isn't this incorrect though? https://github.com/Farama-Foundation/MO-Gymnasium/pull/93/files#diff-f148056cadad72d35f5a598d2602879f0a55ace14a6bae99d72b3ff1ebd87744R223
…the change in observation space compared to the original paper. Also updated get_obs_idx to use the larger space. Signed-off-by: Scott Johnson <[email protected]>
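The get_obs_idx change matters for tabular methods that map an observation to a single integer. The sketch below assumes a mixed-radix encoding (a common approach, not necessarily the one MO-Gymnasium uses) over two hypothetical components, bottles_carrying and bottles_delivered, each taking values 0 to 2. If bottles_delivered keeps a radix of 2 while the terminal value 2 can occur, distinct observations collide on the same index.

```python
from __future__ import annotations

from itertools import product


def obs_to_index(values: tuple[int, ...], sizes: tuple[int, ...]) -> int:
    """Mixed-radix encoding: each component is one digit whose base is its size."""
    index = 0
    for value, size in zip(values, sizes):
        index = index * size + value  # shift earlier digits, then add this one
    return index


def find_collisions(sizes: tuple[int, int]) -> dict[int, list[tuple[int, int]]]:
    """Group observations by flat index and keep only indices shared by several."""
    buckets: dict[int, list[tuple[int, int]]] = {}
    # Hypothetical components: bottles_carrying in 0..2 and bottles_delivered
    # in 0..2 (2 being the terminal value discussed in this PR).
    for obs in product(range(3), range(3)):
        buckets.setdefault(obs_to_index(obs, sizes), []).append(obs)
    return {index: group for index, group in buckets.items() if len(group) > 1}


print(find_collisions((3, 2)))  # {2: [(0, 2), (1, 0)], 4: [(1, 2), (2, 0)]}
print(find_collisions((3, 3)))  # {}
```

With radix 2, (0, 2) and (1, 0) share index 2, and (1, 2) and (2, 0) share index 4; with radix 3, all nine pairs get distinct indices from 0 to 8.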
@LucasAlegre Sure thing, I've added some additional text to the class description explaining the difference. Please let me know if it's insufficient or should be reworded (or feel free to change it yourself when merging, if that's possible; I'm not sure exactly how the process works).

@ffelten Ah, good catch! You're right, I missed that part. I've updated it in the latest commit. Thanks for noticing that. And yes, FlattenObservation et al. seem to work now.
If everything works better than before, I don't see any reason to decline the PR. Thanks, @scott-j-johnson!
I'm not sure why the pre-commit check failed on trimming trailing whitespace. I can't see any trailing whitespace on my end. Is there something I should do to fix it? |
Run 'pre-commit run --all-files' in the terminal.
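When a trailing-whitespace check fails but nothing is visible in the editor, the culprit can be a tab, a non-breaking space, or a similar character. A rough way to surface such characters is to print the repr of every line that ends in whitespace, as in this sketch. The helper find_trailing_whitespace is hypothetical and only approximates what the pre-commit hook checks; running pre-commit itself remains the authoritative fix.

```python
from __future__ import annotations

import sys


def find_trailing_whitespace(text: str) -> list[tuple[int, str]]:
    """Return (line_number, repr_of_line) for each line ending in whitespace."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        # str.rstrip() with no arguments strips any Unicode whitespace,
        # including tabs and non-breaking spaces.
        if line != line.rstrip():
            hits.append((number, repr(line)))  # repr makes \t, \xa0, etc. visible
    return hits


if __name__ == "__main__":
    # Usage: pass one or more file paths on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, shown in find_trailing_whitespace(handle.read()):
                print(f"{path}:{number}: {shown}")
```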
Signed-off-by: Scott Johnson <[email protected]>
In the breakable-bottles environment, the "bottles_delivered" subspace of the observation space is a Discrete(2) (so only the values 0 and 1 are allowed), but on the step when the goal state is reached, the agent has delivered 2 bottles and an observation containing that value is returned. This violates the stated observation space. This PR changes the subspace to a Discrete(3) and updates the docs and the num_observations member accordingly.
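Assuming num_observations counts every combination of component values (as a tabular index over a fully discrete space would), widening one component from Discrete(2) to Discrete(3) multiplies the total by 3/2. Only the sizes 2 and 3 come from this PR; the other component names and sizes in this sketch are hypothetical placeholders, not the environment's real values.

```python
from math import prod

# Hypothetical placeholder components; NOT the real breakable-bottles layout.
other_component_sizes = {"component_a": 5, "component_b": 4}


def num_observations(delivered_size: int) -> int:
    """Count every combination of component values for a given bottles_delivered size."""
    return prod(other_component_sizes.values()) * delivered_size


before = num_observations(2)  # bottles_delivered as Discrete(2)
after = num_observations(3)   # bottles_delivered as Discrete(3)
print(before, after, after / before)  # 40 60 1.5
```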
This is my first ever GitHub PR, so I apologise if I have done anything incorrect or impolite in this process, and thank you for the project!