Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breakable-bottles observation space correction #93

Merged

Conversation

scott-j-johnson
Copy link
Contributor

The breakable-bottles environment's observation space's "bottles_delivered" subspace is a Discrete(2) (so, 0 or 1 values allowed), but on the step when the goal state is reached, the agent has delivered 2 bottles, and an observation with that info is returned. This violates the stated observation space. This PR changes the subspace to be a Discrete(3), and updates the docs and num_observations member accordingly.

This is my first ever Github PR so I apologise if I have done anything incorrect or impolite in this process, and thank you for the project!

…"bottles_delivered" subspace is a Discrete(3), instead of Discrete(2), and updates num_observations and the docs accordingly. On the step when the goal state is reached, the agent has delivered 2 bottles, and an observation with that info is returned, so the previously stated observation space was insufficient.

Signed-off-by: Scott Johnson <[email protected]>
Copy link
Collaborator

@ffelten ffelten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @scott-j-johnson, thanks a lot for the PR 🤲 !

I have re-read the paper description of the environment (https://www.sciencedirect.com/science/article/pii/S0952197621000336). From there, it seems to me that the current implementation is correct: bottles delivered is 0 or 1, because the episode terminates when we reach 2, hence this state is never encountered.

Maybe I'm missing something though?

image

@scott-j-johnson
Copy link
Contributor Author

Hi @ffelten , thank you for the response!

The episode does end when two bottles are delivered, but this end state (with two bottles delivered) is also returned from the environment on that same step (ie, terminated=True, bottles_delivered=2). The implementation in MO-Gymnasium is absolutely consistent with the paper, but I think the paper itself is inaccurate here. I spoke with Peter Vamplew briefly about it and I think he agreed that it is an oversight.

However, I was assuming that the observation space of an environment should be accurate for terminal states too. I may be incorrect in that assumption. If it isn't necessary for terminal states to adhere to the observation space, then please ignore this PR, and I apologise for not knowing that.

For more context, I originally noticed this issue because trying to use gymnasium.spaces.utils.flatten on that terminal state was causing a crash (because the state is outside the space).

Thanks again for your response!

@ffelten
Copy link
Collaborator

ffelten commented Apr 12, 2024

Hmmm, this is a nice corner case :).

TBH, I don't know the Farama policy on this regard: ie should terminal states be in the state space? @pseudo-rnd-thoughts has probably a better idea than me on this.

To me, it shouldn't matter as the learning algorithm should never even consider that state. But there might be corner cases with wrappers and kinds of stuff that do trigger exceptions.

@pseudo-rnd-thoughts
Copy link
Member

Hey, yes the observation space should contain all non-terminal and terminal states.
I'm guessing that this could be an issue for users that flatten the observation to one-hot it, for the terminal state you need to represent the third case.
While not affecting training performance, this might require a version due to the change above but this is very minor so might not be worth it.

@LucasAlegre
Copy link
Member

LucasAlegre commented Apr 12, 2024

Hi @scott-j-johnson, thanks for spotting this! I agree with the change and that this is an oversight in the paper.
Before merging, could you add a comment in the code explaining this situation and why the number of states is actually higher and different from the paper? So people can understand why the definition is a bit different than in the paper when checking the code.

@ffelten
Copy link
Collaborator

ffelten commented Apr 12, 2024

Can you confirm that this fixes your bug too? :-)

Isn't this incorrect though? https://github.com/Farama-Foundation/MO-Gymnasium/pull/93/files#diff-f148056cadad72d35f5a598d2602879f0a55ace14a6bae99d72b3ff1ebd87744R223

…the change in observation space compared to the original paper. Also updated get_obs_idx to use the larger space.

Signed-off-by: Scott Johnson <[email protected]>
@scott-j-johnson
Copy link
Contributor Author

@LucasAlegre Sure thing, I've added some additional text to the class description explaining the difference. Please let me know if it's insufficient or should be re-worded (or feel free to change it yourself when merging, if that's possible. I'm not sure exactly how the process works).

@ffelten Ah, good catch! You're right, I missed that part. I've updated it in the latest commit. Thanks for noticing that. And yes, FlattenObservation et al seem to work now.

@ffelten
Copy link
Collaborator

ffelten commented Apr 14, 2024

If everything is better than before I don't see any reason to decline the PR. Thanks @scott-j-johnson

@scott-j-johnson
Copy link
Contributor Author

I'm not sure why the pre-commit check failed on trimming trailing whitespace. I can't see any trailing whitespace on my end. Is there something I should do to fix it?

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Apr 14, 2024

'precommit run --all-files' in the terminal
If haven't installed precommit then 'pip install precommit' and 'precommit install'

@ffelten ffelten merged commit 503dbf0 into Farama-Foundation:main Apr 15, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants