breakable-bottles observation space correction #93

scott-j-johnson · 2024-04-12T05:34:09Z

The breakable-bottles environment's observation space's "bottles_delivered" subspace is a Discrete(2) (so, 0 or 1 values allowed), but on the step when the goal state is reached, the agent has delivered 2 bottles, and an observation with that info is returned. This violates the stated observation space. This PR changes the subspace to be a Discrete(3), and updates the docs and num_observations member accordingly.

This is my first ever Github PR so I apologise if I have done anything incorrect or impolite in this process, and thank you for the project!

…"bottles_delivered" subspace is a Discrete(3), instead of Discrete(2), and updates num_observations and the docs accordingly. On the step when the goal state is reached, the agent has delivered 2 bottles, and an observation with that info is returned, so the previously stated observation space was insufficient. Signed-off-by: Scott Johnson <[email protected]>

ffelten

Hey @scott-j-johnson, thanks a lot for the PR 🤲 !

I have re-read the paper description of the environment (https://www.sciencedirect.com/science/article/pii/S0952197621000336). From there, it seems to me that the current implementation is correct: bottles delivered is 0 or 1, because the episode terminates when we reach 2, hence this state is never encountered.

Maybe I'm missing something though?

scott-j-johnson · 2024-04-12T09:30:57Z

Hi @ffelten , thank you for the response!

The episode does end when two bottles are delivered, but this end state (with two bottles delivered) is also returned from the environment on that same step (ie, terminated=True, bottles_delivered=2). The implementation in MO-Gymnasium is absolutely consistent with the paper, but I think the paper itself is inaccurate here. I spoke with Peter Vamplew briefly about it and I think he agreed that it is an oversight.

However, I was assuming that the observation space of an environment should be accurate for terminal states too. I may be incorrect in that assumption. If it isn't necessary for terminal states to adhere to the observation space, then please ignore this PR, and I apologise for not knowing that.

For more context, I originally noticed this issue because trying to use gymnasium.spaces.utils.flatten on that terminal state was causing a crash (because the state is outside the space).

Thanks again for your response!

ffelten · 2024-04-12T11:08:01Z

Hmmm, this is a nice corner case :).

TBH, I don't know the Farama policy on this regard: ie should terminal states be in the state space? @pseudo-rnd-thoughts has probably a better idea than me on this.

To me, it shouldn't matter as the learning algorithm should never even consider that state. But there might be corner cases with wrappers and kinds of stuff that do trigger exceptions.

pseudo-rnd-thoughts · 2024-04-12T11:22:29Z

Hey, yes the observation space should contain all non-terminal and terminal states.
I'm guessing that this could be an issue for users that flatten the observation to one-hot it, for the terminal state you need to represent the third case.
While not affecting training performance, this might require a version due to the change above but this is very minor so might not be worth it.

LucasAlegre · 2024-04-12T13:09:31Z

Hi @scott-j-johnson, thanks for spotting this! I agree with the change and that this is an oversight in the paper.
Before merging, could you add a comment in the code explaining this situation and why the number of states is actually higher and different from the paper? So people can understand why the definition is a bit different than in the paper when checking the code.

ffelten · 2024-04-12T13:29:19Z

Can you confirm that this fixes your bug too? :-)

Isn't this incorrect though? https://github.com/Farama-Foundation/MO-Gymnasium/pull/93/files#diff-f148056cadad72d35f5a598d2602879f0a55ace14a6bae99d72b3ff1ebd87744R223

…the change in observation space compared to the original paper. Also updated get_obs_idx to use the larger space. Signed-off-by: Scott Johnson <[email protected]>

scott-j-johnson · 2024-04-14T01:37:52Z

@LucasAlegre Sure thing, I've added some additional text to the class description explaining the difference. Please let me know if it's insufficient or should be re-worded (or feel free to change it yourself when merging, if that's possible. I'm not sure exactly how the process works).

@ffelten Ah, good catch! You're right, I missed that part. I've updated it in the latest commit. Thanks for noticing that. And yes, FlattenObservation et al seem to work now.

ffelten · 2024-04-14T02:15:59Z

If everything is better than before I don't see any reason to decline the PR. Thanks @scott-j-johnson

scott-j-johnson · 2024-04-14T10:08:09Z

I'm not sure why the pre-commit check failed on trimming trailing whitespace. I can't see any trailing whitespace on my end. Is there something I should do to fix it?

pseudo-rnd-thoughts · 2024-04-14T12:17:33Z

'precommit run --all-files' in the terminal
If haven't installed precommit then 'pip install precommit' and 'precommit install'

Signed-off-by: Scott Johnson <[email protected]>

ffelten reviewed Apr 12, 2024

View reviewed changes

Added some text in the BreakableBottles class description explaining …

00f33f1

…the change in observation space compared to the original paper. Also updated get_obs_idx to use the larger space. Signed-off-by: Scott Johnson <[email protected]>

Attempted fix for pre-commit whitespace trim failure.

652e910

Signed-off-by: Scott Johnson <[email protected]>

ffelten merged commit 503dbf0 into Farama-Foundation:main Apr 15, 2024
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

breakable-bottles observation space correction #93

breakable-bottles observation space correction #93

scott-j-johnson commented Apr 12, 2024

ffelten left a comment •

edited

Loading

scott-j-johnson commented Apr 12, 2024

ffelten commented Apr 12, 2024

pseudo-rnd-thoughts commented Apr 12, 2024

LucasAlegre commented Apr 12, 2024 •

edited

Loading

ffelten commented Apr 12, 2024

scott-j-johnson commented Apr 14, 2024

ffelten commented Apr 14, 2024 •

edited

Loading

scott-j-johnson commented Apr 14, 2024

pseudo-rnd-thoughts commented Apr 14, 2024 •

edited

Loading

breakable-bottles observation space correction #93

breakable-bottles observation space correction #93

Conversation

scott-j-johnson commented Apr 12, 2024

ffelten left a comment • edited Loading

Choose a reason for hiding this comment

scott-j-johnson commented Apr 12, 2024

ffelten commented Apr 12, 2024

pseudo-rnd-thoughts commented Apr 12, 2024

LucasAlegre commented Apr 12, 2024 • edited Loading

ffelten commented Apr 12, 2024

scott-j-johnson commented Apr 14, 2024

ffelten commented Apr 14, 2024 • edited Loading

scott-j-johnson commented Apr 14, 2024

pseudo-rnd-thoughts commented Apr 14, 2024 • edited Loading

ffelten left a comment •

edited

Loading

LucasAlegre commented Apr 12, 2024 •

edited

Loading

ffelten commented Apr 14, 2024 •

edited

Loading

pseudo-rnd-thoughts commented Apr 14, 2024 •

edited

Loading