Inconsistency in calculating product of observation/action space shapes #895

songyuc opened this issue Feb 28, 2025 · 0 comments
songyuc commented Feb 28, 2025

Hi,

I noticed that there are two slightly different ways to calculate the product of the observation and action space shapes in examples/baselines/sac/sac.py.

Specifically, in the SoftQNetwork class, the following code is used:

nn.Linear(np.array(env.single_observation_space.shape).prod() + np.prod(env.single_action_space.shape), 256),

Here, np.array(env.single_observation_space.shape).prod() is used for the observation space, while np.prod(env.single_action_space.shape) is used for the action space.

Both expressions give the same result, but the first one is slightly redundant: np.prod() accepts a tuple directly, so converting the shape to a NumPy array first is unnecessary.
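As a quick check of that equivalence, here is a minimal sketch. The shapes `(42,)` and `(8,)` are made-up placeholders rather than values from any actual environment, and the `int(...)` cast at the end is only one possible way to unify the two calls, not necessarily what the maintainers would choose.

```python
import numpy as np

# Hypothetical shapes standing in for env.single_observation_space.shape
# and env.single_action_space.shape.
obs_shape = (42,)
act_shape = (8,)

# Style 1: convert the tuple to an array, then call the .prod() method.
via_array = np.array(obs_shape).prod()

# Style 2: pass the tuple straight to np.prod().
via_prod = np.prod(obs_shape)

# Both styles produce the same number of elements.
assert via_array == via_prod

# One possible unified form for the first Linear layer's input size.
in_features = int(np.prod(obs_shape) + np.prod(act_shape))
print(in_features)
```

With these placeholder shapes, `in_features` is the sum of the flattened observation and action sizes, which is the value passed as the first argument to `nn.Linear` in the line quoted above.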

I'm curious about the reason for using these two different approaches. Is there a specific reason for this, or would it be beneficial to unify them for consistency?

Thank you for your time and effort in maintaining this project!
