-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion of Vector Environment API Breaking Changes #2279
Comments
You might want to check out https://github.com/openai/gym3 |
Can you be more precise about these complaints that warrant breaking the API, in order to get a better understanding of what needs to be done? As far as I know, almost all the issues involving To be specific, there are a number of things that can be improved in my opinion, but very few of them require breaking the API:
The problem with Gym3 is that it uses a different API than Gym ( |
My main complaint is the tuple observation/action spaces. Stable Baseline's vector environment has the observation/space just be the observation space of one of the environments, and assumes that all environments have the same spaces. This seems much more sensible to me than having a tuple observation space because the main use case of a vector environment is to put many near-identical environments together, and applying a single policy to many of them. Having a single observation space makes using the vector environment in this way much easier. Especially when you start writing vector environment wrappers, etc. While I agree that the tuple space is more general, in this case I prefer simplicity over generality. |
In general, I would prefer it if Gym adopted Stable Baselines vector environment API. |
I would like to second @benblack769: The design of the action space of the vectorized env is cleaner in SB3 in my opinion, and I would prefer if gym adopted the SB3’ vector environment API. It seems the only utility of the tuple action space is so that you can do The two biggest benefits of gym3 API appears to be:
Regarding the first benefit, you could use SB3’s API as well (see here as an example). Regarding the second benefit, I feel it was set up for more complicated use cases (e.g. distributed training, asynchronous rollouts), so I am inclined to keep it simple and just use SB3’s API. |
I would note that implementing a multi-agent environment is equally simple using SB3's vector env API. For example: https://github.com/PettingZoo-Team/SuperSuit/#parallel-environment-vectorization |
I'm confused, Gym's import gym
env = gym.vector.make('CartPole-v1', num_envs=5)
observations = env.reset()
# array([[-0.04071225, -0.00272771, 0.04585911, 0.03397448],
# [ 0.04807664, -0.00466015, 0.03836329, -0.04897803],
# [ 0.03999503, 0.03953767, -0.02474664, -0.00865895],
# [-0.03199483, 0.01612429, -0.03704762, -0.02390875],
# [ 0.00967298, 0.01544605, 0.04391582, 0.0040252 ]],
# dtype=float32) And this is true for any (arbitrarily nested) observation space, see #1513 for additional examples (in particular, one with a The only instance where it will return a tuple of observations is whenever the observation space is not a standard Gym space (e.g.
As far as I know, Gym's The one difference I can spot is that Gym's Here, there is indeed a question of wether to keep the
I don't understand what you mean by the action space being cleaner? If it is because you can feed a numpy array directly in import gym
import numpy as np
env = gym.vector.make('MountainCarContinuous-v0', num_envs=5)
observations = env.reset()
# Here actions.shape = (5, 1) because the action space
# of MountainCarContinuous-v0 has shape (1,)
actions = np.array([[-1.], [-0.5], [0.], [0.5], [1.]])
observations, rewards, dones, infos = env.step(actions)
This is also available in Gym (and subclasses, e.g. in I feel like a lot of the misunderstanding could be cleared up with proper documentation. |
Compare:
To:
Please reread the above complaints with this in mind. |
Just to emphasize that the diff --git a/gym/vector/vector_env.py b/gym/vector/vector_env.py
index 375826f..7d09815 100644
--- a/gym/vector/vector_env.py
+++ b/gym/vector/vector_env.py
@@ -33,7 +33,7 @@ class VectorEnv(gym.Env):
super(VectorEnv, self).__init__()
self.num_envs = num_envs
self.observation_space = batch_space(observation_space, n=num_envs)
- self.action_space = Tuple((action_space,) * num_envs)
+ self.action_space = batch_space(action_space, n=num_envs)
self.closed = False
self.viewer = None
That was my point in the comparison between Gym and SB3: in SB3's # With the diff above
env = gym.vector.make('MountainCarContinuous-v0', num_envs=5)
print(env.action_space)
# Box(-1.0, 1.0, (5, 1), float32) |
The issue I (or actually @benblack769, who made the changes) had with the Gym |
On the subject of >>> vec_env.action_space
Box(-1.0, 1.0, (1,), float32)
>>> vec_env.observation_space
Box(-1.2000000476837158, 0.6000000238418579, (2,), float32) I agree this is a pretty ugly inconsistency. I see @tristandeleu just opened a PR to address this. |
That's a good point, I agree this is not ideal. Originally I had a key in the |
@tristandeleu you have a valid point. The action with shape That said, jus throwing some nit thoughts about the usability. Currently, I have been writing code like the following with SB3's API: self.actor = nn.Sequential(
layer_init(nn.Linear(np.array(envs.observation_space.shape).prod(), 64)),
nn.Tanh(),
layer_init(nn.Linear(64, 64)),
nn.Tanh(),
layer_init(nn.Linear(64, envs.action_space.n), std=0.01),
) I can do things like Using the new We would need to clarify that the first dimension of |
@vwxyzjn the properties self.actor = nn.Sequential(
layer_init(nn.Linear(flatdim(envs.single_observation_space), 64)),
nn.Tanh(),
layer_init(nn.Linear(64, 64)),
nn.Tanh(),
layer_init(nn.Linear(64, envs.single_action_space.n), std=0.01),
) |
Oh wow, didn't know this existed. Thanks @tristandeleu that really helps! My remaining concern is the wrappers. SB3 already has implemented a list of wrappers for the vectorized environment here. Something like |
There is a One thing to note though is that since |
the main issue with gym3 (in addition to breaking gym api) is that terminal observation are not handled apparently (see discussion in DLR-RM/stable-baselines3#311 (comment)) Apart from that, I don't have much to say about Gym's |
If at all possible, could we try to unite the vectorized environment implementation @araffin @tristandeleu? They are all doing the same thing, and we should try to have a single source of the vectorized env, where non-common utilities can be created via subclassing. |
Both APIs are really close to one another, but have some divergences still ( This is not unusual to see large RL libraries run their own version of vector environments (e.g. pfrl, TF-Agents), or defer that logic to specific samplers (e.g. rlpyt, garage). I completely understand that @araffin wants to have full control over SB3's implementation of vector environments, where they need to make sure nothing breaks on their side with a new version of Gym. After all, we're having a discussion here in a thread named "Vector Environment API Breaking Changes"... The goal of |
@tristandeleu thanks for the well-considered reply. I completely agree with the first step being adding more documentations. Now that I was aware of the I feel another reasobale next step is to add more common utilities such as the |
Regarding this, I just realized that there is this PR #1632 already open, which adds the last observation as a key to info. I have commented here #1632 (comment) on how we can return with this PR the last observation as an object which has the same type and shape as a regular batched observation. |
I have opened a PR for adding an extensive documentation of the current vectorized environments #2327, with many examples. Feedback would be very welcome! |
Here is the documentation for vectorized environments on Github Pages for convenience: https://tristandeleu.github.io/gym/vector/ |
@tristandeleu Thanks, this is helpful to have. We're still steadily working on doing the originally planned Jekyll based website and we'll merge your sphinx docs into that when the time comes. |
@jkterry1 Based on the new roadmap #2524
Can you be more precise with what you think should be changed in the vector API? Since you are not contributing to the discussion here, it is impossible to know exactly what would be the changes you claim are necessary. From what I can tell, all the concerns here have been addressed one way or another, and I wonder how redoing the vector API would help. |
Think on the auto-reset way for envpool. It looks like having more sense. https://envpool.readthedocs.io/en/latest/content/python_interface.html#auto-reset EnvPool enables auto-reset by default. Let’s suppose an environment that has a max_episode_steps = 3. When we call env.step(action) five consecutive times, the following would happen:
So there's no need to including observation into info, and the iteration will flow normaly with returns of step as they should be and returns of reset as they should be. |
Am slightly confused by your comment. What is this meant to be in response to? I believe that Gym already implements an autoreset in the |
Yes but Gym does not return the corresponding observation at the time it returns done=True, instead Gym returns an observation corresponding to a reset() and done=True. That is inconsistent desing and should be avoided, which is what envpool does it correctly. Check steps 4 and 5, there is a clear difference between what Gym is doing and what envpool is doing. gym/gym/vector/async_vector_env.py Line 624 in 98a9293
|
Ok, so this is a proposal about how gym should implement the autoreset functionality in the vector env. |
MM, not sure on having to compute a new action when terminal, done or whatever signals True. I guess perhaps the "collector" or "rollout" part of the experimental code should be in charge of this. Perhaps I am missing something else? |
There are numerous long standing complaints about the vector environment API, to the point where it is the one thing that most people have seemed to think breaking changes are warranted for (which I agree with). I've created this thread to come to a consensus on the solution.
Because whatever finalized changes we come up with here will be very significant to the community and this can only really be done once, I am specifically insisting that the following people agree to whatever breaking changes are made:
Benjamin Black (@benblack769, UMD and PettingZoo)
Antonin Raffin (@araffin, DLR RM and Stable Baselines)
Chris Nota (@cpnota, UMass Amherst and the Autonomous Learning Library)
Costa Huang (@vwxyzjn, Weights and Biases and CleanRL)
Jesse Fareboro (@JesseFarebro, MILA and the Arcade Learning Environment)
The text was updated successfully, but these errors were encountered: