WorkerProcess leaks environment variables to parent process #6749
Comments
@pentschev for those not familiar, what's the actual impact of As you mention, it's difficult to do this properly. Just trying to understand the tradeoff right now being |
This will overwrite what the user specifies for the Dask client. For example, one may launch something like this:

$ CUDA_VISIBLE_DEVICES=1 python
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0,1])
>>> client = Client(cluster)  # This is expected to run on CUDA_VISIBLE_DEVICES=1 as set when launching the process, but is now random

The problem in the above is that we now leave non-deterministic, impossible-to-ensure behavior for the user, unless the user overwrites that before launching the
Because this will definitely break some users' code in the wild, I'm inclined to say this is a critical issue, especially given that we're approaching the RAPIDS code freeze and release as well, and I'm not certain we have time to wait for another Dask release. For the pytests I can temporarily work around the issue with a fixture; I might be able to do so with
I'm also interested in hearing folks' opinions about the lock idea, which is perhaps not too difficult to implement. |
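For readers less familiar with the mechanics, here is a minimal, self-contained sketch of the behavior being described; this is plain multiprocessing rather than Dask code, and the variable choice only mirrors the example above:

```python
import multiprocessing as mp
import os


def worker():
    # The child sees whatever was in the parent's environment at spawn time.
    print("child sees:", os.environ.get("CUDA_VISIBLE_DEVICES"))


if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # what the user set when launching

    ctx = mp.get_context("spawn")
    for dev in ("0", "3"):
        # Mutating the parent so the spawned child inherits the value
        # (roughly the pattern under discussion) and never reverting it.
        os.environ["CUDA_VISIBLE_DEVICES"] = dev
        p = ctx.Process(target=worker)
        p.start()
        p.join()

    # The user's original setting is gone: the parent now reports "3".
    print("parent now sees:", os.environ.get("CUDA_VISIBLE_DEVICES"))
```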
I played a bit with a (hacky) lock implementation, and it seems like the overall time increases dramatically for Dask-CUDA, from ~5 seconds up to 22 seconds on a machine with 8 workers (i.e., GPUs):

Without Lock

In [1]: from dask_cuda import LocalCUDACluster
In [2]: %time cluster = LocalCUDACluster()
2022-07-20 13:35:28,329 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,329 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,342 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,342 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,343 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,343 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,347 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,347 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,364 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,364 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,365 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,365 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,367 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,367 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:28,369 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:28,369 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
CPU times: user 409 ms, sys: 170 ms, total: 579 ms
Wall time: 5.05 s With LockIn [1]: from dask_cuda import LocalCUDACluster
In [2]: %time cluster = LocalCUDACluster()
2022-07-20 13:35:49,058 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:49,058 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:52,016 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:52,016 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:54,712 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:54,712 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:35:57,360 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:35:57,360 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:36:00,034 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:36:00,034 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:36:02,640 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:36:02,640 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:36:05,372 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:36:05,372 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 13:36:08,073 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 13:36:08,074 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
CPU times: user 19.7 s, sys: 1.95 s, total: 21.6 s
Wall time: 22.2 s With the above, it's unlikely this idea is feasible in practice as well. I'm not really sure what else could be done, perhaps we have to store |
@pentschev the lock doesn't seem like a great idea to me. I'd probably look into doing a double-fork/double-spawn (maybe ideally fork, then spawn, for speed?), setting the vars in the first process, then using the second process as the actual worker. (There are some fun nuances there with orphaned processes in POSIX as well.) Unfortunately, I don't think multiprocessing has any hooks that let you run some code after the
Either way though:
|
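A hedged sketch of the double-spawn idea above (names are placeholders): an intermediate process mutates its own os.environ, leaving the parent untouched, and then launches the real worker, which inherits the variables. The orphan/lifetime handling mentioned above is glossed over here.

```python
import multiprocessing as mp
import os


def _worker():
    print("worker sees:", os.environ.get("CUDA_VISIBLE_DEVICES"))


def _intermediate(env):
    # This runs in a separate process, so updating os.environ here does not
    # leak into the original parent.
    os.environ.update(env)
    child = mp.get_context("spawn").Process(target=_worker)
    child.start()
    child.join()  # real code would need to manage the child's lifetime/orphaning


if __name__ == "__main__":
    outer = mp.get_context("spawn").Process(
        target=_intermediate, args=({"CUDA_VISIBLE_DEVICES": "3"},)
    )
    outer.start()
    outer.join()
    # The parent's environment is untouched (prints None unless set at launch).
    print("parent sees:", os.environ.get("CUDA_VISIBLE_DEVICES"))
```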
I've gone ahead and surfaced this issue in the release planning issue ( dask/community#263 (comment) ) |
Thanks John, I'm still trying to get confirmation from folks on whether this is a critical issue or not. In the meantime I'm trying to work around it in rapidsai/dask-cuda#955, but unsuccessfully so far. If it's something we can't really fix, maybe it would be best to revert #6681 so we can buy a bit more time to find a proper fix or alternative. |
Wonder if we can just add an option to the config to enable/disable setting environment variables before process spawning and check for it here. Alternatively, we could have a special set of environment variables that we only set this way (as opposed to treating them all this way).

distributed/distributed/nanny.py, lines 674 to 676 in add3663
|
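As a sketch of what such an opt-out could look like (the config key below is invented purely for illustration and is not an existing Dask setting):

```python
import os

import dask.config


def maybe_set_pre_spawn_environ(env: dict) -> None:
    # Hypothetical setting: when disabled, skip mutating the parent's
    # environment and leave it to the worker to apply `env` after it starts.
    if dask.config.get("distributed.nanny.pre-spawn-environ-enabled", default=True):
        os.environ.update(env)  # current behavior: child inherits at spawn time
```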
I think I've now been able to work around that in rapidsai/dask-cuda#955. I'll wait for confirmation until tomorrow morning, but unless some other problem emerges, we should be fine with the release going out as is. |
Thanks for the update Peter! 🙏 |
@pentschev I've also opened #6777 to revert. As I said over there, I'm kind of inclined to revert even though you have a fix—it just seems like bad behavior, and a weird thing to have to work around. |
I'm against the revert. The number of users benefiting from the automatic MALLOC_TRIM_THRESHOLD_ vastly, vastly outnumbers the number of users that need different env variables in different workers. I think the ugly but easy way forward is to have two dicts of env variables, one set before fork and another set afterwards. I understand this would fix the OP's issue, relying on the fact that the dask-cuda worker first calls |
Just for completeness, I left my two cents regarding this particular claim in #6777 (comment).
This causes a race condition if you have multiple workers starting up at the same time, so you would still need to do it in a central place, like |
Actually, let me clarify this comment a bit. The ideal fix would be to avoid the race condition entirely, which would require some sort of lock, as I mentioned trying out in #6749 (comment), but that dramatically increases cluster start time. The workaround in rapidsai/dask-cuda#955 only reverts the leaked |
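For illustration only, reverting leaked variables after startup can be done with a small helper like the one below. This is a sketch of the general approach, not the actual dask-cuda patch, and it does not remove the startup race itself:

```python
import contextlib
import os


@contextlib.contextmanager
def preserve_environ(*names):
    """Snapshot selected environment variables and restore them on exit."""
    saved = {name: os.environ.get(name) for name in names}
    try:
        yield
    finally:
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


# Example usage: any CUDA_VISIBLE_DEVICES value leaked into the parent while
# the cluster spins up is reverted once the block exits.
# with preserve_environ("CUDA_VISIBLE_DEVICES"):
#     cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0, 1])
```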
Let me rephrase my suggestion. I apologise in advance for the lack of examples and links, but I'm writing this from my phone in a hotel room. There would be two dicts in distributed.yaml,
|
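To make the shape of that proposal concrete, here is a very rough sketch in which the dict names are placeholders rather than final config keys: variables in pre_spawn_environ are set in the parent so the spawned interpreter inherits them early (e.g. MALLOC_TRIM_THRESHOLD_), while post_spawn_environ is applied inside the child after it starts, so per-worker values like CUDA_VISIBLE_DEVICES never touch the parent.

```python
import multiprocessing as mp
import os


def _child_entry(post_spawn_environ):
    # Applied inside the child only; the parent's os.environ never sees these.
    os.environ.update(post_spawn_environ)
    print("child:", os.environ.get("MALLOC_TRIM_THRESHOLD_"),
          os.environ.get("CUDA_VISIBLE_DEVICES"))


def start_worker(pre_spawn_environ, post_spawn_environ):
    # Pre-spawn variables must live in the parent's environment so the spawned
    # interpreter inherits them before any allocations happen. This still
    # touches the parent, which is the accepted compromise for shared variables.
    os.environ.update(pre_spawn_environ)
    proc = mp.get_context("spawn").Process(
        target=_child_entry, args=(post_spawn_environ,)
    )
    proc.start()
    return proc


if __name__ == "__main__":
    start_worker(
        {"MALLOC_TRIM_THRESHOLD_": "65536"},
        {"CUDA_VISIBLE_DEVICES": "2"},
    ).join()
```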
No need to apologize, I appreciate you taking the time to discuss this issue. I would be fine with the proposed solution provided no leakage occurs, which seems feasible. If we can demonstrate that this doesn't leak environment variables, that would be awesome. However, I'm not sure if there's time to get that in before the release, so if there's no objection it would be great to merge #6777 and revert the changes so that the release can go smoothly, and then work on this new proposal. Is that a reasonable tradeoff? |
To be clear, I'm definitely not suggesting reverting and leaving it reverted. I was only suggesting reverting for today to make the release, then in the next week or two adding a different solution we're all happy with (like @crusaderky's proposal). It feels like a safer path to me, since we know it won't break things for other users using env vars in similar ways, even though it would delay getting |
Appreciate everyone taking the time to follow up here 🙂
Indeed this is one of the alternatives I was looking at above ( #6749 (comment) ). Another would just be a flag to enable/disable pre-spawn env var setting. No strong feelings on either. Back to Gabe's point though, we are releasing today, so unless this is getting fixed today, it makes sense to revert and try again post-release. Having been on the other side of this before, I can totally appreciate that this can be frustrating, though I think everyone here is interested in finding a better solution we all agree on in the medium term 👍 |
Re-adding this change is now tracked in issue #6780.
Since #6681, WorkerProcess leaks the environment specified via the env kwarg, for example the CUDA_VISIBLE_DEVICES variable we use in Dask-CUDA.

Before #6681

After #6681

What happens now is that os.environ.update(self.env) is called from the parent process and never reverted. One of the issues this causes is leaking environment variables between pytests. Furthermore, if multiple workers are created they may overwrite each other's variables (I'm not sure if a cluster can create WorkerProcesses with different environment variables, so this may be a non-issue).

This problem has been discussed at length in the past in #3682; it is a difficult problem to tackle from Python, given that any newly-spawned process must inherit environment variables from the parent process. One of the suggestions in #3682 (comment) was to create a lock to ensure multiple workers don't spawn simultaneously, which would likely increase the spawn time a bit but seems to be the only safe option in that situation.
Any thoughts here @crusaderky (original author of #6681)?
cc'ing @quasiben @kkraus14 @mrocklin for vis as well, who were active on the #3682 discussion.