This seems to happen at shutdown in any data pipeline that has NumPy arrays. Here is the full stacktrace:
INFO:absl:Process 0 exiting.
INFO:absl:Processing complete for process with worker_index 0
INFO:absl:Grain pool is exiting.
INFO:absl:Shutting down multiprocessing system.
INFO:absl:Shutting down multiprocessing system.
Exception ignored in: <function SharedMemoryArray.__del__ at 0x7e3b780a8a60>
Traceback (most recent call last):
File "/home/black/micromamba/envs/trainpi/lib/python3.10/site-packages/grain/_src/python/shared_memory_array.py", line 139, in __del__
AttributeError: 'NoneType' object has no attribute 'mmap'
/home/black/micromamba/envs/trainpi/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Even if it's not an actual problem, it's a bit annoying because it overwhelms the logging output when you have many workers.
Hello, a quick question: Has there been any progress on this? I can reproduce this issue on both linux and macOS, and while it's not critical, it is rather unpleasant to look at.
I'm not quite sure what causes this. Something goes wrong late in interpreter shutdown: after commenting out a few assertions (which fail because various attributes have already been set to None), it turns out that SharedMemoryArray._unlink_thread_pool is no longer alive by the time __del__ runs.
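To illustrate the failure mode in isolation, here is a self-contained toy (not Grain's actual code) showing why a __del__ that may run at interpreter shutdown has to guard against class attributes that have already been torn down:

```python
import concurrent.futures

class Holder:
    # Class-level pool, analogous in spirit to SharedMemoryArray._unlink_thread_pool.
    _pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def __del__(self):
        pool = type(self)._pool
        if pool is None:
            # During interpreter shutdown this attribute can already be gone;
            # bailing out here avoids an AttributeError like the one in the traceback.
            return
        try:
            pool.submit(print, "cleaning up")
        except RuntimeError:
            pass  # the pool no longer accepts work once shutdown has begun

# A module-level instance is only collected at interpreter exit, which is
# exactly when the pool may no longer be usable.
leftover = Holder()
```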
Hey, I looked into this a little more. Some observations:
The example above runs fine if you take care to del the batch after every iteration (see the sketch after this list)
The error then still occurs on shutdown if the sampler is not exhausted (for example, by setting num_epochs to something large and then interrupting the main process with Ctrl+C)
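Roughly what I mean with the first point; the generator here is just a stand-in for the Grain DataLoader in the repro, the pattern is what matters:

```python
import numpy as np

def make_loader():
    # Stand-in for the Grain DataLoader; yields NumPy batches like the real pipeline.
    for _ in range(4):
        yield np.zeros((8, 8), dtype=np.float32)

for batch in make_loader():
    batch.sum()  # stand-in for whatever work consumes the batch
    # Dropping the last reference before the next iteration means the real
    # SharedMemoryArray.__del__ runs while the interpreter (and Grain's unlink
    # thread pool) is still alive, instead of at shutdown.
    del batch
```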
I think there is some sort of race condition in the GrainPool shutdown procedure that leaves orphaned shared-memory objects behind for batches that were already produced by the worker processes but not yet consumed by the main process. I tried to dig into why this happens, but haven't yet found a definite cause or fix. Maybe there just needs to be an atexit handler somewhere that does some cleanup?
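Something like the following is roughly what I have in mind. It is only a sketch: the _live_blocks registry and the place where names would be registered are invented, and I don't know where Grain actually tracks its shared-memory segments:

```python
import atexit
from multiprocessing import shared_memory

# Hypothetical registry of shared-memory block names created for batches; Grain
# would have to add a name here wherever it allocates a SharedMemoryArray.
_live_blocks: set[str] = set()

def _cleanup_shared_memory() -> None:
    # Best-effort cleanup at interpreter exit: close and unlink anything that
    # was produced by a worker but never consumed, so nothing is left for the
    # resource tracker to warn about.
    for name in list(_live_blocks):
        try:
            shm = shared_memory.SharedMemory(name=name)
            shm.close()
            shm.unlink()
        except FileNotFoundError:
            pass  # already unlinked by whoever consumed the batch
        _live_blocks.discard(name)

atexit.register(_cleanup_shared_memory)
```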
It would be nice to fix this, as unclosed shared memory opens the door to memory leaks. Things will probably be cleaned up by the resource tracker, but I think this is not guaranteed on all platforms.
Here's the simplest possible repro:
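The exact snippet hasn't survived into this view of the thread; as a rough reconstruction, and assuming the pygrain DataLoader API (argument names may need double-checking), a minimal repro looks something like this:

```python
import numpy as np
import grain.python as grain

# Any indexable source of NumPy arrays is enough to exercise the shared-memory path.
data = [np.zeros((8, 8), dtype=np.float32) for _ in range(128)]

loader = grain.DataLoader(
    data_source=data,
    sampler=grain.IndexSampler(
        num_records=len(data),
        shard_options=grain.NoSharding(),
        shuffle=False,
        num_epochs=1,
        seed=0,
    ),
    operations=[grain.Batch(batch_size=8)],
    worker_count=2,  # >0 so batches travel back to the main process via shared memory
)

for batch in loader:
    pass  # the last batch reference lives until interpreter shutdown, which triggers the warning
```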