AttributeError: 'NoneType' object has no attribute 'mmap' #398

Open
kvablack opened this issue Apr 24, 2024 · 2 comments

@kvablack

This seems to happen at shutdown in any data pipeline that yields NumPy arrays. Here is the full stack trace:

INFO:absl:Process 0 exiting.
INFO:absl:Processing complete for process with worker_index 0
INFO:absl:Grain pool is exiting.
INFO:absl:Shutting down multiprocessing system.
INFO:absl:Shutting down multiprocessing system.
Exception ignored in: <function SharedMemoryArray.__del__ at 0x7e3b780a8a60>
Traceback (most recent call last):
  File "/home/black/micromamba/envs/trainpi/lib/python3.10/site-packages/grain/_src/python/shared_memory_array.py", line 139, in __del__
AttributeError: 'NoneType' object has no attribute 'mmap'
/home/black/micromamba/envs/trainpi/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Even if it's not an actual problem, it's a bit annoying because it overwhelms the logging output when you have many workers.

Here's the simplest possible repro:

import logging

import grain.python as grain
import numpy as np

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    # Minimal data source: ten records, each a single-element NumPy array.
    class DataSource:
        def __len__(self):
            return 10

        def __getitem__(self, idx):
            return np.zeros(1)

    source = DataSource()
    sampler = grain.IndexSampler(
        num_records=len(source),
        num_epochs=1,
        shard_options=grain.NoSharding(),
        shuffle=False,
    )
    # worker_count=1 moves processing into a child process, which is what
    # exercises the shared-memory path.
    loader = grain.DataLoader(
        data_source=source,
        sampler=sampler,
        worker_count=1,
    )

    for batch in loader:
        pass
@sirmarcel

Hello, a quick question: has there been any progress on this? I can reproduce this issue on both Linux and macOS, and while it's not critical, it is rather unpleasant to look at.

I'm not quite sure what causes this. Something complicated goes wrong during shutdown: commenting out a few assertions (which fail because various module-level names have apparently been set to None during interpreter teardown?!) reveals that SharedMemoryArray._unlink_thread_pool is no longer alive by the time __del__ runs.
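For context, this looks like the classic interpreter-teardown failure mode: by the time a straggling __del__ runs, module globals and class attributes may already have been cleared to None, and attribute access on them raises the AttributeError that CPython reports as "Exception ignored in: ... __del__". A defensive finalizer along these lines (a sketch only, not grain's actual code; _cleanup is a placeholder method) would at least swallow the noise:

import concurrent.futures

class SharedResource:
    # Class-level pool for background unlinking, mirroring the
    # _unlink_thread_pool mentioned above.
    _unlink_thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def _cleanup(self):
        pass  # placeholder: close/unlink the underlying shared memory here

    def __del__(self):
        try:
            pool = type(self)._unlink_thread_pool
            if pool is None:
                return  # interpreter teardown: skip cleanup, don't raise
            pool.submit(self._cleanup)
        except Exception:
            pass  # best effort only; never let a finalizer raise at shutdown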

@sirmarcel

Hey, I looked into this a little more. Some observations:

  • The example above runs fine if you take care to del the batch after every iteration (see the sketch after this list)
  • The error then still occurs on shutdown if the sampler is not exhausted (for example by setting num_epochs to something large and then interrupting the main process with Ctrl+C)
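
To illustrate the first bullet, this is the only change needed for the repro above to exit cleanly (at least on the platforms I tried):

for batch in loader:
    ...  # consume the batch here
    del batch  # drop the reference so its shared memory can be released promptly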

I think there is some sort of race condition in the shutdown procedure of the GrainPool that leaves orphaned shared memory objects for batches that were produced by worker processes but not yet consumed by the main process. I tried to dig into why this happens, but haven't yet found a definite cause or fix. Maybe there just needs to be an atexit handler somewhere that does some cleanup?
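For what it's worth, here is a rough sketch of the kind of atexit cleanup I mean. Everything below is hypothetical rather than grain's actual API: grain would need to track the names of shared memory blocks that were created by workers but never handed off to the consumer, and unlink them on exit.

import atexit
from multiprocessing import shared_memory

# Hypothetical registry of shared memory block names created by workers
# but never claimed by the main process.
_orphaned_shm_names = set()

def _cleanup_orphaned_shm():
    for name in _orphaned_shm_names:
        try:
            shm = shared_memory.SharedMemory(name=name)  # attach to existing block
            shm.close()
            shm.unlink()  # release the OS-level segment
        except FileNotFoundError:
            pass  # already cleaned up (e.g. by the resource tracker)

atexit.register(_cleanup_orphaned_shm)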

It would be nice to fix this, as unclosed shared memory opens the door to memory leaks. Things will probably be cleaned up by the resource tracker, but I think this is not guaranteed on all platforms.
