AttributeError: 'NoneType' object has no attribute 'mmap' #398

Open
kvablack opened this issue Apr 24, 2024 · 2 comments

@kvablack

This seems to happen at shutdown in any data pipeline that yields NumPy arrays. Here is the full stack trace:

INFO:absl:Process 0 exiting.
INFO:absl:Processing complete for process with worker_index 0
INFO:absl:Grain pool is exiting.
INFO:absl:Shutting down multiprocessing system.
INFO:absl:Shutting down multiprocessing system.
Exception ignored in: <function SharedMemoryArray.__del__ at 0x7e3b780a8a60>
Traceback (most recent call last):
  File "/home/black/micromamba/envs/trainpi/lib/python3.10/site-packages/grain/_src/python/shared_memory_array.py", line 139, in __del__
AttributeError: 'NoneType' object has no attribute 'mmap'
/home/black/micromamba/envs/trainpi/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Even if it's not an actual problem, it's a bit annoying because it overwhelms the logging output when you have many workers.

Here's the simplest possible repro:

import logging

import grain.python as grain
import numpy as np

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    # Minimal data source: ten records, each a single-element NumPy array.
    class DataSource:
        def __len__(self):
            return 10

        def __getitem__(self, idx):
            return np.zeros(1)

    source = DataSource()
    sampler = grain.IndexSampler(
        num_records=len(source),
        num_epochs=1,
        shard_options=grain.NoSharding(),
        shuffle=False,
    )
    # worker_count=1 moves processing into a child process, which is what
    # exercises the shared-memory path.
    loader = grain.DataLoader(
        data_source=source,
        sampler=sampler,
        worker_count=1,
    )

    for batch in loader:
        pass
@sirmarcel

Hello, a quick question: has there been any progress on this? I can reproduce this issue on both Linux and macOS, and while it's not critical, it is rather unpleasant to look at.

I'm not quite sure what causes this. Something complicated goes wrong during shutdown: commenting out a few assertions (which fail because various module-level names have apparently been set to None during interpreter teardown?!) reveals that SharedMemoryArray._unlink_thread_pool is no longer alive by the time __del__ runs.
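For context, this looks like the classic interpreter-teardown failure mode: by the time a straggling __del__ runs, module globals and class attributes may already have been cleared to None, and attribute access on them raises the AttributeError that CPython reports as "Exception ignored in: ... __del__". A defensive finalizer along these lines (a sketch only, not grain's actual code; _cleanup is a placeholder method) would at least swallow the noise:

import concurrent.futures

class SharedResource:
    # Class-level pool for background unlinking, mirroring the
    # _unlink_thread_pool mentioned above.
    _unlink_thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def _cleanup(self):
        pass  # placeholder: close/unlink the underlying shared memory here

    def __del__(self):
        try:
            pool = type(self)._unlink_thread_pool
            if pool is None:
                return  # interpreter teardown: skip cleanup, don't raise
            pool.submit(self._cleanup)
        except Exception:
            pass  # best effort only; never let a finalizer raise at shutdown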

@sirmarcel

Hey, I looked into this a little more. Some observations:

  • The example above runs fine if you take care to del the batch after every iteration (see the sketch after this list)
  • The error then still occurs on shutdown if the sampler is not exhausted (for example by setting num_epochs to something large and then interrupting the main process with Ctrl+C)
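
To illustrate the first bullet, this is the only change needed for the repro above to exit cleanly (at least on the platforms I tried):

for batch in loader:
    ...  # consume the batch here
    del batch  # drop the reference so its shared memory can be released promptly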

I think there is some sort of race condition in the shutdown procedure of the GrainPool that leaves orphaned shared memory objects for batches that were produced by worker processes but not yet consumed by the main process. I tried to dig into why this happens, but haven't yet found a definite cause or fix. Maybe there just needs to be an atexit handler somewhere that does some cleanup?
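For what it's worth, here is a rough sketch of the kind of atexit cleanup I mean. Everything below is hypothetical rather than grain's actual API: grain would need to track the names of shared memory blocks that were created by workers but never handed off to the consumer, and unlink them on exit.

import atexit
from multiprocessing import shared_memory

# Hypothetical registry of shared memory block names created by workers
# but never claimed by the main process.
_orphaned_shm_names = set()

def _cleanup_orphaned_shm():
    for name in _orphaned_shm_names:
        try:
            shm = shared_memory.SharedMemory(name=name)  # attach to existing block
            shm.close()
            shm.unlink()  # release the OS-level segment
        except FileNotFoundError:
            pass  # already cleaned up (e.g. by the resource tracker)

atexit.register(_cleanup_orphaned_shm)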

It would be nice to fix this, as unclosed shared memory opens the door to memory leaks. Things will probably be cleaned up by the resource tracker, but I think this is not guaranteed on all platforms.
