ARE issue: init_fs_encoding: failed to get the Python codec #40
Comments
Known issue I'm afraid. The root cause is this:
There is something broken about how the Linux kernel handles the squashfs loopback device. As far as I can tell, it's some kind of unhandled cache miss. I've been trying on and off to figure out a workaround for it, but nothing has helped yet. The shared loop device setting in Singularity reduced the occurrence of this error, but it still shows up from time to time, as you've seen. I used to have a semi-consistent reproducer, but when NCI changed the shared loopback setting, the OSError rate dropped to about 1/1000.
Thanks. I only use the environment via PBS and have never encountered that problem. @dougiesquire and @headmetal any comments?
Over the last month or so, I'd say I'm getting this error (or variants of this error that contain the Admittedly I haven't been running these sessions that much over the last week or so, so haven't had any recently - but will track it over the next few weeks and report back.
Andy mentioned the issue too.
Interesting. That's way more prevalent than in the hh5 squashfs envs. Can I join xp65 and do some tests with your environment?
Please 🙏 😁
I've had this happen multiple times today so it should be fairly easy to reproduce. Even if it doesn't occur on start-up, the error will sometimes occur partway through a JupyterLab session when a cell is run.
@dougiesquire I've noticed the prevalence of this issue depends on the workflow. I think (but am not certain) that the more modules get imported, especially in parallel, e.g. during dask cluster startup, the more likely it is to show up. Would I be able to have a copy of whatever you were running yesterday when this kept happening? It doesn't need to be a minimal reproducer; in fact, the more complicated the better. Can you also let me know which version of
Sure - I was getting it fairly often when running this notebook: https://github.com/ACCESS-NRI/NRI-Workshop2023-MED/blob/main/Intake_tutorial_p1.ipynb I was using
OK, I've made some progress on this. I managed to set up a stress test that would fail with this error about 80% of the time.
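For anyone wanting to try something similar, here is a rough sketch of what a parallel-import stress test could look like. It is not the stress test referred to above, whose details are not given in this thread. It only follows the hypothesis that many interpreters starting up and importing modules at the same time make the error more likely. The module list, the failure signatures, and the function names `start_interpreter` and `stress` are all assumptions made for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module list (assumption): stdlib modules keep the sketch
# self-contained; a real test would use the heavy packages the workflow imports.
MODULES = ["json", "decimal", "email", "xml.etree.ElementTree"]

# Substrings treated as the failure signature (assumption; adjust to the real log).
SIGNATURES = ("init_fs_encoding", "OSError")


def start_interpreter(_):
    """Start a fresh interpreter that imports MODULES; return (returncode, stderr)."""
    code = "import " + ", ".join(MODULES)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def stress(n_procs=16, n_workers=8):
    """Run n_procs interpreter start-ups, n_workers at a time, and count failures."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(start_interpreter, range(n_procs)))
    # A start-up counts as a failure only if it exited non-zero AND its
    # stderr contains one of the assumed signatures.
    failures = [
        err for rc, err in results
        if rc != 0 and any(sig in err for sig in SIGNATURES)
    ]
    return len(failures), len(results)


if __name__ == "__main__":
    failed, total = stress()
    print(f"{failed}/{total} interpreter start-ups hit the error")
```

Run from inside the affected environment, raising `n_procs` and `n_workers` should increase the pressure on the loopback device if the parallel-import hypothesis holds. On a healthy filesystem the failure count would be expected to stay at zero.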
Thanks Dale - 20% sounds a lot better than 80% already |
Hi @dsroberts
We have been encountering that issue in the last few weeks:
Here is the full log output:
Looks like a problem with the file system. Any idea?
Thanks!