fatal error: all goroutines are asleep - deadlock! #11
It's possible this was caused by a disk space issue. Better error handling and logging would be preferable.
We ran into this, but our disk is far from full:
The rest of our caches work fine, but this one repository keeps on crashing when restoring the cache.
Deleted the tar file from the host machine and restarted the build twice, and the cache restore is working again. Perhaps the tar file had been corrupted somehow.
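For anyone else hitting this, a quick way to check whether a cached tarball on the host is still readable is to list it without extracting; the cache path below is hypothetical, adjust it to wherever the plugin's volume lives:

```sh
# List the archive without extracting it; tar exits non-zero on a corrupted file.
CACHE_TAR=/var/lib/cache/myrepo/cache.tar   # hypothetical path
if tar -tf "$CACHE_TAR" > /dev/null 2>&1; then
  echo "archive looks readable"
else
  echo "archive appears unreadable; removing it so the next build rebuilds the cache"
  rm -f "$CACHE_TAR"
fi
```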
We've encountered this bug two more times in the last few weeks. It stalls development, which is a tad annoying. The fact that it only happens sometimes seems to point at a race condition somewhere. I can't upload the tar files, though, as they contain private code. One possible explanation is that we're using this plugin on a VM, and we run up to two CI jobs at a time. So perhaps two jobs could end up writing the same tar file in the host filesystem at the same time, potentially producing a bad tar file. Still, that shouldn't explain a deadlock. And the OP is running only one concurrent job at a time.
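If concurrent writers ever were the culprit, one common mitigation (a sketch only, not something this plugin is known to do, and the paths are made up) is to build the archive in a temporary file and rename it into place, since a rename within the same filesystem is atomic and a reader then never sees a half-written tarball:

```sh
# Write the new archive next to the final location, then atomically rename it.
# A restore then sees either the old complete tarball or the new one, never a
# partially written file. This wouldn't explain the deadlock, only torn writes.
CACHE_TAR=/var/lib/cache/myrepo/cache.tar      # hypothetical path
TMP_TAR="${CACHE_TAR}.tmp.$$"

tar -cf "$TMP_TAR" -C /drone/src node_modules  # hypothetical cached directory
mv -f "$TMP_TAR" "$CACHE_TAR"
```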
But even then it shouldn't result in a deadlock. Are you sure it's not a space issue?
As said before, not even close:
For completeness, here's the panic we got yesterday:
In case this is useful to others, we ended up doing caching ourselves completely from scratch. It took just a tiny Docker image and a few lines of shell:
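(The snippet itself isn't reproduced here; the following is only a rough sketch of what such a hand-rolled cache can look like. The image, cached directories, and cache key are illustrative, not our actual setup.)

```sh
#!/bin/sh
# Rough sketch of a hand-rolled cache step: "restore" runs before the build,
# "save" after it. Meant to run in a tiny image that ships tar and gzip
# (e.g. alpine), with a host directory mounted at /cache.
set -eu

CACHE_KEY="$(cksum go.sum | cut -d' ' -f1)"            # illustrative key
CACHE_FILE="/cache/${DRONE_REPO_NAME:-repo}-${CACHE_KEY}.tar.gz"

restore() {
  if [ -f "$CACHE_FILE" ]; then
    tar -xzf "$CACHE_FILE" -C /
  fi
}

save() {
  # Compress into a temp file first, then rename, to avoid leaving a
  # half-written archive behind if the job is cancelled mid-save.
  tar -czf "${CACHE_FILE}.tmp" /root/.cache /go/pkg 2>/dev/null || true
  mv -f "${CACHE_FILE}.tmp" "$CACHE_FILE"
}

"$@"   # invoked as `cache.sh restore` or `cache.sh save`
```

The restore/save split maps onto two pipeline steps, one before and one after the build.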
This has several advantages over this plugin, or really any other "volume cache" plugin I could find:
We've also seen a moderate speed-up when moving from drone-volume-cache to this method, presumably due to the compression. For example, restores went down from ~25s to ~15s on average, while the decompressed size stayed the same.
DRONE_MAX_PROCS is set to 1 on each of the servers, so this should not be a concurrency issue caused by multiple jobs running on the same host.
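For reference, that limit is just an environment variable on the agent; assuming the agent runs as a container (Drone 0.8-style, other flags omitted), the relevant bit looks something like this:

```sh
# Agent limited to one concurrent build per host via DRONE_MAX_PROCS=1.
docker run -d \
  -e DRONE_SERVER=... \
  -e DRONE_SECRET=... \
  -e DRONE_MAX_PROCS=1 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  drone/agent
```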