Added error handling for objstore not having enough memory to allocate #275
Nice commit! But this fails the build on Windows. Can you do something like |
By the way, there's a better way to check whether you're out of memory: try the allocation, then catch the error so you can handle it. I don't think you need to check how much free memory the system has at all. Besides the minor race condition, the number can be misleading, since some memory may be used by the system in non-critical ways and can still be freed.
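A minimal sketch of the allocate-then-handle pattern described above, written in Python for brevity. It assumes the allocator reports failure as an error the caller can catch; `try_allocate` is a hypothetical name used only for illustration and is not part of this project.

```python
def try_allocate(nbytes):
    """Return a zeroed buffer of nbytes, or None if it cannot be allocated."""
    try:
        return bytearray(nbytes)
    except (MemoryError, OverflowError):
        # MemoryError: the allocator refused the request.
        # OverflowError: the size does not fit in a native index.
        return None


buf = try_allocate(1024)
if buf is None:
    print("allocation failed; handle it here")
else:
    print("allocated", len(buf), "bytes")
```

The caller handles the failure at the point where it occurs, so no separate free-memory query is needed.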
I would do that, except that this allocation does not throw an exception; it fails with a bus error. So we have to check the available memory ourselves. This fix is just a stopgap until the new object store is implemented.
That's strange, sorry I didn't realize that. In that case then just make it platform-dependent so it doesn't fail on other platforms. Note that even |
@@ -14,9 +16,11 @@ namespace {
#endif

void objstore_memcheck(int64_t size) {
#if defined(__unix__) || defined(__linux__)
  struct statvfs buffer;
  statvfs("/dev/shm/", &buffer);
  RAY_CHECK_LE(size + 100, buffer.f_bsize * buffer.f_bfree, "Not enough available memory for allocation by objstore");
I'm not sure if these are necessary for /dev/shm/, but if it behaves like typical file systems, then:
- If f_bavail also works, use that instead of f_bfree.
- If applicable, check if f_favail >= 1 as well. (Note: I only took a cursory glance at this and it may not actually be a valid check; I'm not sure.)
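The two suggestions above can be sketched with Python's `os.statvfs`, which exposes the same fields as the C call. This is an illustration under stated assumptions, not the project's code: `free_bytes` and `has_room` are hypothetical names, the 100-byte slack is copied from the diff, and treating `f_files == 0` as "inode counts not reported" is an assumption. Note that POSIX defines the block counts in units of `f_frsize`, so the sketch multiplies by that field rather than `f_bsize`.

```python
import os


def free_bytes(path):
    """Bytes available to an unprivileged process on the file system at path."""
    st = os.statvfs(path)
    # f_bavail excludes blocks reserved for privileged users (f_bfree does not).
    return st.f_bavail * st.f_frsize


def has_room(path, size, slack=100):
    """True if size + slack bytes fit and at least one inode is available."""
    st = os.statvfs(path)
    enough_bytes = size + slack <= st.f_bavail * st.f_frsize
    # Assumption: a file system that reports zero total inodes does not
    # track them, so the inode check is skipped in that case.
    inodes_ok = st.f_files == 0 or st.f_favail >= 1
    return enough_bytes and inodes_ok


print(free_bytes("."))
print(has_room(".", 2**62))
```

On Linux the path would be `/dev/shm`. The result is only a snapshot, so the race condition mentioned earlier in the thread still applies.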
I just tried the following in

import numpy as np
x = np.zeros(100000)
l = []
for i in range(1000):
    l.append(ray.put(x))

Eventually, I got an error saying
as expected. However, if I then start up

import numpy as np
x = np.zeros(100000)
ray.put(x)

I immediately get the error
Do I need to clear out some files somewhere? Also, this may be unrelated, but there were several instances where I got this error as soon as I tried anything
I wasn't able to trigger the error on Mac OS X. I ran

import numpy as np
x = np.zeros(10000000)
l = []
for i in range(1000):
    print(i)
    l.append(ray.put(x))

and that completed successfully.

>>> import sys
>>> sys.getsizeof(x)
80000096

Each array is about 80 MB, so the 1000 calls to ray.put should be storing about 80 GB in the object store.