Bug: Runaway memory usage when run on large directories #65

Open
TheLostLambda opened this issue Jul 9, 2020 · 9 comments
Labels
bug Something isn't working discussion

Comments

@TheLostLambda

Hey @imsnif ! Another awesome application! Hopefully I can help out with some of those easier issues you've added to the list sometime soon!

I've run into a bit of an issue running diskonaut on my server, though: after around an hour of indexing files, diskonaut was using 22GB of RAM! This pushed a bunch of system processes to swap and brought down a couple of services I had running.

I'll have to take a peek at the code, but I'm assuming this comes from tracking the name and information of every file encountered during a scan. If that's the case, then perhaps diskonaut could intelligently decide when RAM usage is getting out of control and collapse sub-trees into a single total size. That would mean that, if you wanted to dive into that sub-tree, you might need to re-index it, but it's just an idea. Having the current directory fully indexed would be good for keeping zooming in and out quick, but I think it's acceptable to re-index sub-directories when you enter them.
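To sketch what I mean (diskonaut is Rust; this is just an illustrative, language-agnostic toy, not the actual code): collapsing a sub-tree would mean replacing its children with their aggregate size, freeing the per-file names and sizes.

```python
# Hypothetical sketch of "collapsing" a scanned sub-tree under memory
# pressure. Node/collapse are illustrative names, not diskonaut's API.

class Node:
    def __init__(self, name, size=0, children=None):
        self.name = name
        self.size = size               # own size for files, 0 for dirs
        self.children = children or []
        self.collapsed = False         # True once detail was discarded

    def total_size(self):
        # Recursive sum of this node and everything below it.
        return self.size + sum(c.total_size() for c in self.children)

def collapse(node):
    """Replace a sub-tree's children with their aggregate size.

    After collapsing, zooming into this directory would require a
    re-scan, but the per-file entries no longer occupy RAM.
    """
    node.size = node.total_size()
    node.children = []                 # drop references so memory can be freed
    node.collapsed = True
    return node.size
```

A collapsed node still renders correctly in the treemap (its total size is preserved); only the ability to zoom into it without re-scanning is lost.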

As I understand, some other disk-usage applications (like baobab from GNOME) don't even bother tracking individual files. I think it's good that diskonaut does allow you to view those (I'm absolutely in love with the zoom feature!!), but maybe it shouldn't store information for every file all of the time.

Thanks again for the awesome app!

@imsnif (Owner) commented Jul 9, 2020

Holy cow, that's indeed not good!
I was afraid of this, but quite honestly didn't think to test it on such large volumes. Thanks for this input.

I very much agree with your diagnosis of the problem; the issue is surely that we keep everything we scanned in memory. I like your proposed direction. How about keeping a certain "radius" of the tree in memory: a depth level most users won't need to go past (e.g. 5, though maybe that's too much or too little; we'd have to play around with it to find out) and triggering a rescan of that branch if the user goes into it (or into one folder before it, ideally)?
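A rough sketch of that depth-limited approach (again in illustrative Python, not diskonaut's Rust; `scan` and its parameters are hypothetical): recurse the whole tree to get correct totals, but only *record* entries above the depth cutoff, so deep branches keep just their aggregate size.

```python
import os

def scan(path, max_depth=5, depth=0):
    """Scan a directory tree, keeping per-entry detail only above max_depth.

    Below the cutoff, only the aggregate size of each branch survives,
    so memory stays bounded; a branch would be re-scanned lazily if the
    user zooms into it.
    """
    entries = {}
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_symlink():
                continue  # avoid cycles and double-counting
            if entry.is_file():
                size = entry.stat().st_size
                total += size
                if depth < max_depth:
                    entries[entry.name] = size
            elif entry.is_dir():
                sub_entries, sub_total = scan(entry.path, max_depth, depth + 1)
                total += sub_total
                if depth < max_depth:
                    # Below the cutoff, sub_entries comes back empty and
                    # only the branch's total size is retained.
                    entries[entry.name] = (sub_entries, sub_total)
    return entries, total
```

The memory ceiling then depends on the fan-out within the radius rather than on the total file count of the volume.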

I'm very happy to hear you like diskonaut. Your contributions are always very welcome. Will be looking forward to them (for this or other issues). Give me a shout for absolutely anything you need.

imsnif added the discussion and bug labels on Jul 9, 2020
@avioli commented Sep 14, 2020

Just a thought: would an sqlite in-memory DB help offload memory? I have scanned a huge tree of files/dirs and the resulting DB file is ~200 MB. You can store path/parent/size in three columns, add indexes, and then compute directory sizes based on the parent column. Or even use inodes. Sorry if that's exactly what your app already does.
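For illustration, here is roughly what that schema could look like, sketched with Python's stdlib `sqlite3` (diskonaut itself is Rust; the paths, sizes, and `dir_size` helper below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # or a file path for on-disk storage

# Three columns as suggested: path, parent, size; index on parent
# so per-directory queries stay fast.
conn.execute("""
    CREATE TABLE entries (
        path   TEXT PRIMARY KEY,
        parent TEXT,
        size   INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_parent ON entries(parent)")

rows = [
    ("/srv/a.log", "/srv", 100),
    ("/srv/b.log", "/srv", 250),
    ("/srv/sub/c.bin", "/srv/sub", 50),
]
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)

def dir_size(conn, prefix):
    """Directory size computed on demand rather than stored per-directory."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM entries WHERE path LIKE ? || '/%'",
        (prefix,),
    ).fetchone()
    return total
```

The trade-off is that every file path is still stored somewhere; with `:memory:` it stays in RAM, while a file-backed DB lets SQLite page it to disk.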

@imsnif (Owner) commented Sep 14, 2020

Hmm, interesting idea. I wonder how sqlite would deal with the String issue. :)
To be honest, though, I feel this can be solved without introducing another dependency. Thanks for the suggestion.

@pm100 (Contributor) commented Sep 14, 2020

Or use an on-disk sqlite DB. I don't see how the in-memory one helps; it simply swaps one form of memory storage for another. On-disk sqlite gives you very fast access with seamless spill onto disk.

@avioli commented Sep 15, 2020

@pm100 an on-disk dependency means the app will have to deal with DB versions, migrations, and so on. My suggestion was to help by using a battle-tested piece of software that has a very efficient data structure for storage and querying (as long as it is configured and used correctly).

@imsnif what String issue? The multibyte characters (#54)?

@pm100 (Contributor) commented Sep 15, 2020

I'll make an on-disk sqlite version for kicks; I know sqlite very well.

@avioli commented Sep 15, 2020

@pm100 Feel free to fork this repo and do whatever you feel is best.

@imsnif (Owner) commented Sep 15, 2020

> @imsnif what String issue? The multibyte characters (#54)?

Ah, no, sorry. I meant what the OP mentioned, which is likely where this issue comes from: we assume the problem is that the string names of all scanned files are kept in memory, and for extremely large volumes (which I believe the OP was dealing with) this can take up quite a lot of memory.
We may be able to solve this with sqlite, but sqlite would need to solve the same problem too (same memory and same strings, after all). While I don't know how or whether sqlite handles an issue like this, I assume it involves some sort of caching on disk. I think it would serve us better to implement something similar ourselves (as I suggested in my comment above), since that would let us specialize and even improve the app: e.g. keep an on-disk cache per subfolder that can also be reused between runs and busted when the last scan of that particular subfolder is older than X (or upon user request; we could show "time since last scan" for each subfolder in the UI, as an example).

That said, I make a lot of assumptions here. I'm of course willing to take a look at a version with sqlite, but I must honestly say that I'm not convinced about its benefits in this case.

@DeaSTL commented Dec 30, 2023

I just had this happen too.
