Bug: Runaway memory usage when run on large directories #65

Open
TheLostLambda opened this issue Jul 9, 2020 · 9 comments
Labels
bug Something isn't working discussion

Comments

@TheLostLambda

Hey @imsnif ! Another awesome application! Hopefully I can help out with some of those easier issues you've added to the list sometime soon!

I've run into a bit of an issue running diskonaut on my server, though: after around an hour of indexing files, diskonaut was using 22GB of RAM! This pushed a bunch of system processes to swap and brought down a couple of services I had running.

I'll have to take a peek at the code, but I'm assuming this comes from tracking the name and information of every file encountered during a scan. If that's the case, then perhaps diskonaut could intelligently decide when RAM usage is getting out of control and collapse sub-trees into a single total size. That would mean that, if you wanted to dive into that sub-tree, you might need to re-index it, but it's just an idea. Having the current directory fully indexed would be good for keeping zooming in and out quick, but I think it's acceptable to re-index sub-directories when you enter them.
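To sketch what I mean (diskonaut is Rust; this is just an illustrative, language-agnostic toy, not the actual code): collapsing a sub-tree would mean replacing its children with their aggregate size, freeing the per-file names and sizes.

```python
# Hypothetical sketch of "collapsing" a scanned sub-tree under memory
# pressure. Node/collapse are illustrative names, not diskonaut's API.

class Node:
    def __init__(self, name, size=0, children=None):
        self.name = name
        self.size = size               # own size for files, 0 for dirs
        self.children = children or []
        self.collapsed = False         # True once detail was discarded

    def total_size(self):
        # Recursive sum of this node and everything below it.
        return self.size + sum(c.total_size() for c in self.children)

def collapse(node):
    """Replace a sub-tree's children with their aggregate size.

    After collapsing, zooming into this directory would require a
    re-scan, but the per-file entries no longer occupy RAM.
    """
    node.size = node.total_size()
    node.children = []                 # drop references so memory can be freed
    node.collapsed = True
    return node.size
```

A collapsed node still renders correctly in the treemap (its total size is preserved); only the ability to zoom into it without re-scanning is lost.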

As I understand, some other disk-usage applications (like baobab from GNOME) don't even bother tracking individual files. I think it's good that diskonaut does allow you to view those (I'm absolutely in love with the zoom feature!!), but maybe it shouldn't store information for every file all of the time.

Thanks again for the awesome app!

@imsnif (Owner) commented Jul 9, 2020

Holy cow, that's indeed not good!
I was afraid of this, but quite honestly didn't think to test it on such large volumes. Thanks for this input.

I very much agree with your diagnosis of the problem; the issue is surely that we keep everything we scanned in memory. I like your proposed direction. How about keeping a certain "radius" of the tree in memory: a depth level most users won't need to go past (e.g. 5, though maybe that's too much or too little; we'd have to play around with it to find out) and triggering a rescan of that branch if the user goes into it (or into one folder before it, ideally)?
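A rough sketch of that depth-limited approach (again in illustrative Python, not diskonaut's Rust; `scan` and its parameters are hypothetical): recurse the whole tree to get correct totals, but only *record* entries above the depth cutoff, so deep branches keep just their aggregate size.

```python
import os

def scan(path, max_depth=5, depth=0):
    """Scan a directory tree, keeping per-entry detail only above max_depth.

    Below the cutoff, only the aggregate size of each branch survives,
    so memory stays bounded; a branch would be re-scanned lazily if the
    user zooms into it.
    """
    entries = {}
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_symlink():
                continue  # avoid cycles and double-counting
            if entry.is_file():
                size = entry.stat().st_size
                total += size
                if depth < max_depth:
                    entries[entry.name] = size
            elif entry.is_dir():
                sub_entries, sub_total = scan(entry.path, max_depth, depth + 1)
                total += sub_total
                if depth < max_depth:
                    # Below the cutoff, sub_entries comes back empty and
                    # only the branch's total size is retained.
                    entries[entry.name] = (sub_entries, sub_total)
    return entries, total
```

The memory ceiling then depends on the fan-out within the radius rather than on the total file count of the volume.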

I'm very happy to hear you like diskonaut. Your contributions are always very welcome. Will be looking forward to them (for this or other issues). Give me a shout for absolutely anything you need.

imsnif added the discussion and bug labels on Jul 9, 2020
@avioli commented Sep 14, 2020

Just a thought: would an sqlite in-memory DB help offload memory? I have scanned a huge tree of files/dirs and the resulting DB file is ~200 MB. You can store path/parent/size in three columns, add indexes, and then compute directory sizes based on the parent column. Or even use inodes. Sorry if that's exactly what your app already does.
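For illustration, here is roughly what that schema could look like, sketched with Python's stdlib `sqlite3` (diskonaut itself is Rust; the paths, sizes, and `dir_size` helper below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # or a file path for on-disk storage

# Three columns as suggested: path, parent, size; index on parent
# so per-directory queries stay fast.
conn.execute("""
    CREATE TABLE entries (
        path   TEXT PRIMARY KEY,
        parent TEXT,
        size   INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_parent ON entries(parent)")

rows = [
    ("/srv/a.log", "/srv", 100),
    ("/srv/b.log", "/srv", 250),
    ("/srv/sub/c.bin", "/srv/sub", 50),
]
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)

def dir_size(conn, prefix):
    """Directory size computed on demand rather than stored per-directory."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM entries WHERE path LIKE ? || '/%'",
        (prefix,),
    ).fetchone()
    return total
```

The trade-off is that every file path is still stored somewhere; with `:memory:` it stays in RAM, while a file-backed DB lets SQLite page it to disk.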

@imsnif (Owner) commented Sep 14, 2020

Hmm, interesting idea. I wonder how sqlite would deal with the String issue. :)
To be honest, though, I feel this can be solved without introducing another dependency. Thanks for the suggestion.

@pm100 (Contributor) commented Sep 14, 2020

Or use an on-disk sqlite DB. I don't see how the in-memory one helps; it simply swaps one form of memory storage for another. On-disk sqlite gives you very fast access with seamless spill onto disk.

@avioli commented Sep 15, 2020

@pm100 an on-disk dependency means the app will have to deal with DB versions, migrations, and so on. My suggestion was to help by using a battle-tested piece of software that has a very efficient data structure for storage and querying (as long as it is configured and used correctly).

@imsnif what String issue? The multibyte characters (#54)?

@pm100 (Contributor) commented Sep 15, 2020

I'll make an on-disk sqlite version for kicks; I know sqlite very well.

@avioli commented Sep 15, 2020

@pm100 Feel free to fork this repo and do whatever you feel is best.

@imsnif (Owner) commented Sep 15, 2020

> @imsnif what String issue? The multibyte characters (#54)?

Ah, no, sorry. I meant what the OP mentioned, which is likely where this issue comes from: we assume the problem is that the string names of all scanned files are kept in memory, and for extremely large volumes (which I believe the OP was dealing with) this can take up quite a lot of memory.
We may be able to solve this with sqlite, but sqlite would need to solve the same problem too (same memory and same strings, after all). While I don't know how or whether sqlite handles an issue like this, I assume it involves some sort of caching on disk. I think it would serve us better to implement something similar ourselves (as I suggested in my comment above), since that would let us specialize and even improve the app: e.g. keep an on-disk cache per subfolder that can also be reused between runs and busted when the last scan of that particular subfolder is older than X (or upon user request; we could show "time since last scan" for each subfolder in the UI, as an example).

That said, I make a lot of assumptions here. I'm of course willing to take a look at a version with sqlite, but I must honestly say that I'm not convinced about its benefits in this case.

@DeaSTL commented Dec 30, 2023

I just had this happen too.
