
Resume import functionality #292

Open
hsuominen opened this issue Jan 13, 2019 · 5 comments

Comments

@hsuominen

I'm running elodie to back up on the order of 150,000+ images scattered across a fairly messy file/folder structure. Elodie has crashed a few times, most recently after two days with about 50,000 files processed.

It seems that restarting the import requires rereading each file to check whether it has already been imported, which by my estimate would take at least 13 hours (about 1 s per file, presumably to calculate the hash). Is there any built-in way to speed this up?

It would seem fairly straightforward to add functionality to pick up where the import left off - I may address this in a PR if I get around to it.
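To illustrate the kind of speedup being asked for, here is a minimal sketch (not elodie's actual code; the cache file name and layout are hypothetical) of caching file checksums keyed by size and mtime, so that a restarted import can skip rehashing files that have not changed since the previous run:

```python
import hashlib
import json
import os

CACHE_PATH = "hash-cache.json"  # hypothetical cache location

def load_cache(path=CACHE_PATH):
    """Load the checksum cache from disk, or start empty."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def save_cache(cache, path=CACHE_PATH):
    """Persist the checksum cache for the next run."""
    with open(path, "w") as f:
        json.dump(cache, f)

def cached_hash(filepath, cache):
    """Return the SHA-256 of filepath, reusing the cached value
    when size and mtime are unchanged since the last run."""
    st = os.stat(filepath)
    entry = cache.get(filepath)
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime:
        return entry["sha256"]
    h = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[filepath] = {"size": st.st_size, "mtime": st.st_mtime, "sha256": digest}
    return digest
```

With a warm cache, the per-file cost on restart drops from one full read to one `stat` call, which is what makes resuming a 150,000-file import tolerable.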

@jmathai
Owner

jmathai commented Jan 22, 2019

I think supporting this would require a --resumable flag on the import command, and every import run with --resumable would then store a list of the files it has attempted to import.

I believe it requires a new file to store the progress of resumable imports but open to other ideas.
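The progress file could be as simple as an append-only log of attempted paths. A minimal sketch of that idea (class and file names are hypothetical, not elodie's API):

```python
import os

class ImportProgress:
    """Append-only log of files an import run has already attempted,
    so a resumed run can skip them without rehashing anything."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.attempted = set()
        if os.path.exists(log_path):
            with open(log_path) as f:
                self.attempted = {line.rstrip("\n") for line in f}

    def seen(self, filepath):
        """True if a previous (or the current) run already tried this file."""
        return filepath in self.attempted

    def record(self, filepath):
        # Append immediately so a crash loses at most the current file.
        with open(self.log_path, "a") as f:
            f.write(filepath + "\n")
        self.attempted.add(filepath)
```

The import loop would then call `seen()` before touching each file and `record()` after attempting it; on restart, the constructor reloads the log and skipping is a set lookup rather than a hash computation.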

@DZamataev
Contributor

DZamataev commented Feb 1, 2019

Maybe this can be dealt with by changing the workflow and adding a couple of features (see PR #297 and issue #299).
If all your unsorted files are in one folder and the sorted ones in another (not nested inside each other), you can use these features (--move-source) to move source files from the unsorted directory into the sorted directory, and also clear duplicates (--delete-duplicates, maybe?) as you go. So if Elodie crashes (by the way, see PR #298, which fixes one nasty crash) or some other interruption occurs, you basically start over with no overhead from previously read files, because every file gets moved or deleted on import. You see what I mean, @jmathai? That's the approach I'm trying to implement.
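The crash-safety property being described can be sketched as follows. This is an illustration only, not elodie's implementation: the function names are hypothetical, and duplicate detection here is by filename as a stand-in for elodie's checksum comparison.

```python
import os
import shutil

def import_with_move(source_dir, dest_dir):
    """Move each file out of source_dir as it is imported, deleting
    duplicates that already exist in dest_dir. After a crash, rerunning
    only sees the files that were never processed."""
    imported = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            if os.path.exists(dst):
                # Duplicate (filename-based stand-in for a checksum check):
                # delete it from the source, as --delete-duplicates would.
                os.remove(src)
            else:
                # Imported: move it out of the unsorted tree, as
                # --move-source would.
                shutil.move(src, dst)
                imported.append(dst)
    return imported
```

Because every processed file leaves the source tree immediately, a restarted run pays no cost for work already done; the trade-off is that the source tree is consumed, so this only suits a move-style (not copy-style) import.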

@DZamataev
Contributor

I've added PR for delete duplicates functionality. #301

@DZamataev
Contributor

DZamataev commented Feb 4, 2019

@hsuominen if you want to test my combined approach, it's here in my fork: https://github.com/DZamataev/elodie/tree/feature/move-source-and-separate-media-folders
The full command looks like this:

elodie.py import --debug --delete-duplicates --move-source --destination="G:\LIBRARY\sorted" G:\LIBRARY\unsorted

@Jogai

Jogai commented Dec 5, 2019

I'd recommend acting on a sorted list of files. Then on resume the script can check the last file written, find it in the list, and continue from there.
For me it crashed at 90%; now I'm waiting on the second run.
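The suggestion above relies only on the processing order being deterministic. A minimal sketch (function name is hypothetical):

```python
def resume_point(all_files, last_written):
    """Given that files are processed in sorted order, return the files
    still to be processed after last_written. If last_written is None,
    nothing was done yet and the whole sorted list is returned."""
    ordered = sorted(all_files)
    if last_written is None:
        return ordered
    # Everything up to and including the last written file is done.
    idx = ordered.index(last_written)
    return ordered[idx + 1:]
```

This avoids any per-file hashing on resume, but unlike a checksum-based check it trusts that the last written file is known and that the file set did not change between runs.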
