How to handle zstor temporary outage/SIGKILL #21
Comments
Commands issued in the zdb hook are usually blocking (though not all of them). It's better to keep the hook as fast as possible, so adding checks and retries there is really not a good idea :)
A quick and easy solution would be to add a script that runs as a cron job and does the The index files are more tricky though, since they can be mutated and then need to be uploaded again. In that case, we could check that the hash returned by zstor matches the hash of the local copy. Another approach would be storing the set of modified files when the relevant hook runs, then clearing them when a Checking all the files on each run isn't ideal. So we could keep a list of all the file names that were already checked. That adds some complication but is probably an acceptable trade-off.
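A rough sketch of the "keep a list of already checked file names" idea, limited to immutable data files (mutable index files would need the hash comparison mentioned above). The function name `recheck_uploads`, the JSON state file, and the `is_uploaded` and `store` callables are all hypothetical; in practice the callables would wrap whatever zstor invocations are appropriate, which are deliberately not spelled out here.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def recheck_uploads(
    data_files: Iterable[Path],
    state_file: Path,
    is_uploaded: Callable[[Path], bool],  # hypothetical wrapper around a zstor check
    store: Callable[[Path], None],        # hypothetical wrapper around a zstor store
) -> List[Path]:
    """Re-issue store for unconfirmed files; return the files that were re-stored."""
    # File names confirmed on earlier runs are skipped, so each run only
    # checks files that have not yet been seen as uploaded.
    confirmed = set()
    if state_file.exists():
        confirmed = set(json.loads(state_file.read_text()))

    restored = []
    for path in sorted(data_files):
        if path.name in confirmed:
            continue
        if is_uploaded(path):
            confirmed.add(path.name)
        else:
            # Not marked as confirmed yet: a later run checks it again.
            store(path)
            restored.append(path)

    # Write to a temporary file and rename it, so a crash mid-write
    # cannot leave a truncated state file behind.
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(confirmed)))
    tmp.replace(state_file)
    return restored
```

Since a file is only recorded once a check succeeds, a store that fails silently is simply retried on the next cron run.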
I put together a script implementing my idea above. It has the following behaviors:
If zdb calls the hook while zstor is down, nothing ensures that the data file is uploaded later. Also, a recent change in zstor made the store commands non-blocking: zstor now queues these commands internally. If zstor is SIGKILLed, the queued data is never uploaded.
After discussion with @LeeSmet and @maxux, two approaches were suggested. The first is to make the zstor client store the `store` commands in a persistent queue, from which zstor can pick up the commands and execute them. The second is to make zdb use the zstor `check` command to verify that files were uploaded successfully. For example, zdb can keep track of the last uploaded data file and periodically check whether the file after it was uploaded successfully; if it was not, zdb can reissue the `store` command.
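To make the persistent-queue approach more concrete, here is a minimal sketch of an on-disk journal of pending store commands. It is not how zstor implements anything: the class `PersistentStoreQueue`, the JSON-lines journal format, and the `store` callable are assumptions made for illustration, and the sketch assumes a single process, so `enqueue` and `drain` never run concurrently (real code would need locking).

```python
import json
import os
from pathlib import Path
from typing import Callable, List


class PersistentStoreQueue:
    """Append-only journal of pending store commands (hypothetical helper)."""

    def __init__(self, journal: Path) -> None:
        self.journal = journal

    def enqueue(self, file_path: str) -> None:
        # fsync before returning so the entry survives a SIGKILL
        # that arrives right after the hook has finished.
        with open(self.journal, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"file": file_path}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())

    def pending(self) -> List[str]:
        if not self.journal.exists():
            return []
        entries = []
        for line in self.journal.read_text(encoding="utf-8").splitlines():
            try:
                entries.append(json.loads(line)["file"])
            except (ValueError, KeyError, TypeError):
                continue  # skip a torn line left by a crash mid-write
        return entries

    def drain(self, store: Callable[[str], None]) -> None:
        """Run store for each pending entry; keep the entries whose store fails."""
        remaining = []
        for file_path in self.pending():
            try:
                store(file_path)
            except Exception:
                remaining.append(file_path)
        # Rewrite the journal atomically with only the failed entries.
        tmp = self.journal.with_suffix(".tmp")
        with open(tmp, "w", encoding="utf-8") as fh:
            for file_path in remaining:
                fh.write(json.dumps({"file": file_path}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.journal)
```

The journal is only rewritten after a drain pass completes, so a kill in the middle of a pass causes some commands to run again on restart. That gives at-least-once behavior and assumes that storing the same file twice is harmless.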