Add checkpoint on shutdown and --checkpoint CLI option #626
This PR adds two features that make restarting Heritrix more convenient.
First, we add a `checkpointOnShutdown` option to CheckpointService. It installs a JVM shutdown hook that creates a checkpoint before the process exits. Using the existing `checkpointIntervalMinutes` option in conjunction with it may be a good idea: if the shutdown checkpoint was not written because the process was killed or crashed, or the server lost power, there is still an older periodic checkpoint to recover from.

Second, we add a `--checkpoint` command-line option, which gives the `--run-job` option the ability to restart from a named checkpoint or the 'latest' checkpoint.

When both are used together, you can launch a job directly from the command line with:

If you then press Ctrl+C, Heritrix will create a checkpoint and exit. If you run the same command again, the crawl will continue from where it left off. :-)
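
For readers unfamiliar with the mechanism behind `checkpointOnShutdown`, here is a minimal sketch of a JVM shutdown hook that runs a checkpoint action before exit. It is not the code in this PR: the `ShutdownCheckpointSketch` class, the `Checkpointer` interface and the `buildHook` helper are hypothetical names used only for illustration. The only real API involved is `Runtime.getRuntime().addShutdownHook(Thread)`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownCheckpointSketch {

    /** Hypothetical stand-in for whatever actually writes the checkpoint. */
    interface Checkpointer {
        void checkpoint() throws Exception;
    }

    /** Builds the hook thread without registering it, so it can be exercised directly. */
    static Thread buildHook(Checkpointer checkpointer) {
        return new Thread(() -> {
            try {
                checkpointer.checkpoint();
            } catch (Exception e) {
                // The JVM is already exiting; report the failure and let shutdown continue.
                System.err.println("Shutdown checkpoint failed: " + e);
            }
        }, "checkpoint-on-shutdown");
    }

    public static void main(String[] args) {
        AtomicBoolean written = new AtomicBoolean(false);

        // Placeholder action; a real service would persist crawl state here.
        Thread hook = buildHook(() -> {
            written.set(true);
            System.out.println("checkpoint written");
        });

        // Runs on normal exit and on Ctrl+C (SIGINT), but not on SIGKILL,
        // a JVM crash or power loss.
        Runtime.getRuntime().addShutdownHook(hook);

        System.out.println("crawling... the hook runs when the JVM exits");
    }
}
```

Because a shutdown hook cannot run when the process is killed outright, this sketch also illustrates why keeping periodic checkpoints enabled alongside it is a sensible fallback.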