Add checkpoint on shutdown and --checkpoint CLI option #626

Merged
ato merged 3 commits into master from checkpoint-on-shutdown on Nov 28, 2024

ato (Collaborator) commented Nov 24, 2024

This PR adds two features that make restarting Heritrix more convenient.

First, we add a checkpointOnShutdown option to CheckpointService. When enabled, it installs a JVM shutdown hook that creates a checkpoint before the process exits. It is a good idea to also set the existing checkpointIntervalMinutes option: if the shutdown checkpoint is not written because the process was killed, crashed, or the server lost power, there is still an older periodic checkpoint to recover from.
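For readers unfamiliar with JVM shutdown hooks, here is a minimal sketch of the general mechanism. It is not the code from this PR: `ShutdownCheckpointSketch`, `Checkpointer`, `buildHook` and `install` are hypothetical names used only for illustration. The only real API relied on is `Runtime.getRuntime().addShutdownHook(Thread)`.

```java
public class ShutdownCheckpointSketch {

    /** Hypothetical stand-in for whatever actually writes a checkpoint. */
    interface Checkpointer {
        void checkpoint() throws Exception;
    }

    /** Builds (but does not start) the thread that will act as the shutdown hook. */
    static Thread buildHook(Checkpointer checkpointer) {
        return new Thread(() -> {
            try {
                checkpointer.checkpoint();
            } catch (Exception e) {
                // The JVM is already exiting, so just report the failure.
                System.err.println("Shutdown checkpoint failed: " + e);
            }
        }, "checkpoint-on-shutdown");
    }

    /** Registers the hook; the JVM runs it on normal exit or on SIGINT/SIGTERM. */
    static Thread install(Checkpointer checkpointer) {
        Thread hook = buildHook(checkpointer);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks do not run when the process is killed with SIGKILL or the machine loses power, which is why keeping periodic checkpoints as a fallback is still worthwhile.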

Second, we add a --checkpoint command-line option that lets the --run-job option restart from a named checkpoint or the 'latest' checkpoint.
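To illustrate the "named or 'latest'" selection, here is a rough sketch. It is not taken from the PR: `CheckpointSelectorSketch` and `resolve` are hypothetical, and it assumes checkpoint names sort lexicographically in creation order, which may not match how Heritrix actually picks the latest checkpoint.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CheckpointSelectorSketch {

    /**
     * Resolves the value given to a --checkpoint style option.
     * ASSUMPTION: names sort lexicographically in creation order,
     * so the "latest" checkpoint is simply the greatest name.
     */
    static Optional<String> resolve(String requested, List<String> available) {
        if ("latest".equals(requested)) {
            return available.stream().max(Comparator.naturalOrder());
        }
        // Otherwise the name must match an existing checkpoint exactly.
        return available.stream().filter(requested::equals).findFirst();
    }
}
```

An empty result covers both "no checkpoints exist yet" and "no checkpoint with that name", so the caller can decide whether to start a fresh crawl or report an error.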

When both are used together, you can launch a job directly from the command line with:

heritrix -a password -r myjob -c latest

If you then press Ctrl+C, Heritrix will create a checkpoint and exit. Running the same command again continues the crawl from where it left off. :-)

ato added 3 commits November 24, 2024 13:08
This enables a checkpoint to be automatically created during a graceful termination. This makes it easier to stop and restart Heritrix without having to manually checkpoint each running job.
This makes it possible to select a checkpoint from the command-line when using the --run-job option.
@ato ato force-pushed the checkpoint-on-shutdown branch from b4e87f4 to ec689db Compare November 24, 2024 05:36
@ato ato merged commit 4093871 into master Nov 28, 2024
6 checks passed
@ato ato deleted the checkpoint-on-shutdown branch November 28, 2024 07:43