Synchronizing GOKb instances

Moritz Horn edited this page Sep 30, 2019 · 8 revisions

GOKb sync scripts

To synchronize a (new) GOKb instance with an existing database, this repository includes several Groovy sync scripts. They copy the data itself rather than creating a like-for-like duplicate of the GOKb server (with users, curatorial groups, and so on). The order in which the scripts are run matters; we suggest Orgs, Titles, Platforms, Packages.

Setup

The sync scripts are written in Groovy. SDKMAN! is a handy way to manage Groovy and Grails installations:

sudo apt-get install curl zip unzip
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"     # Only needed in the shell where SDKMAN was just installed
sdk install groovy
# If prompted, reply "Y"

It's probably best to clone the source repository to get the scripts:

git clone https://github.com/openlibraryenvironment/gokb.git
cd gokb/scripts

When you run the scripts for the first time, Grape downloads all required dependencies, which may take a moment. After that, you should see status messages as blocks of data are downloaded, followed by a series of 200 OK responses showing that the data is being loaded.
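
The suggested order (Orgs, Titles, Platforms, Packages) can be sketched as a small wrapper. This is a minimal sketch and not part of the repository: it assumes the scripts are started with the groovy launcher from the scripts folder, and the names GROOVY_CMD and sync_all are hypothetical.

```shell
# Assumption: the scripts are started with the `groovy` launcher.
# GROOVY_CMD is a hypothetical override hook, not something the scripts read.
GROOVY_CMD="${GROOVY_CMD:-groovy}"

# Run the four sync scripts in the suggested order and stop at the first failure.
# Any arguments are passed through to each script unchanged.
sync_all() {
  for entity in orgs titles platforms packages; do
    echo "Syncing ${entity}"
    "$GROOVY_CMD" "sync_gokb_${entity}.groovy" "$@" || return 1
  done
}
```

Calling sync_all from the scripts folder would then run all four scripts in sequence. Stopping at the first failure avoids loading packages before the orgs, titles, and platforms they may depend on.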

Configuration

Each of the four main sync scripts (sync_gokb_orgs.groovy, sync_gokb_titles.groovy, sync_gokb_platforms.groovy and sync_gokb_packages.groovy) supports a separate configuration file that overrides the default values (by default, the data is sent to a localhost instance). The config files must be in the same folder as the scripts, and each file name follows the pattern sync-gokb-{'orgs'|'platforms'|'titles'|'packages'}-cfg.json. A full configuration looks like this:

{
  "uploadUser":"targetSystemUser",
  "uploadPass":"targetSystemUserPass",
  "targetBase":"http(s)://target.url/",
  "sourceBase":"http(s)://source.url/"
}

Important: The scripts currently assume that both GOKb applications are hosted at the '/gokb/' webapp endpoint. If this is not the case for either side, the config values sourceContext and targetContext can be used to change this behaviour (for example, the value '' denotes an app running under the main context).
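
As an illustration of the naming pattern and the context override, the following sketch writes one config file per script. It is only an assumption-laden example: all values are placeholders, the host names are made up, the helper name write_sync_cfgs is hypothetical, and sourceContext is set to '' to describe a source instance running under the main context while the target keeps the default.

```shell
# Write a config file for each of the four sync scripts into a directory
# (default: the current one, which should be the scripts folder).
# All values are placeholders and must be replaced before a real run.
write_sync_cfgs() {
  dir="${1:-.}"
  for entity in orgs platforms titles packages; do
    cfg="${dir}/sync-gokb-${entity}-cfg.json"
    cat > "$cfg" <<'EOF'
{
  "uploadUser":"targetSystemUser",
  "uploadPass":"targetSystemUserPass",
  "targetBase":"https://target.example/",
  "sourceBase":"https://source.example/",
  "sourceContext":""
}
EOF
    chmod 600 "$cfg"   # the file holds a password, so keep it private
  done
}
```

Running write_sync_cfgs in the scripts folder would create sync-gokb-orgs-cfg.json, sync-gokb-platforms-cfg.json, sync-gokb-titles-cfg.json and sync-gokb-packages-cfg.json with identical content.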

During or after a run of each script, additional fields may be written to these files:

  • resumptionToken: The resumptionToken of the last OAI API call. Useful for resuming after an interruption; it is empty after a finished run.
  • lastTimestamp: The timestamp of the last item of the last API call.
  • lastRun: The highest timestamp of the last script run. Should be equal to lastTimestamp after a finished run.
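
The fields above can be used to check whether the last run of a script finished. The following is a rough sketch under stated assumptions: it assumes the fields are stored as plain JSON string values, and the helper names cfg_field and run_finished are hypothetical.

```shell
# Print the string value of one field from a flat JSON config file.
# Assumption: the field is stored as "key":"value"; other value types print nothing.
cfg_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1"
}

# Succeed only if the file describes a finished run:
# no resumptionToken left over, and lastRun equal to lastTimestamp.
run_finished() {
  token="$(cfg_field "$1" resumptionToken)"
  last_run="$(cfg_field "$1" lastRun)"
  last_ts="$(cfg_field "$1" lastTimestamp)"
  [ -z "$token" ] && [ -n "$last_run" ] && [ "$last_run" = "$last_ts" ]
}
```

For example, run_finished sync-gokb-titles-cfg.json || echo "titles sync was interrupted" would flag a config file that still carries a resumptionToken.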

Update mode

Each script may be run with the parameter --update to request only data that has changed since the last successful run (config value lastRun). This is useful for continuous synchronization with another instance, as it avoids unnecessary API calls.
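
For continuous synchronization, a wrapper could choose between a full run and an update run per script. This is a minimal sketch, assuming the scripts are started with the groovy launcher from the scripts folder and that the presence of lastRun in the config file is a usable sign of an earlier run; GROOVY_CMD and sync_entity are hypothetical names.

```shell
# Assumption: the scripts are started with the `groovy` launcher.
# GROOVY_CMD is a hypothetical override hook, not something the scripts read.
GROOVY_CMD="${GROOVY_CMD:-groovy}"

# Run one sync script: with --update if its config file already records a
# lastRun value, otherwise as a full run.
sync_entity() {
  entity="$1"
  cfg="sync-gokb-${entity}-cfg.json"
  if [ -f "$cfg" ] && grep -q '"lastRun"' "$cfg"; then
    "$GROOVY_CMD" "sync_gokb_${entity}.groovy" --update
  else
    "$GROOVY_CMD" "sync_gokb_${entity}.groovy"
  fi
}
```

Calling sync_entity orgs, sync_entity titles, sync_entity platforms and sync_entity packages in that order, for instance from a scheduled job, would keep a target instance up to date after the initial full sync.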