Snapshot Tool

Process metadata snapshots from DataCite and Crossref.

The Crossref Public Data File is available here: https://www.crossref.org/learning/public-data-file/

The DataCite Public Data File is available here: https://support.datacite.org/docs/datacite-public-data-file

The tool can:

count the number of metadata records and generate other stats
output a list of DOIs
combine multiple snapshot files into a single file

Installation

Run cargo install pardalotus_snapshot_tool.

To build from source instead, run cargo build --release. The binary is written to the ./target/release directory.

Input

Supply the path to a directory or file with --input. This should contain all the snapshot files you're interested in, including Crossref and/or DataCite files. A directory is scanned recursively, and files with unrecognised extensions are skipped.

The tool can accept files with extensions:

*.json.gz (Crossref)
*.tgz (DataCite)
*.jsonl.gz (output from this tool)
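
The extension rules above can be sketched in a few lines of Python. This only illustrates the matching logic described here and is not the tool's actual implementation. The function name classify_snapshot_files and the labels are hypothetical.

```python
from pathlib import Path

# Suffix -> label. The labels are illustrative, not names used by the tool.
RECOGNISED = {
    ".jsonl.gz": "tool output",
    ".json.gz": "Crossref",
    ".tgz": "DataCite",
}


def classify_snapshot_files(root):
    """Return (path, label) pairs for recognised files at or under `root`."""
    root = Path(root)
    # Accept either a single file or a directory that is walked recursively.
    candidates = [root] if root.is_file() else sorted(root.rglob("*"))
    found = []
    for path in candidates:
        if not path.is_file():
            continue
        for suffix, label in RECOGNISED.items():
            if path.name.endswith(suffix):
                found.append((path, label))
                break
        # Files matching no suffix have an unrecognised extension and are skipped.
    return found
```

Calling classify_snapshot_files("/path/to/snapshots") would return one pair per recognised file, which is a quick way to preview what a given directory contains.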

Output

This tool can combine many files into one. Supply --out <filename> to write all the data found under the --input path to a single file.
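
Once a combined file exists, it can be read back with standard tooling. The sketch below assumes, based only on the .jsonl.gz extension, that the output is gzip-compressed JSON Lines with one record per line. The record layout is not assumed, and the helper names iter_records and count_records are hypothetical.

```python
import gzip
import json


def iter_records(path):
    """Yield one parsed JSON value per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(path):
    """Count records by streaming the file, so it never has to fit in memory."""
    return sum(1 for _ in iter_records(path))
```

For example, count_records("combined.jsonl.gz") (a hypothetical file name) streams the file line by line, which matters for snapshots that are too large to load at once.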

Functionality

Show help

pardalotus_snapshot_tool --help

Verbose

Add --verbose to any command for information on what's going on internally. Useful when reading large mysterious files.

List files

pardalotus_snapshot_tool --input /path/to/snapshots --list-input-files

This is useful for checking which snapshot files will be included.

Count Records

pardalotus_snapshot_tool --input /path/to/snapshots --stats

Counts how many metadata records are present across the snapshots and reports other stats.
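
To sanity-check a count for a single Crossref file by hand, something like the following may help. It is a rough sketch that assumes each Crossref *.json.gz file holds one JSON object with an "items" array of records; verify that against the snapshot you have. It does not reproduce how the tool computes its stats, and count_crossref_items is a hypothetical name.

```python
import gzip
import json


def count_crossref_items(path):
    """Count records in one Crossref snapshot file.

    Assumes the file is a gzipped JSON object of the form {"items": [...]}.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    # A missing "items" key is treated as an empty file rather than an error.
    return len(data.get("items", []))
```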
