The University of Chicago Library assembles directories of image files, metadata, OCR data, and more for digital project websites like The Goodspeed Manuscript Collection, The Speculum Romanae Magnificentiae Digital Collection, and The University of Chicago Photographic Archive.
These scripts validate, process, and manage that data. Where possible they use fast SSH connections for read-only operations, and they use WebDAV for write access, which helps keep the database in sync.
These scripts are tightly coupled to the way that data is stored; you will have to modify them to use them in other contexts.
python3 -m venv digital_collection_validators_env
source digital_collection_validators_env/bin/activate
pip install git+https://github.com/johnjung/digital_collection_validators.git#egg=digital_collection_validators
We currently use public key authentication to provide access to many of our servers. To test that public key authentication works, run the following commands:
eval "$(ssh-agent -s)"
ssh-add
ssh server_name_here
You'll have to use an SSH agent to run any of the commands below.
If you're working on a local server, adjust the path in the configuration file to point to your files. In files.ini, replace the 'local' path with the directory where your archives are stored:
[FILES]
local = C:/Users/ksong814/Desktop/ # --> CHANGE
owncloud = /data/voldemort/digital_collections/data/ldr_oc_admin/files/IIIF_Files/
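As a rough illustration of how a script might read these settings, here is a minimal sketch using Python's standard configparser module. The function name load_file_roots and the placeholder local path are assumptions made for this example; they are not taken from the scripts themselves. Note that configparser keeps trailing "# ..." notes as part of the value unless inline comment prefixes are enabled, which matters for a line like the 'local' entry above.

```python
import configparser

# Sample contents mirroring the files.ini shown above.
# The local path is a placeholder for illustration only.
SAMPLE_INI = """
[FILES]
local = /path/to/your/archives/  # --> CHANGE
owncloud = /data/voldemort/digital_collections/data/ldr_oc_admin/files/IIIF_Files/
"""


def load_file_roots(ini_text):
    """Return the [FILES] section as a plain dict of root paths (hypothetical helper)."""
    # inline_comment_prefixes tells configparser to drop a trailing "# ..." note;
    # with the default settings the note would be kept as part of the value.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(ini_text)
    return dict(parser["FILES"])


roots = load_file_roots(SAMPLE_INI)
print(roots["local"])
print(roots["owncloud"])
```

To read a real file instead of a string, parser.read("files.ini") could be used in place of read_string.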
digcoll is a general utility for working with digital collections data. You can use it to report on files in the system.
How many directories are there for Speculum data?
$ digcoll ls speculum | wc -l
993
How many issues are there for each year of the Chicago Maroon?
$ digcoll ls mvol-0004 | cut -d '-' -f 3 | sort | uniq -c
56 1902
222 1903
112 1904
162 1905
163 1906
164 1907
163 1908
159 1909
162 1910
161 1911
160 1912
158 1913
161 1914
157 1915
160 1916
157 1917
136 1918
128 1919
...
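The same per-year count can be done in Python if the identifier list is already in memory. This is only a sketch: the function name issues_per_year and the sample identifiers are made up for illustration, and it assumes identifiers follow the mvol-0004-YYYY-MMDD pattern seen in the commands in this document.

```python
from collections import Counter


def issues_per_year(identifiers):
    """Count identifiers by their third hyphen-separated field (the year)."""
    # Equivalent to: cut -d '-' -f 3 | sort | uniq -c
    years = (identifier.split("-")[2] for identifier in identifiers)
    return dict(sorted(Counter(years).items()))


# Hypothetical sample; in practice the list would come from `digcoll ls mvol-0004`.
sample = [
    "mvol-0004-1902-0101",
    "mvol-0004-1902-0102",
    "mvol-0004-1903-0105",
]
counts = issues_per_year(sample)
print(counts)
```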
See what problems exist in a shipment of files.
mvol validate mvol-0004-1937
mvol put_dc_xml mvol-0004-1937-0105
Fix file naming errors for every issue in the year 1951.
mvol regularize_mets mvol-0004-1951
Fix filename errors for two different issues.
mvol regularize_pdf mvol-0004-1951-0105 mvol-0004-1951-0111
Check to see if files are 'in sync' between owncloud, the XTF development server, and the XTF production server.
check_sync --owncloud-to-development mvol-0004-1937-0105
Check to see which directories are out of sync between owncloud, development and production.
List all of the owncloud directories under "mvol". Show if they are valid, and if files are present and in sync in dev and production.
python mvol_sync.py --list mvol
List all of the owncloud directories under "mvol-0004". Show if they are valid, and if files are present and in sync in dev and production.
python mvol_sync.py --list mvol-0004
List all of the owncloud directories under "mvol-0004-0030". Show if they are valid, and if files are present and in sync in dev and production.
python mvol_sync.py --list mvol-0004-0030
Create or update a .struct.txt file on owncloud.
put_struct_txt mvol-0004-1937-0105
You may need to modify this program to deal with SSH authentication issues. Paramiko's connect() method can take an optional key_filename parameter to identify an SSH key.
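One way to pass an explicit key to Paramiko is sketched below. The helper names build_connect_kwargs and open_ssh_client are hypothetical and do not come from these scripts; the only Paramiko calls used are SSHClient(), load_system_host_keys(), and connect() with its hostname, username, and key_filename parameters.

```python
import os


def build_connect_kwargs(hostname, username, key_path=None):
    """Assemble keyword arguments for paramiko.SSHClient.connect() (hypothetical helper)."""
    kwargs = {"hostname": hostname, "username": username}
    if key_path:
        # key_filename points Paramiko at a specific private key file.
        kwargs["key_filename"] = os.path.expanduser(key_path)
    return kwargs


def open_ssh_client(hostname, username, key_path=None):
    """Open an SSH connection, optionally with an explicit key file."""
    import paramiko  # imported here so the helper above works without Paramiko installed

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # trust hosts already listed in known_hosts
    client.connect(**build_connect_kwargs(hostname, username, key_path))
    return client
```

When key_path is omitted, connect() falls back to its usual behavior, which includes trying keys held by a running SSH agent.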
Find zero-length .tif files under the EWM directory.
find EWM -name "*.tif" -size 0
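For platforms without find, a similar check can be sketched in Python with pathlib. The function name find_empty_tifs is an assumption for this example, and the temporary directory stands in for a real directory such as EWM.

```python
import tempfile
from pathlib import Path


def find_empty_tifs(root):
    """Return a sorted list of zero-byte .tif files found anywhere under root."""
    return sorted(
        path
        for path in Path(root).rglob("*.tif")
        if path.is_file() and path.stat().st_size == 0
    )


# Demonstration against a throwaway directory rather than real collection data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.tif").write_bytes(b"II*\x00")  # non-empty file
    (Path(tmp) / "empty.tif").write_bytes(b"")        # zero-byte file
    empty_names = [path.name for path in find_empty_tifs(tmp)]

print(empty_names)
```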