Add OAI to CSV toolchain to support migrations from Islandora 7.x to CLAW #463

Open
mjordan opened this issue Apr 12, 2018 · 4 comments

Collaborator

mjordan commented Apr 12, 2018

Islandora/documentation#452 asks whether we can use Drupal 8's migration API to batch ingest content into CLAW. I've got an MIK toolchain that harvests content from 7.x using OAI-PMH and writes out input for a Migrate Plus ingest. I'm still working on it while travelling, but I should have something substantially complete within a couple of days.

@mjordan mjordan self-assigned this Apr 12, 2018
Collaborator Author

mjordan commented Apr 12, 2018

BTW, doing this work is also a good test of MIK's developer documentation. I'll probably be opening a couple of issues as a result.

Collaborator Author

mjordan commented Apr 12, 2018

Related issue: #378.

mjordan added a commit that referenced this issue Apr 12, 2018
Collaborator Author

mjordan commented Apr 12, 2018

Got this to the point where you can harvest a collection via OAI-PMH and end up with a CSV file similar to the one prepared by @seth-shaw-unlv at the CLAW issue linked above. A sample .ini file:

; MIK configuration file for migrating content from an Islandora
; instance to the format required by the Migrate+ module, for ingesting
; into Islandora CLAW.

[SYSTEM]

[CONFIG]
config_id = MIK OAI to CSV toolchain
last_updated_on = "2018-04-16"
last_update_by = "Mark Jordan"

[FETCHER]
class = Oaipmh
oai_endpoint = "http://localhost:8000/oai2"
set_spec = doitest_collection
temp_directory = "/tmp/oai_to_csv_temp"

[METADATA_PARSER]
class = csv\DcToCsv
; The column named in record_key is added to the output CSV and holds each item's unique ID.
record_key = ID
; DC element names are used as CSV column headings.
dc_elements[] = title
dc_elements[] = identifier
dc_elements[] = description
dc_elements[] = format

[FILE_GETTER]
class = OaipmhIslandoraObj
temp_directory = "/tmp/oai_to_csv_temp"
datastream_ids[] = OBJ

[WRITER]
class = OaipmhCsv
output_file = "/tmp/oai_to_csv_output/metadata.csv"
output_directory = "/tmp/oai_to_csv_output"
; metadata_only = true

[MANIPULATORS]

[LOGGING]
path_to_log = "/tmp/oai_to_csv_output/mik.log"
path_to_manipulator_log = "/tmp/oai_to_csv_output/manipulator.log"
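
For reference, the [FETCHER] settings above correspond to an ordinary OAI-PMH `ListRecords` request. Here is a minimal Python sketch of that request URL. It is not MIK's actual fetcher code (MIK is written in PHP): the `build_list_records_url` helper is hypothetical, and `oai_dc` is assumed as the metadata prefix because the sample config does not set one.

```python
from urllib.parse import urlencode


def build_list_records_url(endpoint, set_spec, metadata_prefix="oai_dc"):
    """Return the URL for an initial OAI-PMH ListRecords request.

    Hypothetical helper for illustration only; MIK's Oaipmh fetcher may
    build its requests differently.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed default, not from the config
        "set": set_spec,
    }
    return endpoint + "?" + urlencode(params)


# Values taken from the [FETCHER] section of the sample config.
url = build_list_records_url("http://localhost:8000/oai2", "doitest_collection")
print(url)
```

Large sets are paged: when a response includes a `resumptionToken`, the follow-up request carries only `verb` and `resumptionToken`, so a full harvester would loop until no token is returned.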

Here's the resulting CSV file:

ID,title,identifier,description,format
oai%3Adrupal-site.org%3Adoitest_16,"autogen 6 - blurg",doitest:16,"This record was harvested on a Thursday.","nonprojected graphic"
oai%3Adrupal-site.org%3Adoitest_4,"Church Holy Rosary, Vancouver B.C.",doitest:4,"Holy Rosary Church in Vancouver, B.C."
oai%3Adrupal-site.org%3Adoitest_5,"Second test object.",doitest:3,"This record was harvested on a Thursday."
oai%3Adrupal-site.org%3Adoitest_6,"Has DOI?",doitest:6,"This record was harvested on a Thursday.",globe
oai%3Adrupal-site.org%3Adoitest_12,"autogen 6",doitest:12,"This record was harvested on a Thursday.","nonprojected graphic"
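
Two of the rows above have four fields rather than five, presumably because those records have no `format` element. Below is a minimal Python sketch of reading such a file defensively, assuming short rows should be padded with empty strings. The `read_rows` helper is hypothetical and not part of MIK or Migrate Plus; it also URL-decodes the ID column to show the underlying OAI identifier.

```python
import csv
import io
from urllib.parse import unquote

# Two rows copied from the sample output above; the second has no format value.
SAMPLE = '''ID,title,identifier,description,format
oai%3Adrupal-site.org%3Adoitest_16,"autogen 6 - blurg",doitest:16,"This record was harvested on a Thursday.","nonprojected graphic"
oai%3Adrupal-site.org%3Adoitest_4,"Church Holy Rosary, Vancouver B.C.",doitest:4,"Holy Rosary Church in Vancouver, B.C."
'''


def read_rows(handle):
    """Yield one dict per CSV row, padding missing trailing columns with ''."""
    # restval supplies the value for columns that a short row leaves out.
    for row in csv.DictReader(handle, restval=""):
        # The ID column holds a URL-encoded OAI identifier.
        row["oai_identifier"] = unquote(row["ID"])
        yield row


rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["oai_identifier"], repr(row["format"]))
```

Padding with `restval` keeps every row addressable by column name, which matters if a downstream consumer expects all five columns to be present.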

Collaborator Author

mjordan commented Apr 19, 2018

Based on discussion at the April 18 CLAW Technical call, I've added an option to output an XML file containing the harvested DC or MODS instead of a CSV file. This output file is generated not by the OAI to CSV toolchain but by a shutdown hook script used with the existing OAI Islandora toolchain:

[SYSTEM]

[CONFIG]
config_id = MIK OAI toolchain
last_updated_on = "2018-04-18"
last_update_by = "Mark Jordan"

[FETCHER]
class = Oaipmh
oai_endpoint = "http://localhost:8000/oai2"
set_spec = clawcall_collection
metadata_prefix = mods
temp_directory = /tmp/claw_call_tmp

[METADATA_PARSER]
; We don't use the new csv\DcToCsv parser here.
class = mods\OaiToMods

[FILE_GETTER]
class = OaipmhIslandoraObj
temp_directory = /tmp/claw_call_tmp
datastream_ids[] = OBJ

[WRITER]
; We don't use the new OaipmhCsv writer here.
class = Oaipmh
output_directory = "/tmp/claw_call"
; This is the new shutdown hook script.
shutdownhooks[] = "php extras/scripts/shutdownhooks/concatentate_xml_files.php"

[MANIPULATORS]

[LOGGING]
path_to_log = "/tmp/claw_call/mik.log"
path_to_manipulator_log = "/tmp/claw_call/manipulator.log"
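
The shutdown hook's job, as described above, is to combine the harvested metadata into a single XML file. A rough Python sketch of that idea follows. It is not the actual PHP script: the `concatenate_xml_files` helper, the `records` wrapper element, and the `*.xml` file pattern are all assumptions made for illustration.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def concatenate_xml_files(input_dir, output_path, wrapper_tag="records"):
    """Wrap the root element of every *.xml file in input_dir in one document.

    Hypothetical helper; returns the number of files combined.
    """
    wrapper = ET.Element(wrapper_tag)
    paths = sorted(glob.glob(os.path.join(input_dir, "*.xml")))
    for path in paths:
        wrapper.append(ET.parse(path).getroot())
    ET.ElementTree(wrapper).write(
        output_path, encoding="utf-8", xml_declaration=True
    )
    return len(paths)


# Demonstrate with two tiny stand-in records in temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, title in [("a.xml", "First"), ("b.xml", "Second")]:
        with open(os.path.join(src, name), "w", encoding="utf-8") as fh:
            fh.write(f"<record><title>{title}</title></record>")
    combined = os.path.join(dst, "combined.xml")
    count = concatenate_xml_files(src, combined)
    titles = [t.text for t in ET.parse(combined).getroot().iter("title")]

print(count, titles)
```

With namespaced input such as MODS, ElementTree may rewrite namespace prefixes (for example, to `ns0`) unless they are registered first with `ET.register_namespace`.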

mjordan added a commit that referenced this issue Apr 20, 2018