oai.content_dm.csu_sac validation results: [base contentdm is_shown_by logic & type vocab] #664
Collection registry #: 11068

CSU SAC validation reports are missing types. This seems to be because they use a wide array of terms for "type," and our type vocab set does not include all of them. There are 8 pages (0-8) of vernacular data; I went through pages 0-5, then skimmed 6-7. I think there are a handful we should consider including, but we need to be okay with losing some of these. Alternatively, we can opt to use the Registry "fill" option with "physical object", which it looks like many of these are.

Here are the ones I think we should consider adding:

Here's my compiled running list of page 4:
page 5:
page 6: (there are a ton more on this page)
page 7: (and there are more on this page)
Fixes for is_shown_by logic should trickle down from the base OAI mapper. See #678
Fixes requested
base contentdm mapper issue
is_shown_by: For collection #26723 only. A URL transformation needs to happen at the base contentdm mapper level: fish for the <dc:identifier> value and transform that URL. See the legacy base contentdm mapper: https://github.com/calisphere-legacy-harvester/dpla-ingestion/blob/cfe3dcb06008c0c6cb9d8207fe28bfaa1a855e4f/lib/mappers/contentdm_oai_dc_mapper.py#L31
For example, fish for:
<dc:identifier>http://csus.contentdm.oclc.org/cdm/ref/collection/teaware/id/230</dc:identifier>
& transform to:
http://csus.contentdm.oclc.org/utils/getthumbnail/collection/teaware/id/230
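The transformation above could be sketched roughly as follows. This is a minimal illustration, not the legacy mapper's code or Rikolti's actual mapper API: the function names `thumbnail_url_from_identifier` and `map_is_shown_by` are hypothetical, and it assumes the reference URL always has the `/cdm/ref/collection/<alias>/id/<n>` shape shown in the example.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit, urlunsplit

# Path prefixes taken from the example URLs in this issue.
REF_PREFIX = "/cdm/ref/"
THUMB_PREFIX = "/utils/getthumbnail/"


def thumbnail_url_from_identifier(identifier: str) -> Optional[str]:
    """Turn a CONTENTdm reference URL into a getthumbnail URL.

    Returns None when the identifier is not a /cdm/ref/ URL.
    (Hypothetical helper, named for illustration only.)
    """
    parts = urlsplit(identifier.strip())
    if not parts.path.startswith(REF_PREFIX):
        return None
    # Keep "collection/<alias>/id/<n>" and swap only the prefix.
    new_path = THUMB_PREFIX + parts.path[len(REF_PREFIX):]
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


def map_is_shown_by(identifiers: Iterable[str]) -> Optional[str]:
    """Return the first thumbnail URL derivable from dc:identifier values."""
    for identifier in identifiers:
        url = thumbnail_url_from_identifier(identifier)
        if url:
            return url
    return None
```

Using the first matching `<dc:identifier>` is an assumption; records with several reference URLs may need a different rule.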
type vocab additions
Resolved issues
record counts: the fetched, mapped, and actual (CSU DAMS records) counts do not line up. The Solr and Rikolti counts do not line up with each other, and neither matches the expected number of records in the CSU DAMS. See logs: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/validate_by_mapper_type/grid?dag_run_id=manual__2023-11-20T21%3A47%3A47%2B00%3A00&task_id=map_endpoint_task&tab=logs -- this is resolved via the fix to address empty mapped metadata pages.
language: this is a question. For #11068, Rikolti is mapping language as "['jpn, fre, eng']". The vernacular has <dc:language>jpn, fre, eng</dc:language> (a single entry instead of three distinct entries). Is this okay to accept? There may also be a related enrichment issue with languages. This is a question we should ask the contributor.
![Image](https://private-user-images.githubusercontent.com/32110172/288529069-875e1dbe-2869-4789-99de-380995da1a3c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNjMwNzUsIm5iZiI6MTczOTE2Mjc3NSwicGF0aCI6Ii8zMjExMDE3Mi8yODg1MjkwNjktODc1ZTFkYmUtMjg2OS00Nzg5LTk5ZGUtMzgwOTk1ZGExYTNjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDA0NDYxNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTM3MDNiZTkxMjEwNjU1NTE1MWFlOWRkNWExNWNiOTE4MDY1NDk2ZmMwYzNlNGZiYjMxYmE0MzRhOGRlNGU5MDAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.8okyri3Po4M21EIFMn-3m8WVYm6FBsSQ4U_UNKU67FI)
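If we decide not to accept the combined value as-is, one option would be to split it into distinct entries during mapping. The sketch below is only an illustration of that idea: `split_language_values` is a hypothetical helper, not an existing Rikolti enrichment, and it assumes a comma is the only delimiter the contributor uses.

```python
from typing import Iterable, List


def split_language_values(values: Iterable[str]) -> List[str]:
    """Split comma-delimited dc:language values into distinct codes.

    Order is preserved and duplicates are dropped.
    (Hypothetical helper; the comma delimiter is an assumption.)
    """
    codes: List[str] = []
    for value in values:
        for code in value.split(","):
            code = code.strip()
            if code and code not in codes:
                codes.append(code)
    return codes


# The single vernacular entry becomes three distinct entries.
print(split_language_values(["jpn, fre, eng"]))
```

Whether to split at all is still the open question for the contributor; this only shows what the mapped value would look like if we did.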