
Record ID 99125158555106421 was not indexed - June 11th #2411

Closed
1 of 3 tasks
christinach opened this issue Jul 11, 2024 · 5 comments
Assignees
Labels
bug 🐛 The application does not work as expected because of a defect investigate Tickets related to work that needs investigation

Comments


christinach commented Jul 11, 2024

Expected behavior

Record with ID 99125158555106421 was changed in Alma on June 11th. It was part of the file incremental_38205463170006421_20240611_180656[026]_new. As of today, with no additional updates, the catalog record should reflect the changes from June 11th.

Actual behavior

The record did not get indexed.

Further Notes

Nancy B. from E-resources reported this issue in the catalog channel. @mzelesky investigated the exported files in Alma during this period and found that the record did get sent from Alma.

The timestamp from the JSON file indicates that the record was last indexed on 2024-05-22.

```json
  "electronic_portfolio_s": [
    "{\"desc\":\" Available from 12/27/1890 until 12/31/1890.\",\"title\":\"CRL Open Access Newspapers\",\"url\":\"https://na05.alma.exlibrisgroup.com/view/uresolver/01PRI_INST/openurl?u.ignore_date_coverage=true&portfolio_pid=531019287200006421&Force_direct=true\",\"start\":\"1890\",\"end\":\"1890\",\"notes\":[]}",
    "{\"desc\":\" Available from 1884 until 1936.\",\"title\":\"NewspaperARCHIVE.com\",\"url\":\"https://na05.alma.exlibrisgroup.com/view/uresolver/01PRI_INST/openurl?u.ignore_date_coverage=true&portfolio_pid=53765186600006421&Force_direct=true\",\"start\":\"1884\",\"end\":\"1936\",\"notes\":[]}"
  ],
  "hashed_id_ssi": "2f904f131eec82c4",
  "_version_": 1799761628651061248,
  "timestamp": "2024-05-22T14:00:40.856Z"
```
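The staleness can be confirmed by comparing the Solr timestamp with the date encoded in the incremental file name. A minimal Ruby sketch, using only the two values quoted above (the comparison logic is illustrative, not bibdata code):

```ruby
require "json"
require "time"

# Reduced Solr document for record 99125158555106421 (fields quoted above).
solr_doc = JSON.parse('{"hashed_id_ssi":"2f904f131eec82c4","timestamp":"2024-05-22T14:00:40.856Z"}')

indexed_at = Time.parse(solr_doc["timestamp"])
# Date/time encoded in incremental_38205463170006421_20240611_180656[026]_new
changed_at = Time.utc(2024, 6, 11, 18, 6, 56)

puts "record is stale" if indexed_at < changed_at
# => prints "record is stale": last indexed 2024-05-22, changed 2024-06-11
```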

Impact of this bug

Users cannot find all the available issues for this record in the catalog.

Suggestion

  • Fix this one record:
    1. Create an XML file for the record.
    2. scp the file to bibdata-qa-worker1.
    3. Follow the [documentation on how to index an XML file](https://github.com/pulibrary/bibdata/blob/main/docs/test_indexing.md#scenario-1-test-indexing-a-specific-xml-file) and index the file.
    4. Repeat the same steps on bibdata-worker-staging1.
    5. Review the record in catalog-staging.princeton.edu and catalog-qa.princeton.edu (make sure catalog-qa.princeton.edu is pointing to the Solr collection you used to index the file on bibdata-qa-worker1 in step 3).
    6. If the record does not reflect the changes, review the logs on the VM, download the incremental file locally, and troubleshoot in your dev environment.
    7. If everything looks OK, index the record on bibdata-worker-prod1.
    8. Review the record in catalog.princeton.edu.
    9. If the record reflects the portfolio changes, follow up in the catalog channel with Nancy and close this ticket.
  • OR run the Alma updates since June 9th, 2024
  • OR run a full reindex
@christinach christinach added bug 🐛 The application does not work as expected because of a defect investigate Tickets related to work that needs investigation labels Jul 11, 2024
@christinach christinach self-assigned this Jul 11, 2024

christinach commented Jul 12, 2024


christinach commented Jul 17, 2024

We checked job_id 38206309060006421 from Alma with @mzelesky. That specific job failed. We found the ID in the webhook.
I looked into the database and, even though the job failed, the event was created with success: true.

  1. We should not create an Event if the job failed in Alma. Currently we only check whether the message body includes the Alma job names from our alma.yml configuration.
  2. We should also check whether the message_body["job_instance"]["status"]["value"] matches 'COMPLETED_FAILED' and skip it. For a failed job the JSON looks like the following:
```
"body:{\"id\":\"38206309060006421\",\"action\":\"JOB_END\",\"institution\":{\"value\":\"01PRI_INST\",\"desc\":\"Princeton University Library\"},\"time\":\"2024-06-11T19:40:52.051Z\",\"job_instance\":{\"id\":\"38206309060006421\",\"name\":\"Publishing Platform Job Incremental Publishing\",\"progress\":209.0,\"status\":{\"value\":\"COMPLETED_FAILED\",\"desc\":\"Completed with Errors\"},\"external_id\":\"38206612000006421\",\"submitted_by\":{\"value\":\"System\"},\"submit_time\":\"2024-06-11T18:00:12.461Z\",\"start_time\":\"2024-06-11T18:30:16.963Z\",\"end_time\":\"2024-06-11T19:40:52.051Z\",\"status_date\":\"2024-06-11Z\",\"alert\":[{\"value\":\"alert_general_error\",\"desc\":\"The job completed with errors. For more information view the report details (or contact Support using the process ID).\"}],\"counter\":[{\"type\":{\"value\":\"label.new.records\",\"desc\":\"New Records\"},\"value\":\"7\"},{\"type\":{\"value\":\"label.updated.records\",\"desc\":\"Updated Records\"},\"value\":\"432\"},{\"type\":{\"value\":\"label.deleted.records\",\"desc\":\"Deleted Records\"},\"value\":\"1\"},{\"type\":{\"value\":\"c.jobs.publishing.failed.publishing\",\"desc\":\"Unpublished failed records\"},\"value\":\"0\"},{\"type\":{\"value\":\"c.jobs.publishing.skipped\",\"desc\":\"Skipped records (update date changed but no data change)\"},\"value\":\"193\"},{\"type\":{\"value\":\"c.jobs.publishing.filtered_out\",\"desc\":\"Filtered records (not published due to filter)\"},\"value\":\"0\"},{\"type\":{\"value\":\"c.jobs.publishing.totalRecordsWrittenToFile\",\"desc\":\"Total records written to file\"},\"value\":\"0\"},{\"type\":{\"value\":\"FTP has failed.\",\"desc\":\"\"},\"value\":\"Download zip file for manual FTP.\"}],\"job_info\":{\"id\":\"S32986800410006421\",\"name\":\"Publishing Platform Job Incremental Publishing\",\"description\":\"Publishing Platform Job\",\"type\":{\"value\":\"SCHEDULED\",\"desc\":\"Scheduled\"},\"category\":{\"value\":\"PUBLISHING\",\"desc\":\"Publishing\"},\"link\":\"/almaws/v1/conf/jobs/S32986800410006421\"},\"link\":\"/almaws/v1/conf/jobs/S32986800410006421/instances/38206309060006421\"}}"
```

This is what we save in the Event as message_body.
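Point 2 above could be implemented as a guard on the webhook payload before an Event is created. A minimal sketch, assuming the payload is available as a JSON string (`failed_alma_job?` is a hypothetical name, not bibdata's actual API):

```ruby
require "json"

# Returns true when the Alma job instance in the webhook payload
# finished with status COMPLETED_FAILED.
def failed_alma_job?(message_body)
  status = JSON.parse(message_body).dig("job_instance", "status", "value")
  status == "COMPLETED_FAILED"
end

payload = '{"job_instance":{"status":{"value":"COMPLETED_FAILED","desc":"Completed with Errors"}}}'
puts failed_alma_job?(payload) # => true
```

The existing check on the Alma job names from alma.yml would stay; this guard would additionally skip Event creation when the job status is COMPLETED_FAILED.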


christinach commented Jul 17, 2024

The incremental file that includes the record that was not indexed is incremental_38205463170006421_20240611_180656[026]_new. The second part, after "incremental", is the job process ID from Alma. This incremental file was in lib_sftp. When I checked the AWS Lambda log for this job ID, it had status 'COMPLETED_FAILED'.

```
"body:{\"id\":\"38205463170006421\",\"action\":\"JOB_END\",\"institution\":{\"value\":\"01PRI_INST\",\"desc\":\"Princeton University Library\"},\"time\":\"2024-06-11T18:27:10.961Z\",\"job_instance\":{\"id\":\"38205463170006421\",\"name\":\"Publishing Platform Job Incremental Publishing\",\"progress\":101.3,\"status\":{\"value\":\"COMPLETED_FAILED\",\"desc\":\"Completed with Errors\"},\"external_id\":\"38205728880006421\",\"submitted_by\":{\"value\":\"System\"},\"submit_time\":\"2024-06-11T17:00:11.594Z\",\"start_time\":\"2024-06-11T17:20:13.584Z\",\"end_time\":\"2024-06-11T18:27:10.961Z\",\"status_date\":\"2024-06-11Z\",\"alert\":[{\"value\":\"alert_general_error\",\"desc\":\"The job completed with errors. For more information view the report details (or contact Support using the process ID).\"}],\"counter\":[{\"type\":{\"value\":\"label.new.records\",\"desc\":\"New Records\"},\"value\":\"66\"},{\"type\":{\"value\":\"label.updated.records\",\"desc\":\"Updated Records\"},\"value\":\"3453\"},{\"type\":{\"value\":\"label.deleted.records\",\"desc\":\"Deleted Records\"},\"value\":\"0\"},{\"type\":{\"value\":\"c.jobs.publishing.failed.publishing\",\"desc\":\"Unpublished failed records\"},\"value\":\"0\"},{\"type\":{\"value\":\"c.jobs.publishing.skipped\",\"desc\":\"Skipped records (update date changed but no data change)\"},\"value\":\"224\"},{\"type\":{\"value\":\"c.jobs.publishing.filtered_out\",\"desc\":\"Filtered records (not published due to filter)\"},\"value\":\"0\"},{\"type\":{\"value\":\"c.jobs.publishing.totalRecordsWrittenToFile\",\"desc\":\"Total records written to file\"},\"value\":\"0\"},{\"type\":{\"value\":\"FTP has failed.\",\"desc\":\"\"},\"value\":\"Download zip file for manual FTP.\"}],\"job_info\":{\"id\":\"S32986800410006421\",\"name\":\"Publishing Platform Job Incremental Publishing\",\"description\":\"Publishing Platform Job\",\"type\":{\"value\":\"SCHEDULED\",\"desc\":\"Scheduled\"},\"category\":{\"value\":\"PUBLISHING\",\"desc\":\"Publishing\"},\"link\":\"/almaws/v1/conf/jobs/S32986800410006421\"},\"link\":\"/almaws/v1/conf/jobs/S32986800410006421/instances/38205463170006421\"}}"
```

@mzelesky since this job failed in Alma how did it generate the file with the failed Job Id?


christinach commented Jul 17, 2024

The message_body with job_id '38205463170006421' is in bibdata event 7338, with no attached dump file.

  • Next step: check the timestamp of the incremental file in lib-sftp.

Update on 7/18/2024: I checked lib-sftp and the file has timestamp "Jun 12 14:05 (UTC)": -rw-r--r-- 1 almasftp pul_g 1417919 Jun 12 14:05 'incremental_38205463170006421_20240611_180656[026]_new.tar.gz'

So far my understanding is that:

  • The Alma job with process_id 38205463170006421 failed on 2024-06-11T18:27 (UTC) (this is the time we see in bibdata as 'finish' and in Alma as 'finished on')

[screenshot: bibdata events page – event 7338]

[screenshot: Alma job report page – process id 38205463170006421]

  • The webhook received the Alma job event with ID 38205463170006421 ("time": "2024-06-11T18:27:10.961Z").
  • Bibdata created event #7338 based on the job ID that it received from the webhook.
  • There was no file yet in lib-sftp (probably because there was no disk space; this is the issue we had during that week).
  • For some reason the failed Alma job generated an incremental file: incremental_38205463170006421_20240611_180656[026]_new.
  • The incremental file was transferred to lib-sftp on Jun 12 14:05 (UTC), probably once some disk space had been freed.
  • Bibdata did not attach the file to the event that was created the previous day. The bibdata event had the Alma process ID in its message_body and triggered a background job to fetch an incremental file with this job ID in the file name. This never happened, because the job failed in Alma and the file that was generated was only transferred a day later, when there was free space on lib-sftp.
    Alma continued creating failed jobs every hour, with the same effect on bibdata. Every time a successful job ran, bibdata would store the Alma process ID in a new Event, look for the transferred file in lib_sftp, and move forward. Bibdata does not look back at an event that was created the previous day and already finished, whose message_body contains an Alma process ID A, so it will never search lib-sftp for an incremental file A that arrives days later. The AWS SQS poller retains messages in the queue for 14 days.

The main issues here are:

  1. Alma had a failed job that sent a file a day later with the failed process ID in the filename. This file should have been included in the next successful Alma job, with the successful job ID in the filename. (I will discuss this with @mzelesky.)
    • Update on 07/19/2024: @mzelesky confirmed that the failed job in Alma should not have generated a file with these records. The records should have moved into the next successful job and been published as part of it.
  2. lib-sftp ran out of space (@acozine is working on setting up a scheduled cleanup job).
  3. Bibdata creates events for failed jobs.
    • I discussed this with @mzelesky. It would still be better to create the event for the failed Alma job in bibdata and track it in the UI. We can add a new attribute 'Alma job status' to the events table, populated with the ["status"]["value"] from the job_instance message body that we save from the webhook message. Next we can create a Datadog alert to let us know when there is an Alma job with status 'COMPLETED_FAILED' so we can investigate further. I will create a new ticket for this.

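The proposed 'Alma job status' attribute could be derived from the message body that bibdata already stores. A sketch of the idea (the Event struct and attribute name are illustrative; bibdata's actual schema may differ):

```ruby
require "json"

# Illustrative stand-in for bibdata's Event model with the proposed attribute.
Event = Struct.new(:message_body, :alma_job_status)

# Build an event, extracting the Alma job status from the webhook payload.
def build_event(raw_body)
  status = JSON.parse(raw_body).dig("job_instance", "status", "value")
  Event.new(raw_body, status)
end

body = '{"job_instance":{"status":{"value":"COMPLETED_FAILED","desc":"Completed with Errors"}}}'
event = build_event(body)

# A Datadog monitor could then alert on this condition:
puts "alert: failed Alma job" if event.alma_job_status == "COMPLETED_FAILED"
```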
@christinach

I created #2415 and #2416.
