Many POD Updates Missed from February #722

kevinreiss · 2024-03-15T14:08:50Z

Expected behavior

When updates happen in Alma to records matching our POD publishing profile, these updates should be sent to the POD aggregator.

Actual behavior

See https://pod.stanford.edu/organizations/princeton for the list of datafiles accepted by POD. The volume of records POD processed in February is far lower than the amount of data written out by the daily Alma publishing job. The Publishing Jobs Log in Alma shows that many millions of updates passed through the POD process, and the output files on lib-sftp in /alma/pod include many files larger than 100 MB produced throughout February.

Steps to replicate

This issue requires investigation. We likely have to re-process the updates from a day during this period that had mass updates and observe the results. No exceptions related to this process appear to have been logged.

Impact of this bug

Our Resource Sharing partners in Borrow Direct have out-of-date data. The missed updates likely include very few new records, since the volume was caused by mass record clean-up work by CaMS.

Honeybadger link and code snippet, if applicable

This may be another version of the out of memory issue #695.

Implementation notes, if any

See https://pod.stanford.edu/organizations/princeton for the list of datafiles accepted by POD. The volume of records processed in February is far lower than the amount of data written out by the daily Alma publishing job. Interestingly, the history of Princeton's submissions to POD shows a period, 8/1/2023-8/15/2023, when on some days we successfully submitted files with many hundreds of thousands of record updates, so a somewhat comparable volume of data has passed through our process before. The February data is greater in volume than the August data, so perhaps we reached a tipping point of some sort.

Acceptance Criteria

Look at the cron log on these servers (we found for the Submit Collection issue that Out of Memory errors were not being logged anywhere)
Identify the root cause of the error
Create tickets to address the error (if possible)
Create a ticket/plan for how we can get the POD data current after we assess this issue
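One way to make silent failures visible in the cron log, sketched under assumptions: wrap the job in start and finish markers so that a run killed from outside the process (for example by the operating system when memory runs out) shows up as a "started" line with no matching "finished" or "failed" line. The method name `run_with_markers` and the job name `pod_publish` are hypothetical and are not taken from lib_jobs.

```ruby
require "logger"
require "stringio"

# Hypothetical wrapper, not part of lib_jobs.
# Logs a marker before and after the job. If the process is killed from
# outside, neither "finished" nor "failed" is written, which is itself a signal.
def run_with_markers(job_name, logger)
  logger.info("#{job_name} started")
  result = yield
  logger.info("#{job_name} finished")
  result
rescue Exception => e
  # NoMemoryError is not a StandardError, so a bare `rescue` would miss it.
  logger.error("#{job_name} failed: #{e.class}: #{e.message}")
  raise
end

# Usage with an in-memory log; a real job would log to a file or $stdout.
log_io = StringIO.new
run_with_markers("pod_publish", Logger.new(log_io)) { :done }
puts log_io.string
```

Rescuing `Exception` is normally discouraged in Ruby; it is used here only to log and re-raise, so the error still propagates.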


kevinreiss · 2024-04-04T15:18:18Z

Wrote to the POD support team to ask whether they have logged any errors related to PUL submissions in the last several months, since we are seeing almost daily discrepancies between what Alma publishes and what POD logs.

kevinreiss · 2024-04-08T19:35:30Z

The POD support team says they see no errors related to PUL submissions, and that the record counts the POD UI displays are accurate for the files we've actually submitted to POD.

kevinreiss · 2024-04-09T15:34:31Z

I will advise the POD team that we want to do a full refresh of the data by republishing the set in Alma publishing. Once we do that, we'll republish the set to whichever POD stream they tell us to use. They may want the records in a new stream to keep the size of our record data set in POD down.

kevinreiss · 2024-04-11T15:48:35Z

Per @rladdusaw, running the process locally worked fine with five Alma dump files from 2/23 (totaling 1,075,929 records) if you exclude the upload to POD. No upload was recorded at all on the POD website for this date. During the run, the highest memory usage observed was 200 MB. Potentially the issue relates to the POST we make to send the processed data.
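If the POST is the weak point, one option is to check every upload response explicitly and fail loudly on anything other than a 2xx status. The sketch below assumes the upload returns a `Net::HTTPResponse` from Ruby's standard library; how lib_jobs actually performs the POST is not shown in this thread, and `PodUploadError` and `verify_upload!` are hypothetical names.

```ruby
require "net/http"

# Hypothetical error class, not part of lib_jobs.
class PodUploadError < StandardError; end

# Returns true for any 2xx response and raises otherwise, so a failed upload
# cannot pass unnoticed. `filename` is only used in the error message.
def verify_upload!(response, filename)
  return true if response.is_a?(Net::HTTPSuccess)

  raise PodUploadError,
        "Upload of #{filename} failed with HTTP #{response.code} #{response.message}"
end
```

The raised error would then reach whatever exception reporting the job already has, rather than leaving only a missing file on the POD side.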

kevinreiss · 2024-04-11T21:38:04Z

POD advised that we should republish to a new stream. Right now the default is our current production stream:

lib_jobs/config/config.yml

Line 17 in 5199595

pod_default_stream: <%= ENV["POD_DEFAULT_STREAM"] || "princeton-prod-0223" %>


kevinreiss · 2024-04-11T21:39:30Z

We can redirect the republishing event to this stream: https://pod.stanford.edu/organizations/princeton/streams/princeton-prod-0424.
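Because the config line reads `POD_DEFAULT_STREAM` from the environment before falling back to the hard-coded default, the redirect could be done without a code change. The sketch below evaluates that single config line with ERB and YAML to show how the value resolves. It assumes the config file is rendered through ERB and then parsed as YAML, which is how Rails-style config files are usually loaded; `resolve_stream` is a hypothetical helper.

```ruby
require "erb"
require "yaml"

# The line quoted above from lib_jobs/config/config.yml.
CONFIG_LINE = 'pod_default_stream: <%= ENV["POD_DEFAULT_STREAM"] || "princeton-prod-0223" %>'

# Hypothetical helper: render the ERB, parse the YAML, return the stream name.
def resolve_stream(template)
  YAML.safe_load(ERB.new(template).result)["pod_default_stream"]
end

# With the variable unset, the hard-coded default is used.
ENV.delete("POD_DEFAULT_STREAM")
puts resolve_stream(CONFIG_LINE)

# With the variable set, the new stream is used instead.
ENV["POD_DEFAULT_STREAM"] = "princeton-prod-0424"
puts resolve_stream(CONFIG_LINE)
```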

kevinreiss · 2024-04-15T13:43:04Z

@mzelesky please ping us when you think this data set is ready to be refreshed.

mzelesky · 2024-04-15T13:44:47Z

It probably makes sense to wait until we restore the records affected by the WorldShare process, since 7 million+ records will be modified by that restoration.

kevinreiss · 2024-04-15T13:54:36Z

Ok, when the plan for this becomes clearer let's plan for this alongside our blacklight and submit collection updates.

kevinreiss · 2024-04-26T14:25:21Z

The DataSync updates have resumed as of 4/23/2024, per @mzelesky. We still see discrepancies between POD publishing in Alma and the records that the POD platform reports as processed. See these two screenshots for 4/19/2024-4/26/2024.

For POD

For Alma Publishing

kevinreiss · 2024-04-26T14:26:00Z

None of the record counts processed in Alma are in line with what POD has processed for the same dates.
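The day-by-day comparison done by eye in the screenshots could be scripted once the counts are available from both sides. This is a minimal sketch that assumes each source can be reduced to a hash of ISO date strings to record counts; `count_discrepancies` is a hypothetical helper, and the numbers in the usage example are invented placeholders, not values from the screenshots.

```ruby
# Hypothetical helper: compares per-day record counts from Alma and POD and
# returns one entry for each date where the two counts differ.
# A date missing from either source is treated as a count of 0.
def count_discrepancies(alma_counts, pod_counts)
  dates = (alma_counts.keys | pod_counts.keys).sort
  dates.filter_map do |date|
    alma = alma_counts.fetch(date, 0)
    pod = pod_counts.fetch(date, 0)
    next if alma == pod

    { date: date, alma: alma, pod: pod, missing: alma - pod }
  end
end

# Placeholder counts for illustration only.
alma = { "2024-04-19" => 1000, "2024-04-20" => 500 }
pod  = { "2024-04-19" => 1000, "2024-04-20" => 20 }
count_discrepancies(alma, pod).each { |row| puts row }
```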

kevinreiss added the bug and investigate labels Mar 15, 2024

rladdusaw self-assigned this Mar 28, 2024

kevinreiss mentioned this issue Jun 21, 2024

Get POD Data Up to Date #794

