Many POD Updates Missed from February #722
Comments
Wrote to the POD support team to ask whether they have logged any errors related to PUL submissions in the last several months, since we are seeing almost daily discrepancies between what Alma publishes and what POD logs.
The POD support team says they see no errors related to PUL submissions, and that the record counts the POD UI displays are accurate for the files we have actually submitted to POD.
I will advise the POD team that we want to do a full refresh of the data by republishing the set in Alma publishing. Once we do that, we'll republish the set to whichever POD stream they tell us to use. They may want the records in a new stream to keep the size of our record data set down in POD.
Per @rladdusaw, running the process locally worked fine with five Alma dump files from 2/23 (totaling 1,075,929 records) if you exclude the upload to POD. No upload was recorded at all on the POD website for this date. Peak memory usage during the run was 200 MB. Potentially the issue relates to the POST we make to send the processed data.
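Since the upload step is the current suspect and nothing has been logged for it, one way to make a silent failure visible is to wrap the call and record every outcome. This is only a minimal sketch: `send` is a hypothetical stand-in for whatever function performs the POST and returns an HTTP status code, and `upload_with_logging` is not part of the existing code.

```python
import logging

logger = logging.getLogger("pod_upload")


def upload_with_logging(send, filename, payload):
    """Call send(filename, payload) and log the outcome.

    `send` is a hypothetical callable that performs the POST and returns
    an HTTP status code. Returns True only for a 2xx status.
    """
    try:
        status = send(filename, payload)
    except Exception:
        # Record the full traceback instead of losing the failure.
        logger.exception("Upload of %s raised an exception", filename)
        return False
    if 200 <= status < 300:
        logger.info("Uploaded %s (%d bytes), status %d", filename, len(payload), status)
        return True
    logger.error("Upload of %s was rejected with status %d", filename, status)
    return False
```

With something like this in place, a day with no upload recorded on the POD side should leave either an error line or no line at all, which narrows down where the data was lost.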
POD advised we should republish to a new stream. Right now the default is our current production stream (line 17 in 5199595).
We can redirect the republishing event to this stream: https://pod.stanford.edu/organizations/princeton/streams/princeton-prod-0424.
@mzelesky please ping us when you think this data set is ready to be refreshed.
It probably makes sense to wait until we restore the records affected by the WorldShare process, since 7 million+ records will be modified by that restoration.
Ok, when the plan for this becomes clearer let's plan for this alongside our blacklight and submit collection updates.
The DataSync updates resumed on 4/23/2024 per @mzelesky, but we continue to see discrepancies between POD publishing in Alma and the records the POD platform reports as processed. See these two screenshots for 4/19/2024-4/26/2024.
None of the record counts processed in Alma are at all in line with what POD has processed for the same dates.
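The day-by-day comparison described above could be scripted rather than read off screenshots. The sketch below assumes the per-day counts have already been collected by hand into two dictionaries; the function name and the numbers are placeholders for illustration, not real figures from Alma or POD.

```python
from datetime import date


def find_discrepancies(alma_counts, pod_counts):
    """Return {day: (alma, pod)} for every day where the two counts differ.

    A day missing from either side is treated as a count of 0.
    """
    result = {}
    for day in sorted(set(alma_counts) | set(pod_counts)):
        alma = alma_counts.get(day, 0)
        pod = pod_counts.get(day, 0)
        if alma != pod:
            result[day] = (alma, pod)
    return result


# Placeholder numbers for illustration only.
alma_counts = {date(2024, 4, 19): 1200, date(2024, 4, 20): 300}
pod_counts = {date(2024, 4, 19): 1200}

for day, (alma, pod) in find_discrepancies(alma_counts, pod_counts).items():
    print(f"{day}: Alma published {alma}, POD processed {pod}")
```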
Expected behavior
When updates happen in Alma to records matching our POD publishing profile, these updates should be sent to the POD aggregator.
Actual behavior
See https://pod.stanford.edu/organizations/princeton for the list of datafiles accepted by POD. The volume of records processed in February is much lower than the amount of data written out by the Daily Alma Publishing Job. The Publishing Jobs Log in Alma shows that many millions of updates passed through the POD process, and the output files on lib-sftp in /alma/pod include many files larger than 100 MB produced throughout February.
Steps to replicate
This issue requires investigation. We likely have to re-process updates from a day during this period with mass updates and observe the results. No exceptions related to this process appear to have been logged.
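When re-processing a day's files, it helps to know how many records each file should yield, so the result can be checked against what POD reports. The sketch below counts records in a streaming fashion. It assumes the dump files are MARCXML once unpacked, which is an assumption and not something confirmed in this issue; `count_marc_records` is a name invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def count_marc_records(source):
    """Count <record> elements in a MARCXML file path or file-like object."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Ignore any namespace prefix such as "{http://www.loc.gov/MARC21/slim}".
        if elem.tag.rsplit("}", 1)[-1] == "record":
            count += 1
            elem.clear()  # drop the record's fields once it has been counted
    return count


# Tiny in-memory sample standing in for a real dump file.
sample = io.BytesIO(
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
    b"<record><leader>a</leader></record>"
    b"<record><leader>b</leader></record>"
    b"</collection>"
)
print(count_marc_records(sample))
```

Running this over each output file for a given day gives an expected total to compare with the count POD shows for the same date.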
Impact of this bug
Our Resource Sharing partners in Borrow Direct are seeing out-of-date data. The missed updates likely include very few new records, since the volume here was caused by mass record clean-up work by CaMS.
Honeybadger link and code snippet, if applicable
This may be another version of the out of memory issue #695.
Implementation notes, if any
Interestingly, the history of Princeton's submissions to POD (https://pod.stanford.edu/organizations/princeton) shows a period, 8/1/2023-8/15/2023, when on some days we successfully submitted files with many hundreds of thousands of record updates, so a somewhat comparable volume of data has passed through our process before. The February data is greater in volume than August's, so perhaps we reached a tipping point of some sort.
Acceptance Criteria