Nuxeo API bug: harvesting fetch counts for a large Nuxeo collection fluctuates with harvesting runs - possibly an issue with deeply nested objects - revisit after the Nuxeo DB query work is completed
#1166
Closed
christinklez opened this issue
Jan 24, 2025
· 3 comments
Registry ID: 26713 https://calisphere-stage.cdlib.org/collections/26713/
This is a Nuxeo API bug.
Perhaps ask Nuxeo if they can provide guidance to query the database directly? Also press Nuxeo to address the API bug.
Before attempting this harvest, we decided to test if there was an issue with harvesting "Deeply Nested Objects." We've encountered this issue before with generating Nuxeo extent stats, in which Deeply Nested Objects weren't fully getting picked up.
We worked with Elvia/UCI to move the folders up one level. (They were previously separated into Do Not Publish & Publish folders. Instead the contents in the Publish folder were moved up one level.) Here are the harvesting results, below.
I think 33758 parents is good. The script I ran previously to get the count didn't include the 3 records in the top-level folder, which is why the previous count was 33755.
Related: https://github.com/orgs/ucldc/projects/2/views/2?pane=issue&itemId=84872751&issue=ucldc%7Cnuxeo_merritt%7C12
==
Expected counts for 26713, according to the doclist (as of 2024-09-04):
https://docs.google.com/spreadsheets/d/1_atOF_NRSNGFBktgecZoU3Hkiz-IsFQP/edit?gid=129213630#gid=129213630
==
Harvest attempt #1
Run ID:
manual__2024-09-13T16:33:18+00:00
(this is the failed one, that never finished)
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2024-09-13T16%3A33%3A18%2B00%3A00&tab=logs&task_id=fetching.fetch_collection
This fetch job took 53 minutes.
Note: This harvest job did not complete. (A new job was started instead, when there were some content_harvest errors.)
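Each fetch log URL carries its run ID as the percent-encoded `dag_run_id` query parameter, so the two can be cross-checked. A minimal Python sketch using only the standard library; the helper name `run_id_from_fetch_log` is hypothetical and not part of the harvester:

```python
from urllib.parse import urlparse, parse_qs


def run_id_from_fetch_log(url: str) -> str:
    """Return the decoded dag_run_id query parameter from an Airflow grid URL."""
    # parse_qs percent-decodes values, so %3A becomes ':' and %2B becomes '+'.
    query = parse_qs(urlparse(url).query)
    return query["dag_run_id"][0]


# Fetch log URL copied from harvest attempt #1 above.
fetch_log = (
    "https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com"
    "/dags/harvest_collection/grid"
    "?dag_run_id=manual__2024-09-13T16%3A33%3A18%2B00%3A00"
    "&tab=logs&task_id=fetching.fetch_collection"
)

print(run_id_from_fetch_log(fetch_log))  # manual__2024-09-13T16:33:18+00:00
```

The decoded value should match the Run ID listed for the same attempt; a mismatch suggests a copy-paste error in one of the two.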
Harvest attempt #2
Run ID:
manual__2024-09-16T15:56:49+00:00
(this one was successful)
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2024-09-16T15%3A56%3A49%2B00%3A00&task_id=fetching.fetch_collection&tab=logs
This fetch job took 43 minutes.
Note: This harvest job did complete.
-stage counts from this job: 29,433
-prod counts (currently published): 30,720
Because of the drop in counts, UCI alerted us to this discrepancy. We started a new harvest job.
Harvest attempt #3
Run ID:
manual__2024-09-19T16:52:39+00:00
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2024-09-19T16%3A52%3A39%2B00%3A00&task_id=fetching.fetch_collection&tab=logs
This fetch job took 42 minutes.
Note: This harvest job did complete.
-stage counts from this job: 25,763
-prod counts (currently published): 30,720
Harvest attempt #4
Before attempting this harvest, we decided to test if there was an issue with harvesting "Deeply Nested Objects." We've encountered this issue before with generating Nuxeo extent stats, in which Deeply Nested Objects weren't fully getting picked up.
We worked with Elvia/UCI to move the folders up one level. (They were previously separated into Do Not Publish & Publish folders. Instead the contents in the Publish folder were moved up one level.) Here are the harvesting results, below.
Run ID:
manual__2024-09-24T18:32:22+00:00
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2024-09-24T18%3A32%3A22%2B00%3A00&task_id=fetching.fetch_collection&tab=logs
This fetch job took 6 hours 1 minute.
Note: This harvest job did complete.
-stage counts from this job: 32,704
-prod counts (currently published): 30,720
Harvest attempt #5
Run ID:
manual__2024-10-16T19:35:54+00:00
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2024-10-16T19%3A35%3A54%2B00%3A00&task_id=fetching.fetch_collection&tab=logs
Harvest attempt #6
Run ID: manual__2024-10-21T06%3A36%3
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?base_date=2024-10-22T19%3A35%3A54Z&dag_run_id=manual__2024-10-21T06%3A36%3
Harvest attempt #7: 2025-01-16
Run ID:
manual__2025-01-16T20:59:15+00:00
Fetch log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual__2025-01-16T20%3A59%3A15%2B00%3A00&task_id=fetching.fetch_collection&tab=logs
Note: This harvest job did complete.
-stage counts from this job: 27,439
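To make the fluctuation easier to see, here is a rough Python sketch that compares the stage counts reported above against the published prod count. The counts are copied from this issue. Using 30,720 as the baseline for attempt #7 is an assumption, since no prod count was recorded for that run. The names `PROD_COUNT`, `stage_counts`, and `deltas` are illustrative only.

```python
# Published prod count reported for attempts #2 through #4.
# Assumption: the same baseline is reused for attempt #7.
PROD_COUNT = 30_720

# Stage counts from the completed harvest attempts that reported one.
stage_counts = {2: 29_433, 3: 25_763, 4: 32_704, 7: 27_439}


def deltas(counts: dict[int, int], baseline: int) -> dict[int, int]:
    """Return the stage-minus-baseline difference for each attempt."""
    return {attempt: count - baseline for attempt, count in counts.items()}


for attempt, delta in deltas(stage_counts, PROD_COUNT).items():
    print(f"attempt #{attempt}: stage={stage_counts[attempt]:,} vs prod={delta:+,}")

# Spread between the largest and smallest stage count across runs.
spread = max(stage_counts.values()) - min(stage_counts.values())
print(f"spread across runs: {spread:,}")
```

On these numbers, the stage count ranges from 4,957 below prod (attempt #3) to 1,984 above it (attempt #4), a spread of 6,941 records between runs.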