Adding Web of Science harvest #204

Merged
edsu merged 1 commit into main from wos-harvest on Mar 6, 2025

Contributor

edsu commented Mar 3, 2025

This adds a harvester for Web of Science (WoS) publications by ORCID. I tested it with AIRFLOW_VAR_DEV_LIMIT=50000, which helped uncover some unexpected responses from the WoS API that I needed to handle; see #207 and #208.

The API call is based on how sul_pub queries by ORCID and how it passes the API key.
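
For readers unfamiliar with the WoS API, here is a minimal sketch of what an ORCID query might look like. It is not the code in this PR. The endpoint URL, the `AI=(...)` author-identifier query, the `X-ApiKey` header, and the `firstRecord`/`count` paging parameters are assumptions based on the WoS Expanded API and should be checked against Clarivate's documentation. The function names `build_wos_request` and `page_starts` and the `limit` argument (a stand-in for a development limit) are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for the WoS Expanded API; verify before relying on it.
WOS_URL = "https://wos-api.clarivate.com/api/wos"


def build_wos_request(orcid, api_key, first_record=1, count=100):
    """Return (url, headers) for one page of an ORCID search (hypothetical helper)."""
    params = {
        "databaseId": "WOS",
        "usrQuery": f"AI=({orcid})",  # assumed: AI is the author-identifier field tag
        "count": count,
        "firstRecord": first_record,
    }
    # Assumed: the API key travels in a request header rather than the query string.
    headers = {"X-ApiKey": api_key, "Accept": "application/json"}
    return f"{WOS_URL}?{urlencode(params)}", headers


def page_starts(total_records, count=100, limit=None):
    """Yield the firstRecord value for each page, stopping at an optional limit."""
    stop = total_records if limit is None else min(total_records, limit)
    for first in range(1, stop + 1, count):
        yield first
```

The request builder is kept free of network calls so it can be unit tested without an API key; the caller would pass the returned URL and headers to whatever HTTP client the project uses.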

Closes #165

edsu force-pushed the wos-harvest branch 7 times, most recently from 1062202 to becde5b on March 4, 2025 20:37
edsu marked this pull request as ready for review March 4, 2025 20:37
edsu force-pushed the wos-harvest branch 6 times, most recently from b3b97ca to 3620115 on March 5, 2025 22:00
@@ -28,7 +28,8 @@ dependencies = [
[tool.pytest.ini_options]
pythonpath = ["."]
markers = "mais_tests: Tests requiring MAIS access"
-addopts = "-v --cov --cov-report=html --cov-report=term"
+addopts = "-v --cov --cov-report=html --cov-report=term --log-level INFO --log-file test.log"
Contributor Author

It was useful for me to see the log messages, but I can remove this if it is too noisy.
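
As a rough illustration of what these options surface: as I understand pytest's logging options, `--log-level INFO` captures records at INFO and above, and `--log-file test.log` writes log records to that file. The logger name and the `harvest_page` function below are hypothetical and are not taken from this PR.

```python
import logging

logger = logging.getLogger("harvest_example")  # hypothetical logger name


def harvest_page(records):
    """Hypothetical helper: log how many records a page returned."""
    logger.info("harvested %d records", len(records))
    return len(records)
```

When a test calls a function like this under the settings above, the INFO message should appear in test.log alongside the test results.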

@@ -150,6 +159,9 @@ def publish(pubs_to_contribs, merge_publications):

openalex_jsonl = openalex_harvest(snapshot)

Contributor Author

Eventually we'll want to tie the WoS harvesting into the workflow, but for now nothing is dependent on it.



def check_status(resp):
    # see https://github.com/sul-dlss/rialto-airflow/issues/208
Contributor

Thanks for the links to these issues.

Contributor

lwrubel left a comment

Approved pending minor comments

edsu merged commit 7966313 into main Mar 6, 2025
3 checks passed
edsu deleted the wos-harvest branch March 6, 2025 21:25

Successfully merging this pull request may close these issues.

Write harvest_wos task
2 participants