OUP: check that artid is harvested correctly #240

Open
pamfilos opened this issue Nov 8, 2023 · 1 comment

pamfilos commented Nov 8, 2023

There are articles (e.g. article 10195) whose publication info holds the article_id value in the page_number field.

So in the XML we have:
<elocation-id>043C01</elocation-id>

But instead of mapping elocation-id to artid, we map it to the page number field.
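A minimal sketch of the intended mapping, assuming a JATS-like `<article-meta>` wrapper around the element shown above. The function name `extract_publication_pages` and the output keys `page_start` and `page_end` are hypothetical and chosen for illustration only; this is not the actual OUP parser code.

```python
import xml.etree.ElementTree as ET


def extract_publication_pages(article_meta_xml):
    """Map elocation-id to artid and keep page fields separate (hypothetical helper)."""
    root = ET.fromstring(article_meta_xml)

    def text_of(tag):
        # Return the stripped text of the first matching descendant, or None.
        node = root.find(f".//{tag}")
        if node is None or node.text is None:
            return None
        return node.text.strip() or None

    return {
        "artid": text_of("elocation-id"),  # article id, never a page number
        "page_start": text_of("fpage"),    # JATS first page, if present
        "page_end": text_of("lpage"),      # JATS last page, if present
    }


# Assumed wrapper element; only the elocation-id line comes from the issue.
sample = """
<article-meta>
  <elocation-id>043C01</elocation-id>
</article-meta>
"""
info = extract_publication_pages(sample)
print(info)
```

With this shape, an article that only carries an `elocation-id` ends up with an `artid` and empty page fields, rather than a page number that is really an article id.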

ErnestaP commented Nov 15, 2023

Do we want to do something with this article?
Errors in older articles are hard to trace: either the paths used to parse the articles' values changed over time (sadly, I sometimes had to adapt the code to those changes), or the code itself was not written correctly.
This article was updated in 2018, so the issue regarding page_start had to be solved then.
There is no page_start in the current OUP parser: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/oup_parser.py

The same problem also appears in another article from the same harvesting period: https://repo.scoap3.org/records/10194
I don't think it makes sense to try to understand the parsing logic from before 2020, or even 2021. There were (and still are) many bugs and mistakes in the code that are hard to catch.
What matters is that new articles (from 2021 onwards) are parsed correctly:
https://repo.scoap3.org/records/60075 (2023)
https://repo.scoap3.org/records/75414 (2022)
https://repo.scoap3.org/records/68393 (2021)
https://repo.scoap3.org/records/68393 (2020)
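A rough sketch of how such a check could be automated once the publication info of the records has been loaded. It assumes each entry is a dict with optional `page_start` and `artid` keys (the record layout is an assumption, and `find_misplaced_artids` is a hypothetical helper). The heuristic is deliberately simple: a non-numeric page value with no artid is only flagged as suspicious, since some journals do use non-numeric page labels.

```python
def find_misplaced_artids(publication_info):
    """Return entries whose page_start is non-numeric while artid is empty."""
    suspicious = []
    for entry in publication_info:
        page = str(entry.get("page_start") or "").strip()
        artid = str(entry.get("artid") or "").strip()
        # An id such as "043C01" stored as a page, with no artid, is suspect.
        if page and not page.isdigit() and not artid:
            suspicious.append(entry)
    return suspicious


# Illustrative entries only; the id value is the one quoted in the issue.
records = [
    {"page_start": "043C01"},  # article id stored in the page field
    {"artid": "043C01"},       # expected shape
]
flagged = find_misplaced_artids(records)
print(flagged)
```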

Also, the publisher has a somewhat odd value, "Oxford University Press/Physical Society of Japan,":
[Screenshot 2023-11-15 at 12 16 30]

Now we have a mapping for it: https://github.com/SCOAP3/scoap3-next/blob/master/scoap3/config.py#L568
