Comparing https://about.lens.org/covid-19/ Scholarly Works and the CORD19 dataset? #49

Open
motey opened this issue Apr 22, 2020 · 1 comment
Labels
Status: Suggested (This issue is a suggestion for doing something new or different in CovidGraph)
Tag: Good First Issue (Good for newcomers)
Tag: Help Wanted (Extra attention is needed)
Type: Question (This issue raises a question for discussion)

Comments

@motey
Member

motey commented Apr 22, 2020

At the moment we are using the CORD19 dataset to import scientific papers into CovidGraph.
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

I just stumbled over a similar dataset at https://about.lens.org/covid-19/ ("Scholarly Works").

It would be interesting to know whether this dataset has a similar scope and better quality, and whether it could maybe replace the CORD19 dataset, since the CORD19 dataset is of poor quality in many places.

Task: Create a comparison between the CORD19 and the lens.org Scholarly Works datasets:

  • Scope / number of articles?

  • Which dataset comes with more identifying attributes such as DOIs, PMCIDs, and PubMed IDs? In which dataset do these appear more consistently on each article?

  • Which dataset has more relevant attributes (e.g. MeSH terms)?

  • Which dataset is better for distinguishing authors (e.g. provides ORCID iDs for some authors, etc.)?

Extra Task: Find an even better data source :)
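
As a possible starting point for the identifier question in the task list, here is a minimal sketch that measures how consistently each identifier is filled in. It assumes both datasets have been loaded as lists of dicts (for example with `csv.DictReader`). The function name `identifier_coverage` is hypothetical, and the field names (`doi`, `pmcid`, `pubmed_id`) are assumptions that need to be checked against the actual CORD19 metadata file and the lens.org export.

```python
from typing import Dict, Iterable, Mapping, Sequence


def identifier_coverage(
    records: Iterable[Mapping[str, object]],
    fields: Sequence[str],
) -> Dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value.

    The field names are assumptions; adapt them to the real column
    names of the dataset being inspected.
    """
    rows = list(records)
    total = len(rows)
    coverage: Dict[str, float] = {}
    for field in fields:
        # Missing keys, None, and whitespace-only strings all count as empty.
        filled = sum(
            1 for row in rows
            if str(row.get(field) or "").strip()
        )
        coverage[field] = filled / total if total else 0.0
    return coverage


# Tiny made-up sample, only to illustrate the shape of the input.
sample = [
    {"doi": "10.1000/a", "pmcid": "PMC1", "pubmed_id": ""},
    {"doi": "10.1000/b", "pmcid": "", "pubmed_id": ""},
    {"doi": "", "pmcid": "PMC3", "pubmed_id": "123"},
    {"doi": "10.1000/d", "pmcid": None, "pubmed_id": " "},
]
sample_coverage = identifier_coverage(sample, ["doi", "pmcid", "pubmed_id"])
```

Running the same function over both datasets with their respective column names would give comparable per-identifier numbers, and `len(rows)` answers the scope question at the same time.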

@motey motey added Tag: Good First Issue Good for newcomers Tag: Help Wanted Extra attention is needed Type: Question This issue raises a question for discussion labels Apr 22, 2020
@motey
Member Author

motey commented Apr 22, 2020

Pedro Parraguez: We have used the Microsoft Academic Graph dataset extensively, and for the most part it is what Lens uses for Scholarly Works. It is indeed of quite good quality. You don't get the full text, but in return you get better metadata, fewer duplicates, etc. As far as I have seen, the coverage of preprints is lower in Lens, though. Maybe DOI-level matching would allow having the best of both worlds (although gaps will exist).
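
The DOI-level matching idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `normalize_doi` and `match_by_doi` are hypothetical, both datasets are assumed to be lists of dicts, and the key names (`doi` for CORD19, `DOI` for the Lens export) are guesses that need to be verified against the real files. Records without a DOI, or with a DOI missing from the other dataset, end up in the unmatched list, which is where the gaps mentioned above would show up.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Row = Mapping[str, Optional[str]]

# Common ways a DOI is prefixed in metadata exports.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: Optional[str]) -> str:
    """Lowercase a DOI and strip whitespace and a leading resolver prefix."""
    doi = (raw or "").strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def match_by_doi(
    cord_rows: List[Row],
    lens_rows: List[Row],
    cord_key: str = "doi",   # assumed column name
    lens_key: str = "DOI",   # assumed column name
) -> Tuple[List[Tuple[Row, Row]], List[Row]]:
    """Pair CORD19 rows with Lens rows that share a normalized DOI.

    Returns (matched pairs, CORD19 rows without a match).
    """
    lens_by_doi: Dict[str, Row] = {}
    for row in lens_rows:
        doi = normalize_doi(row.get(lens_key))
        if doi:
            # Keep the first Lens record seen for each DOI.
            lens_by_doi.setdefault(doi, row)

    matched: List[Tuple[Row, Row]] = []
    unmatched: List[Row] = []
    for row in cord_rows:
        doi = normalize_doi(row.get(cord_key))
        if doi and doi in lens_by_doi:
            matched.append((row, lens_by_doi[doi]))
        else:
            unmatched.append(row)
    return matched, unmatched
```

A matched pair could then take the full text from the CORD19 side and the metadata from the Lens side, while unmatched CORD19 rows (for example preprints not covered by Lens) keep their original metadata.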

@Jiros Jiros added the Status: Suggested This issue is a suggestion for doing something new or different in CovidGraph label Dec 8, 2020