

Create Jupyter Notebook to compare NER/WoRMS with GBIF #74

Open
amandawhitmire opened this issue Mar 26, 2021 · 0 comments
Labels
enhancement New feature or request


@amandawhitmire
Member

Our current pipeline for identifying species occurrences uses the WoRMS taxonomy as an entity set for named-entity recognition (NER) on the student papers. GBIF offers a tool for checking species names (https://www.gbif.org/tools/species-lookup). Could we use something like this to improve how accurately we identify species in our text?
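A notebook could check names against GBIF programmatically instead of through the web form. The sketch below assumes GBIF's public species match endpoint (`https://api.gbif.org/v1/species/match`) with its `name` and `strict` query parameters, and reads only the `matchType`, `scientificName`, and `confidence` fields of the JSON response. The helper names `match_url`, `fetch_json`, and `match_name` are hypothetical. The fetch function is injectable so the parsing can be tested without network access.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: GBIF's v1 species match service.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def match_url(name: str, strict: bool = False) -> str:
    """Build a species-match query URL for a single scientific name."""
    params = {"name": name, "strict": "true" if strict else "false"}
    return f"{GBIF_MATCH_URL}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def match_name(name: str, fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Summarize GBIF's best match for `name`.

    matchType is expected to be EXACT, FUZZY, HIGHERRANK or NONE;
    a missing field is treated as NONE.
    """
    result = fetch(match_url(name))
    return {
        "query": name,
        "match_type": result.get("matchType", "NONE"),
        "gbif_name": result.get("scientificName"),
        "confidence": result.get("confidence"),
    }
```

In a notebook, `match_name` would be called once per name in the NER output. A `FUZZY` match type would be the signal that GBIF corrected a spelling, which is exactly the misspelling and OCR case this issue is asking about.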

A first step might be to compare the initial output of our NER process, a list of genus and species names, with the species that exist in GBIF. Is everything we found via WoRMS also in GBIF? If so, is there a different pipeline for pulling species names from papers via GBIF that would be more accurate? Consider scenarios where a name is misspelled or the OCR botched a few letters. GBIF offers fuzzy string matching.
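The comparison step could be prototyped offline before any API calls. The sketch below assumes the NER output is a list of name strings and that a reference list of accepted names (for example, names exported from GBIF) is available locally. It uses Python's `difflib` similarity ratio as a stand-in for fuzzy matching, which is not the algorithm GBIF itself uses. The function name `compare_names` and the 0.85 default cutoff are illustrative choices.

```python
from difflib import get_close_matches
from typing import Dict, Iterable, List


def compare_names(
    ner_names: Iterable[str],
    reference_names: Iterable[str],
    cutoff: float = 0.85,
) -> Dict[str, List]:
    """Sort NER-extracted names into exact, near-miss and unmatched groups.

    Near misses are (found_name, closest_reference_name) pairs whose
    difflib similarity ratio is at least `cutoff`; they are candidates
    for misspellings or OCR errors.
    """

    def norm(s: str) -> str:
        # Collapse whitespace and ignore case before comparing.
        return " ".join(s.split()).lower()

    reference = {norm(n): n for n in reference_names}
    exact, near, missing = [], [], []
    for name in dict.fromkeys(ner_names):  # de-duplicate, keep first-seen order
        key = norm(name)
        if key in reference:
            exact.append(name)
            continue
        close = get_close_matches(key, list(reference), n=1, cutoff=cutoff)
        if close:
            near.append((name, reference[close[0]]))
        else:
            missing.append(name)
    return {"exact": exact, "near": near, "missing": missing}
```

As an example of the design trade-off, the OCR-style error `Zostera rnarina` (an "m" read as "rn") scores just under 0.9 against `Zostera marina`. It lands in `near` at the 0.85 default but would fall into `missing` with a stricter 0.9 cutoff. Entries in `missing` are the ones worth checking by hand or sending to GBIF's fuzzy matcher.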

@amandawhitmire amandawhitmire added the enhancement New feature or request label Mar 26, 2021