Titles longer than 250 characters will always fail QID reconciliation #98

Open
diegodlh opened this issue May 30, 2021 · 4 comments

@diegodlh
Owner

Some titles may be too long for Wikidata item labels (#97).

In these cases, querying the Wikidata reconciliation service with the full title would fail (it would not return a QID), because the Wikidata item's label would be a shortened version of the title.

@lightgivener asked if P1476 (title) could be searched as well. I tried adding {pid: "P1476", v: item.title} to the queryProps array, but this would only be used to match against candidates already retrieved by label.
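For reference, the attempt described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `TitledItem` and `ReconQuery` types and the `buildReconQuery` helper are hypothetical names, and only the `{pid, v}` shape and the `queryProps` array come from the comment itself.

```typescript
// Hypothetical types for illustration; only the {pid, v} shape is from the issue.
interface QueryProp {
  pid: string; // Wikidata property ID, e.g. "P1476" (title)
  v: string;   // value to compare against that property
}

interface ReconQuery {
  query: string;           // text matched against item labels
  properties: QueryProp[]; // extra properties used to compare candidates
}

interface TitledItem {
  title: string;
}

// Build a single reconciliation query for an item (assumed helper name).
function buildReconQuery(item: TitledItem): ReconQuery {
  const queryProps: QueryProp[] = [];
  // Adding P1476 here does not widen the search: per the issue, it is only
  // matched against candidates that were already retrieved by label.
  queryProps.push({ pid: "P1476", v: item.title });
  return { query: item.title, properties: queryProps };
}
```

Because the label search runs first, a title that is longer than the stored label yields no candidates, and the P1476 property never gets a chance to match.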

An alternative related to #97 would be to use the Short Title field as query, if available.

Possibly related to #84 as well.

@diegodlh
Owner Author

diegodlh commented Jun 1, 2021

@lightgivener asked if P1476 (title) could be searched as well

Actually, P1476 should be searched already, because the MediaWiki API's action=query&list=search (which the reconciliation service already uses) should search page content. In practice, however, P1476 seems to be ignored. I posted about this here.
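To reproduce the behaviour outside the reconciliation service, the same kind of search can be issued directly. The sketch below only builds the request URL; `buildSearchUrl` is a hypothetical helper name, and the choice of `srlimit` and `format=json` is an assumption, not necessarily what the reconciliation service sends.

```typescript
// Build a MediaWiki full-text search URL against Wikidata.
// buildSearchUrl is an illustrative name; limit and format are assumptions.
function buildSearchUrl(title: string, limit: number = 5): string {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: title,        // the text to search for
    srlimit: String(limit), // maximum number of results to return
    format: "json",
  });
  return `https://www.wikidata.org/w/api.php?${params.toString()}`;
}
```

Opening the resulting URL for a full, over-long title and comparing it with the results for the shortened label is one way to check whether the title statement is being searched at all.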

@diegodlh
Owner Author

diegodlh commented Jun 1, 2021

Posted an issue to the openrefine-wikibase repo: wetneb/openrefine-wikibase#116

@diegodlh
Owner Author

diegodlh commented Jun 3, 2021

As a workaround, to minimize the chances of getting an unexpectedly empty results array from the reconciliation service, consider refusing to reconcile items whose title is longer than 250 characters and that have no alternative short title.
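A minimal sketch of this workaround, combined with the Short Title idea mentioned earlier, might look like the following. The `CitedItem` type, the `shortTitle` field name, and the `pickReconQuery` helper are assumptions made for illustration; the 250-character limit is the one stated in this issue.

```typescript
// Label length limit discussed in this issue.
const MAX_LABEL_LENGTH = 250;

// Hypothetical item shape; the field names are assumptions.
interface CitedItem {
  title: string;
  shortTitle?: string;
}

// Return the string to use as the reconciliation query,
// or null if the item should not be reconciled at all.
function pickReconQuery(item: CitedItem): string | null {
  // Note: .length counts UTF-16 code units, which may differ from the
  // character count Wikidata uses for titles with astral-plane characters.
  if (item.title.length <= MAX_LABEL_LENGTH) {
    return item.title;
  }
  const short = item.shortTitle?.trim();
  if (short && short.length <= MAX_LABEL_LENGTH) {
    return short;
  }
  // Title too long and no usable short title: refuse to reconcile.
  return null;
}
```

Returning `null` lets the caller tell "not attempted" apart from "attempted but no match", so an empty results array is never mistaken for a missing Wikidata item.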

@diegodlh
Owner Author

DOI 10.1145/1718918.1718942 (QID Q66711639) is a related example. Whether the item is added through the Zotero Connector or by its DOI, Zotero saves the title as "Readers are not free-riders: reading as a form of participation on wikipedia". The Crossref API, however, reports the title as "Readers are not free-riders" and treats the rest as the "subtitle". The Wikidata item's label and title are both "Readers are not free-riders".
