Creates a lookup table for disambiguation of country names, using Wikipedia data

aaronschiff/country-names


Country disambiguation

These scripts assist with the construction of a list to disambiguate the names of countries. The list of ISO 3166 countries from Wikipedia is used as the list of "standard" country names (https://en.wikipedia.org/wiki/ISO_3166).

The script get3166.py scrapes the list of ISO 3166 country names and other information from Wikipedia and saves it in countries.csv.

The script disambiguate.py generates a lookup table of alternative names for each of the ISO 3166 country names. It does this by parsing the list of transitive redirects on Wikipedia constructed by the DBpedia project.
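
The redirect-parsing step can be sketched roughly as follows. This is not the code from disambiguate.py; it is a minimal illustration that assumes the DBpedia redirects dump is in N-Triples form (one `<source> <predicate> <target> .` statement per line) and that the standard names match Wikipedia page titles. The function names `resource_to_name` and `build_lookup` and the sample lines are hypothetical.

```python
import re
from urllib.parse import unquote

# Assumption: each redirect is one N-Triples line, <subject> <predicate> <object> .
TRIPLE_RE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def resource_to_name(uri):
    """Turn a DBpedia resource URI into a readable page title."""
    title = uri[len(RESOURCE_PREFIX):] if uri.startswith(RESOURCE_PREFIX) else uri
    return unquote(title).replace("_", " ")


def build_lookup(lines, standard_names):
    """Map each redirecting title to the standard name it redirects to."""
    standard = set(standard_names)
    lookup = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines and malformed lines
        source, _predicate, target = match.groups()
        target_name = resource_to_name(target)
        if target_name in standard:
            lookup[resource_to_name(source)] = target_name
    return lookup


# Made-up sample lines in the assumed format, for illustration only.
sample = [
    "# comment line",
    "<http://dbpedia.org/resource/Holland> <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/Netherlands> .",
    "<http://dbpedia.org/resource/Aotearoa> <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/New_Zealand> .",
    "<http://dbpedia.org/resource/Foo> <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/Bar> .",
]
lookup = build_lookup(sample, ["Netherlands", "New Zealand"])
```

Because the redirects are transitive, each alternative title already points at its final target, so a single pass over the file is enough in this sketch.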

The end result is a CSV file with two columns: alternative country names, and standardised (ISO 3166) names.
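
One possible way to use the resulting table is sketched below. It assumes the CSV has no header row and that the first column holds the alternative name and the second the standardised name; the helpers `load_lookup` and `standardise`, the case-insensitive matching, and the sample rows are illustrative choices, not part of the scripts.

```python
import csv
import io


def load_lookup(csv_file):
    """Read (alternative, standard) rows into a case-insensitive dictionary."""
    table = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        alternative, standard = row[0], row[1]
        table[alternative.strip().casefold()] = standard.strip()
    return table


def standardise(name, table):
    """Return the standardised name, or None if the name is not in the table."""
    return table.get(name.strip().casefold())


# Made-up rows standing in for the generated CSV file.
sample_csv = io.StringIO("Holland,Netherlands\nAotearoa,New Zealand\n")
table = load_lookup(sample_csv)
```

With a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`.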

Because of the way Wikipedia redirects are stored in DBpedia, the table will include some alternative names that are not in fact country names. This shouldn't matter much, aside from making the lookup table a bit bigger than necessary.

The code is quite fragile and slow, but it should not be necessary to run it frequently.

Thanks to Alyona for suggesting this approach to me.
