These scripts help build a lookup table for disambiguating country names. The list of ISO 3166 countries on Wikipedia (https://en.wikipedia.org/wiki/ISO_3166) is used as the set of "standard" country names.
The script get3166.py scrapes the list of ISO 3166 country names and other information from Wikipedia and saves it in countries.csv.
The script disambiguate.py generates a lookup table of alternative names for each of the ISO 3166 country names. It does this by parsing the list of transitive redirects on Wikipedia constructed by the DBpedia project.
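The redirect-parsing step might look roughly like the sketch below. It is not the code in disambiguate.py. It assumes the redirects come as N-Triples lines of the form <.../resource/Alt> <predicate> <.../resource/Target> ., and that article titles match the standard names exactly. The names TRIPLE, resource_to_name and build_lookup, and the sample lines, are hypothetical.

```python
import re
from urllib.parse import unquote

# Assumed line shape: <.../resource/Alt> <predicate> <.../resource/Target> .
TRIPLE = re.compile(
    r"<http://dbpedia\.org/resource/([^>]+)>\s+<[^>]+>\s+"
    r"<http://dbpedia\.org/resource/([^>]+)>\s*\."
)


def resource_to_name(resource):
    """Turn a resource suffix such as 'United_Kingdom' into a plain name."""
    return unquote(resource).replace("_", " ")


def build_lookup(lines, standard_names):
    """Map each redirecting title to the standard name it points at."""
    standard = set(standard_names)
    lookup = {}
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # comments, blank lines, triples in another shape
        alt, target = (resource_to_name(part) for part in match.groups())
        if target in standard:  # keep only redirects that land on a country
            lookup[alt] = target
    return lookup


# Made-up sample lines, for illustration only.
sample = [
    "<http://dbpedia.org/resource/Deutschland> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/Germany> .",
    "<http://dbpedia.org/resource/Foo> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/Bar> .",
]
lookup = build_lookup(sample, ["Germany"])
```

Only redirects whose target is one of the standard names are kept, so the second sample line is dropped.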
The end result is a CSV file with two columns: alternative country names, and standardised (ISO 3166) names.
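As a rough illustration of how the resulting table could be used, the sketch below loads the two columns into a dictionary and looks up a name. It assumes the file has no header row and that the columns are in the order given above. The function names and the sample rows are made up for the example.

```python
import csv
import io


def load_lookup(handle):
    """Read (alternative name, standard name) rows into a dict."""
    return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}


def standardise(name, lookup):
    """Return the standard name for `name`, or None if it is not listed."""
    return lookup.get(name.strip())


# Stand-in for the generated file; real usage would open the CSV instead.
sample_csv = io.StringIO("Deutschland,Germany\nBRD,Germany\n")
lookup = load_lookup(sample_csv)
print(standardise(" Deutschland ", lookup))
```

Names that are missing from the table return None, so the caller can decide whether to keep the original spelling or flag it for review.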
Because of the way Wikipedia redirects are stored in DBpedia, some of the alternative names in the table will not in fact be country names. This shouldn't matter much, aside from making the lookup table a bit bigger than necessary.
The code is quite fragile and slow, but it should not be necessary to run it frequently.
Thanks to Alyona for suggesting this approach to me.