Bootstrap
odanoburu edited this page Nov 28, 2019 · 3 revisions
The text files in this repository, which are read by mill, can be bootstrapped from the WordNet RDF distributed at this link (PWN 3.0), or from RDF data generated from PWN’s database files using this tool.
You will need:
- Python 3 (plus the libraries listed in the requirements file)
- (optional) Common Lisp, if you would like to generate the legacy RDF from the WNDB files yourself. This option is not documented (yet).
After obtaining the input RDF mentioned above, note that it has a couple of outstanding issues that prevent a clean conversion to mill’s format:
- mill assumes every word sense has a unique identifier composed of its language/WN, its lexicographer file, its lexical form, and its lexical identifier. This assumption allows mill to create simpler sense identifiers (compared with the legacy sense keys, which have special cases for adjective satellites). However, it does not hold for adjective satellites in PWN, because they all share the same lexicographer file and the same lexical identifier of 0 (see related issue and its solution).
- This distribution has a few wrong URIs that point to nonexistent nodes (see related issue and its solution).
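To make the two issues above concrete, here is a minimal sketch in Python that detects both problems on toy data. It is not the logic of mill's bootstrap script: the `Sense` record, its field names, and the functions `duplicate_sense_ids` and `dangling_targets` are hypothetical, and the sketch assumes senses and relation edges have already been extracted from the RDF into plain Python values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


# Hypothetical record for a word sense already extracted from the RDF.
class Sense(NamedTuple):
    wn: str       # language/WN, e.g. "en"
    lexfile: str  # lexicographer file
    lemma: str    # lexical form
    lex_id: int   # lexical identifier
    uri: str      # URI of the sense in the source RDF


SenseKey = Tuple[str, str, str, int]


def duplicate_sense_ids(senses: Iterable[Sense]) -> Dict[SenseKey, List[str]]:
    """Return each (wn, lexfile, lemma, lex_id) tuple shared by 2+ senses,
    mapped to the URIs of the colliding senses."""
    groups: Dict[SenseKey, List[str]] = defaultdict(list)
    for s in senses:
        groups[(s.wn, s.lexfile, s.lemma, s.lex_id)].append(s.uri)
    return {key: uris for key, uris in groups.items() if len(uris) > 1}


def dangling_targets(nodes: Set[str], edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the targets of (source, target) relation edges that are not known nodes."""
    return {target for _, target in edges if target not in nodes}


if __name__ == "__main__":
    # Toy data (not taken from PWN): two adjective-satellite senses of the
    # same lemma collide because both use the same lexicographer file and lex_id 0.
    senses = [
        Sense("en", "adj.all", "example", 0, "ex:s1"),
        Sense("en", "adj.all", "example", 0, "ex:s2"),
        Sense("en", "noun.artifact", "example", 0, "ex:s3"),
    ]
    print(duplicate_sense_ids(senses))
    # -> {('en', 'adj.all', 'example', 0): ['ex:s1', 'ex:s2']}
    print(dangling_targets({"ex:a", "ex:b"}, [("ex:a", "ex:b"), ("ex:b", "ex:c")]))
    # -> {'ex:c'}
```

The sketch only detects the problems; deciding how to repair each collision or broken URI is a separate step.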
To generate an initial version of the text files read by mill, use the script in mill’s repository: it solves the issues above and then performs the conversion. For help on how to run it, run
python python/bootstrap-legacy-rdf.py --help
from the root of mill’s repository.