MucLex is a German lexicon for surface realisation. Its content is extracted from the German Wiktionary. The lexicon contains more than 100,000 lemmata and more than 670,000 different word forms in a well-structured XML file and is available under the Creative Commons BY-SA 3.0 license. Its format is compatible with SimpleNLG-DE.
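As a rough illustration of working with such a lexicon file, the following sketch counts entries per part of speech. The element names used here (`lexicon`, `word`, `base`, `category`) are assumptions modelled on the general SimpleNLG lexicon layout, and the sample entries are made up for the example; check the actual MucLex file for the exact schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in an assumed SimpleNLG-style layout (not real MucLex data).
SAMPLE_LEXICON = """<lexicon>
  <word><base>Haus</base><category>noun</category></word>
  <word><base>gehen</base><category>verb</category></word>
  <word><base>Baum</base><category>noun</category></word>
</lexicon>"""


def count_categories(xml_text):
    """Count <word> entries per <category> value (assumed element names)."""
    root = ET.fromstring(xml_text)
    return Counter(
        word.findtext("category", default="unknown")
        for word in root.iter("word")
    )


if __name__ == "__main__":
    for category, count in count_categories(SAMPLE_LEXICON).items():
        print(category, count)
```

For the full lexicon, a streaming parser such as `ET.iterparse` may be preferable to loading the whole file at once.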
The MucLex parser is written in Python and generates the lexicon from an XML dump of Wiktionary. The script is licensed under the Mozilla Public License (MPL).
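The general idea of reading a Wiktionary dump can be sketched as follows. This is not the MucLex parser itself, only a minimal illustration that assumes the standard MediaWiki export layout (`page`, `title`, `text` elements) and uses the `{{Sprache|Deutsch}}` heading template as a simple filter for German entries. The function names and the sample dump are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the processed page to keep memory use low


def german_titles(source):
    """Return titles of pages whose wikitext contains a German language section."""
    return [
        title
        for title, text in iter_pages(source)
        if "{{Sprache|Deutsch}}" in text
    ]


# Tiny made-up dump for illustration; a real dump is several gigabytes.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Haus</title>
    <revision><text>== Haus ({{Sprache|Deutsch}}) ==</text></revision>
  </page>
  <page>
    <title>house</title>
    <revision><text>== house ({{Sprache|Englisch}}) ==</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(german_titles(io.BytesIO(SAMPLE_DUMP)))
```

Streaming with `iterparse` and clearing each page after use avoids loading the entire dump into memory, which matters for a file of this size.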
@InProceedings{klimt-EtAl:2020:LREC,
author = {Klimt, Kira and Braun, Daniel and Schneider, Daniela and Matthes, Florian},
title = {MucLex: A German Lexicon for Surface Realisation},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
month = {May},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {4655--4659},
url = {https://www.aclweb.org/anthology/2020.lrec-1.572}
}
If you have any questions, please contact:
Daniel Braun (Technical University of Munich) [email protected]