When the XMLLexicon is created, I would expect the variants I have configured to be indexed, so that words can be found by createWord through their variants. This is not always the case.
I have found that dative_sin, genitive_sin and akkusative_sin are not correctly indexed for some words. The indexing works in most cases only by coincidence, because the heuristics defined in MorphologyRules are good.
Please note that I am using Bandes instead of Bands, which is the same word in a more sophisticated form.
During the creation of the XMLLexicon, SimpleNLG-DE will go through all variants of each word and put them into the index, but instead of choosing the genitive_sin that should be used for DiscourseFunction.GENITIVE + NumberAgreement.SINGULAR, the heuristics are used, generating Bands. So instead of Bandes, we get the index for Bands.
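The effect described above can be reproduced in isolation. The following is a minimal, self-contained sketch and not SimpleNLG-DE code: the class VariantIndexSketch, the map-based lexicon entry and the simplified genitive heuristic (append "s") are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not SimpleNLG-DE code: shows how indexing a
// heuristic form instead of the configured variant hides "Bandes".
class VariantIndexSketch {

    // Simplified stand-in for a rule-based genitive singular: append "s".
    static String heuristicGenitive(String base) {
        return base + "s";
    }

    // Builds an index from surface form to base form.
    // When useConfiguredVariants is false, the configured genitive_sin is ignored.
    static Map<String, String> buildIndex(Map<String, String> entry, boolean useConfiguredVariants) {
        Map<String, String> index = new HashMap<>();
        String base = entry.get("base");
        index.put(base, base);
        String configured = entry.get("genitive_sin");
        String genitive = (useConfiguredVariants && configured != null)
                ? configured
                : heuristicGenitive(base);
        index.put(genitive, base);
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("base", "Band");
        entry.put("genitive_sin", "Bandes");

        // Heuristic indexing: "Bands" is indexed, so "Bandes" is not found.
        System.out.println(buildIndex(entry, false).containsKey("Bandes")); // false
        // Variant-aware indexing: "Bandes" resolves to the base form.
        System.out.println(buildIndex(entry, true).get("Bandes")); // Band
    }
}
```

In this sketch, a lookup by "Bandes" only succeeds when the configured variant takes precedence over the heuristic, which is the behaviour I would expect from the lexicon index.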
I think there are several issues contributing to that:
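The getVariants method creates an InflectedWordElement (SimpleNLG-DE/src/main/java/simplenlgde/lexicon/XMLLexicon.java, line 251 in 5c831cb), which does not receive the genus property from the base word:
https://github.com/sebischair/SimpleNLG-DE/blob/5c831cb9722406c749bc00bdd867e4d694e4bb4a/src/main/java/simplenlgde/framework/InflectedWordElement.java#L65C5-L73C6
That inflected word is then used in SimpleNLG-DE/src/main/java/simplenlgde/morphology/MorphologyRules.java (line 70 in 5c831cb), where the genus will always be null.
Later, in MorphologyRules.java (line 184 in 5c831cb), the genus is used as one of many conditions for deciding whether to use the heuristics. Because it is always null, the genitive_sin is already ignored at that point.
Even if the genus were properly set, the genitive_sin specified in the lexicon would still be ignored, because in MorphologyRules.java (line 208 in 5c831cb) the element is the InflectedWordElement, which has none of the lexicon variants that were set on the base word.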
I have not analysed the impact of this on other components of the library, but I think that if the word is created using the base form Band and then changed to DiscourseFunction.GENITIVE and NumberAgreement.SINGULAR, it should be correctly realised as Bandes, because the variant would be taken from the base word.
My suggestions are to get the genus from the baseWord instead of the InflectedWordElement in doNounMorphology, and to pass the baseWord instead of the element to doNounMorphologySingular.
I have tested these suggestions in my own environment and had it working with all the tests passing, but then again, I don't know if something else using doNounMorphology could break as I did not analyse that.
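To make the suggestion concrete, here is a rough, self-contained sketch of the idea. It is not the actual SimpleNLG-DE code: BaseWord, Inflected, the feature keys and both lookup methods are hypothetical stand-ins, and the rule fallback is reduced to appending "s".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the suggested change, not SimpleNLG-DE code.
class BaseWordLookupSketch {

    // Stand-in for a lexicon word carrying genus and configured variants.
    static class BaseWord {
        final String base;
        final Map<String, String> features = new HashMap<>();
        BaseWord(String base) { this.base = base; }
    }

    // Stand-in for an inflected element: it references its base word,
    // but its own feature map starts empty (no genus, no variants).
    static class Inflected {
        final BaseWord baseWord;
        final Map<String, String> features = new HashMap<>();
        Inflected(BaseWord baseWord) { this.baseWord = baseWord; }
    }

    // Simplified rule fallback: append "s".
    static String heuristicGenitive(String base) {
        return base + "s";
    }

    // Behaviour described in the report: genus and variant are read from the element.
    static String genitiveFromElement(Inflected element) {
        String genus = element.features.get("genus");
        String variant = element.features.get("genitive_sin");
        return (genus != null && variant != null)
                ? variant
                : heuristicGenitive(element.baseWord.base);
    }

    // Suggested behaviour: genus and variant are read from the base word.
    static String genitiveFromBaseWord(Inflected element) {
        BaseWord word = element.baseWord;
        String genus = word.features.get("genus");
        String variant = word.features.get("genitive_sin");
        return (genus != null && variant != null)
                ? variant
                : heuristicGenitive(word.base);
    }

    public static void main(String[] args) {
        BaseWord band = new BaseWord("Band");
        band.features.put("genus", "neu");
        band.features.put("genitive_sin", "Bandes");
        Inflected element = new Inflected(band);

        System.out.println(genitiveFromElement(element));  // Bands
        System.out.println(genitiveFromBaseWord(element)); // Bandes
    }
}
```

The only difference between the two methods is where the features are read from, which mirrors the two changes proposed above.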
Thanks for the report. I will have to look into this in more detail, but I can already tell that the problem seems to be a bit different.
"Bandes" is the standard form for genitive singular that is also used in the default lexicon. The following code produces the output "Ich sehe die Farbe des Bandes".
(There is nevertheless something wrong with loading the information for the inflected forms.)
The reason why you get the form "Bands" instead of "Bandes" with your custom lexicon entry seems to be that the lexicon entry is incomplete. I will have to double check but it looks like SimpleNLG-DE is defaulting back to rules if a noun entry only contains an entry for one case because the lexicon entry is considered incomplete. (Whether that makes sense would be the next question.)
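To illustrate the kind of fallback I suspect, here is a small sketch. It is only a model of the hypothesis and not the library's code: the class CompletenessFallbackSketch, the set of fields required for a "complete" entry and the simplified rule (append "s") are all assumptions that still need to be checked against the actual implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the suspected behaviour, not SimpleNLG-DE code.
class CompletenessFallbackSketch {

    // Assumption: an entry counts as complete only if all three singular cases are given.
    static final List<String> REQUIRED =
            List.of("genitive_sin", "dative_sin", "akkusative_sin");

    static boolean isComplete(Map<String, String> entry) {
        return entry.keySet().containsAll(REQUIRED);
    }

    // Uses the lexicon form only for complete entries, otherwise a simplified rule.
    static String genitiveSingular(String base, Map<String, String> entry) {
        return isComplete(entry) ? entry.get("genitive_sin") : base + "s";
    }

    public static void main(String[] args) {
        Map<String, String> partial = Map.of("genitive_sin", "Bandes");
        Map<String, String> full = Map.of(
                "genitive_sin", "Bandes",
                "dative_sin", "Band",
                "akkusative_sin", "Band");

        System.out.println(genitiveSingular("Band", partial)); // Bands
        System.out.println(genitiveSingular("Band", full));    // Bandes
    }
}
```

If the library behaves like this, adding the remaining case forms to the custom entry should be enough to get "Bandes".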