Hi, and thanks for maintaining such a fantastic package!
I removed most of the issue template since I think this is a pretty straightforward bug and not related to my environment/system/specific data/etc., but if you'd like a more complete report with that info please let me know!
When computing a likelihood-ratio test between 2 (or more) Lmer models, pymer4.stats.lrt() computes the degrees of freedom shown in the output table from the difference in the number of parameters between the models (pymer4/stats.py, line 592 at commit 4605b46).
But then, to compute the p-value for the test, pymer4.utils._lrt() re-computes the degrees of freedom from the number of rows in the two models' .coefs attributes (pandas DataFrames; pymer4/utils.py, line 635 at commit 4605b46), which for an Lmer model contain only the fixed effects. So if the random effects differ between the two models, the p-value will be wrong.
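To make the mismatch concrete, here is a minimal sketch, assuming a long-format DataFrame df with columns Reaction, Days, and Subject (e.g., lme4's sleepstudy data); the data and formulas are illustrative only:

    from pymer4.models import Lmer
    from pymer4.utils import _get_params

    # Two models with identical fixed effects but different random effects
    mod1 = Lmer("Reaction ~ Days + (1 | Subject)", data=df)
    mod1.fit()
    mod2 = Lmer("Reaction ~ Days + (Days | Subject)", data=df)
    mod2.fit()

    # .coefs holds only the fixed effects, so both models show 2 rows
    # (Intercept and Days) and _lrt()'s df comes out as 0
    print(mod1.coefs.shape[0], mod2.coefs.shape[0])

    # _get_params() also counts the random-effect (co)variance parameters,
    # so the counts differ and the df is correct
    print(_get_params(mod1), _get_params(mod2))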
I think this can pretty easily be fixed by using pymer4.utils._get_params(mod) instead of mod.coefs.shape[0] in pymer4.utils._lrt(). E.g., to ensure the function still works with Lm and Lm2 model classes, something like:
    def _lrt(tup):
        """Likelihood ratio test between 2 models. Used by stats.lrt"""
        from .models import Lmer  # local import to avoid circular dependency

        # np and chi2 (scipy.stats) are assumed imported at the top of pymer4.utils,
        # as in the current implementation
        d = np.abs(2 * (tup[0].logLike - tup[1].logLike))
        # _get_params() also counts random-effect parameters; .coefs holds only fixed effects
        n_params_mod1 = _get_params(tup[0]) if isinstance(tup[0], Lmer) else tup[0].coefs.shape[0]
        n_params_mod2 = _get_params(tup[1]) if isinstance(tup[1], Lmer) else tup[1].coefs.shape[0]
        return chi2.sf(d, np.abs(n_params_mod1 - n_params_mod2))
Or since the test statistic and df are both already computed in the outer pymer4.stats.lrt() function, it might be simpler to just move the p-value calculation there and remove this helper function entirely.
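If stats.lrt() already has the chi-square statistic and the parameter-difference df in local variables (the names chisq and df_diff below are illustrative, not pymer4's actual internals), the p-value is a one-liner:

    from scipy.stats import chi2

    # Upper-tail probability of the chi-square distribution
    p_value = chi2.sf(chisq, df_diff)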
I've confirmed that the p-value computed this way (i.e., using _get_params()) matches the p-value given by anova(mod1, mod2) in R, as do all other values in the DataFrame returned by pymer4.stats.lrt().
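For reference, the check was essentially this sketch, assuming the two models from the example above and that pymer4.stats.lrt() accepts a list of fitted models:

    from pymer4.stats import lrt

    # With the patched _lrt(), the chi-square statistic, df, and p-value
    # match the output of anova(mod1, mod2) run in R on the same data
    result = lrt([mod1, mod2])
    print(result)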
Thanks again!