Bug in likelihood-ratio test p-value calculation for Lmer models #134

Open
paxtonfitzpatrick opened this issue Feb 7, 2024 · 1 comment

@paxtonfitzpatrick

Hi, and thanks for maintaining such a fantastic package!

I removed most of the issue template since I think this is a pretty straightforward bug and not related to my environment/system/specific data/etc., but if you'd like a more complete report with that info please let me know!

When computing a likelihood-ratio test between 2 (or more) Lmer models, pymer4.stats.lrt() computes the degrees of freedom shown in the output table based on the difference in the number of parameters between the models:

df = _get_params(m) - (_get_params(models_list[i - 1])) if i > 0 else np.nan

But then to compute the p-value for the test, pymer4.utils._lrt() re-computes the degrees of freedom based on the number of rows in the two models' .coefs attributes (pandas DataFrames):

return chi2.sf(d, np.abs(tup[0].coefs.shape[0] - tup[1].coefs.shape[0]))

which, for an Lmer model, counts only the fixed effects. So if the two models differ in their random effects, the degrees of freedom passed to chi2.sf() (and therefore the p-value) will be wrong.
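
To make the impact concrete, here's a small hypothetical illustration (not pymer4 code) of how much the df mismatch can shift the p-value. Suppose the second model adds one fixed effect plus a random slope, so the total parameter count differs by 3 while the fixed-effects-only count differs by just 1:

import numpy as np
from scipy.stats import chi2

# made-up log-likelihoods for two nested models
log_like_1, log_like_2 = -1045.25, -1042.0
lr_stat = np.abs(2 * (log_like_1 - log_like_2))  # = 6.5

p_coefs_only = chi2.sf(lr_stat, 1)  # df from .coefs rows (fixed effects only) -> ~0.011
p_all_params = chi2.sf(lr_stat, 3)  # df from the total parameter count -> ~0.090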

I think this can pretty easily be fixed by using pymer4.utils._get_params(mod) instead of mod.coefs.shape[0] in pymer4.utils._lrt(). E.g., to ensure the function still works with Lm and Lm2 model classes, something like:

def _lrt(tup):
    """Likelihood ratio test between 2 models. Used by stats.lrt"""
    from .models import Lmer  # local import to avoid a circular dependency
    # likelihood-ratio statistic: twice the absolute difference in log-likelihoods
    d = np.abs(2 * (tup[0].logLike - tup[1].logLike))
    # for Lmer models, count all estimated parameters via _get_params();
    # for Lm/Lm2 models, fall back to the number of rows in .coefs as before
    n_params_mod1 = _get_params(tup[0]) if isinstance(tup[0], Lmer) else tup[0].coefs.shape[0]
    n_params_mod2 = _get_params(tup[1]) if isinstance(tup[1], Lmer) else tup[1].coefs.shape[0]
    # p-value from a chi-squared distribution with df = difference in parameter counts
    return chi2.sf(d, np.abs(n_params_mod1 - n_params_mod2))

Or since the test statistic and df are both already computed in the outer pymer4.stats.lrt() function, it might be simpler to just move the p-value calculation there and remove this helper function entirely.
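
Roughly, that alternative might look like this inside stats.lrt() (the variable names below are placeholders, not the actual names used in that function):

from scipy.stats import chi2

# hypothetical: reuse the LRT statistic and df already computed for model i
# (df is np.nan for the first model in the list, so chi2.sf() returns nan there)
p_value = chi2.sf(lr_stat, df)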

I've confirmed that the p-value computed this way (i.e., using _get_params()) matches the p-value given by anova(mod1, mod2) in R, as do all other values in the DataFrame returned by pymer4.stats.lrt().
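
In case it helps with reproducing or verifying, the check I ran looked roughly like this (the dataset and formulas are just placeholders):

import pandas as pd
from pymer4.models import Lmer
from pymer4.stats import lrt

data = pd.read_csv("my_data.csv")  # placeholder: any long-format dataset with a grouping factor

# identical fixed effects, different random effects
m1 = Lmer("DV ~ IV + (1 | Group)", data=data)
m2 = Lmer("DV ~ IV + (IV | Group)", data=data)
m1.fit(summarize=False)
m2.fit(summarize=False)

# with the _get_params() fix, the p-value here matches anova(m1, m2) in R;
# with the current .coefs-based df it does not (the fixed-effect counts are equal)
print(lrt([m1, m2]))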

Thanks again!

@ejolly
Owner

ejolly commented Jul 18, 2024

Thanks for bringing this up and for the suggested fix @paxtonfitzpatrick !
