Bug in likelihood-ratio test p-value calculation for Lmer models #134

Open
paxtonfitzpatrick opened this issue Feb 7, 2024 · 1 comment

@paxtonfitzpatrick

Hi, and thanks for maintaining such a fantastic package!

I removed most of the issue template since I think this is a pretty straightforward bug and not related to my environment/system/specific data/etc., but if you'd like a more complete report with that info please let me know!

When computing a likelihood-ratio test between 2 (or more) Lmer models, pymer4.stats.lrt() computes the degrees of freedom shown in the output table based on the difference in the number of parameters between the models:

df = _get_params(m) - (_get_params(models_list[i - 1])) if i > 0 else np.nan

But then to compute the p-value for the test, pymer4.utils._lrt() re-computes the degrees of freedom based on the number of rows in the two models' .coefs attributes (pandas DataFrames):

return chi2.sf(d, np.abs(tup[0].coefs.shape[0] - tup[1].coefs.shape[0]))

which, for an Lmer model, counts only the fixed effects. So if the two models differ in their random effects, the degrees of freedom passed to chi2.sf() (and therefore the p-value) will be wrong.
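
To make the impact concrete, here's a small hypothetical illustration (not pymer4 code) of how much the df mismatch can shift the p-value. Suppose the second model adds one fixed effect plus a random slope, so the total parameter count differs by 3 while the fixed-effects-only count differs by just 1:

import numpy as np
from scipy.stats import chi2

# made-up log-likelihoods for two nested models
log_like_1, log_like_2 = -1045.25, -1042.0
lr_stat = np.abs(2 * (log_like_1 - log_like_2))  # = 6.5

p_coefs_only = chi2.sf(lr_stat, 1)  # df from .coefs rows (fixed effects only) -> ~0.011
p_all_params = chi2.sf(lr_stat, 3)  # df from the total parameter count -> ~0.090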

I think this can pretty easily be fixed by using pymer4.utils._get_params(mod) instead of mod.coefs.shape[0] in pymer4.utils._lrt(). E.g., to ensure the function still works with Lm and Lm2 model classes, something like:

def _lrt(tup):
    """Likelihood ratio test between 2 models. Used by stats.lrt"""
    from .models import Lmer  # local import to avoid a circular dependency
    # likelihood-ratio statistic: twice the absolute difference in log-likelihoods
    d = np.abs(2 * (tup[0].logLike - tup[1].logLike))
    # for Lmer models, count all estimated parameters via _get_params();
    # for Lm/Lm2 models, fall back to the number of rows in .coefs as before
    n_params_mod1 = _get_params(tup[0]) if isinstance(tup[0], Lmer) else tup[0].coefs.shape[0]
    n_params_mod2 = _get_params(tup[1]) if isinstance(tup[1], Lmer) else tup[1].coefs.shape[0]
    # p-value from a chi-squared distribution with df = difference in parameter counts
    return chi2.sf(d, np.abs(n_params_mod1 - n_params_mod2))

Or since the test statistic and df are both already computed in the outer pymer4.stats.lrt() function, it might be simpler to just move the p-value calculation there and remove this helper function entirely.
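
Roughly, that alternative might look like this inside stats.lrt() (the variable names below are placeholders, not the actual names used in that function):

from scipy.stats import chi2

# hypothetical: reuse the LRT statistic and df already computed for model i
# (df is np.nan for the first model in the list, so chi2.sf() returns nan there)
p_value = chi2.sf(lr_stat, df)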

I've confirmed that the p-value computed this way (i.e., using _get_params()) matches the p-value given by anova(mod1, mod2) in R, as do all other values in the DataFrame returned by pymer4.stats.lrt().
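
In case it helps with reproducing or verifying, the check I ran looked roughly like this (the dataset and formulas are just placeholders):

import pandas as pd
from pymer4.models import Lmer
from pymer4.stats import lrt

data = pd.read_csv("my_data.csv")  # placeholder: any long-format dataset with a grouping factor

# identical fixed effects, different random effects
m1 = Lmer("DV ~ IV + (1 | Group)", data=data)
m2 = Lmer("DV ~ IV + (IV | Group)", data=data)
m1.fit(summarize=False)
m2.fit(summarize=False)

# with the _get_params() fix, the p-value here matches anova(m1, m2) in R;
# with the current .coefs-based df it does not (the fixed-effect counts are equal)
print(lrt([m1, m2]))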

Thanks again!

@ejolly
Owner

ejolly commented Jul 18, 2024

Thanks for bringing this up and for the suggested fix @paxtonfitzpatrick !
