population averaged vs. subject specific predictions for merMod #215
Do you have a reproducible example? In the following example, it works as intended.
This terminology is from the glmmTMB documentation. This may be related to #137.

library(ggeffects)
library(lme4)
#> Loading required package: Matrix
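set.seed(123)

# simulated data: binary outcome, a binary and a continuous predictor,
# and a grouping factor with four levels
dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE)
)

# standardize the continuous predictor
dat$var_cont <- sjmisc::std(dat$var_cont)

# logistic regression with a random intercept for each group
m1 <- glmer(
  outcome ~ var_binom + var_cont + (1 | group),
  data = dat,
  family = binomial(link = "logit")
)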
ggpredict(m1, "var_cont") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.
#> Loading required namespace: ggplot2
ggpredict(m1, "var_cont", type = "random") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.

Created on 2021-05-26 by the reprex package (v2.0.0)
Hi Daniel,

Thanks for your reply! I looked at your code straight away instead of reading your remark about glmmTMB, which I only saw after making the reproducible example below. Still, I think this wording is a bit confusing for logistic and other nonlinear models, where averaging over the random effect gives a different prediction than setting the random effect to zero. Since I have now written the code to show this, I may as well add it here. Thanks again for your explanation,
Ben.

Here is an example with so-called "population averaged predictions". This wording is typically used for GEE models, but also for mixed models when the predictions are made by averaging over all possible (normally distributed) random-intercept values. I adapted your code and added comments. At the end, the population averaged predictions are shown alongside those of the mixed glmer model. What confused me was the "on the population level" remark in the ggpredict manual; after reading your explanation about glmmTMB, I now see what it means. However, in GEE terms, what ggpredict actually predicts are the "subject specific" predictions, i.e. those for the average individual, whose random intercept is zero.
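For a model with a single random intercept, the two kinds of prediction can be written as follows. This is a sketch in our own notation, not taken from the ggeffects or glmmTMB documentation: $\mathbf{x}^\top\boldsymbol\beta$ is the fixed-effect linear predictor, $b \sim N(0, \sigma^2)$ the random intercept, and $\varphi(b; 0, \sigma^2)$ its normal density.

```latex
\begin{aligned}
p_{\text{SS}}(\mathbf{x}) &= \operatorname{logit}^{-1}\!\left(\mathbf{x}^\top\boldsymbol\beta + 0\right), \\
p_{\text{PA}}(\mathbf{x}) &= \int_{-\infty}^{\infty} \operatorname{logit}^{-1}\!\left(\mathbf{x}^\top\boldsymbol\beta + b\right)\, \varphi(b;\, 0, \sigma^2)\, \mathrm{d}b .
\end{aligned}
```

Because $\operatorname{logit}^{-1}$ is nonlinear, the two coincide only when $\mathbf{x}^\top\boldsymbol\beta = 0$; otherwise $p_{\text{PA}}$ lies strictly between $0.5$ and $p_{\text{SS}}$. With an identity link (a linear mixed model) they would be identical.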
For reference: easystats/modelbased#57
Not sure if this would be any help, but here is code I used to calculate population averaged values out of
It is fairly straightforward with a single random intercept only, but becomes harder with random intercepts and slopes or with multiple random effects.
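The code mentioned above is not reproduced in this thread. As a minimal base-R sketch (not that code), the following computes population-averaged probabilities by numerical integration for a single random intercept, and by Monte Carlo averaging for a random intercept plus a random slope. The helper names (`pred_subject_specific`, `pred_pa_intercept`, `pred_pa_intercept_slope`) and all numeric inputs are hypothetical; for a fitted `glmer` model, the linear predictor and variance components would have to be taken from the model, e.g. via `lme4::fixef()` and `lme4::VarCorr()`.

```r
# Sketch: subject-specific vs population-averaged predicted probabilities
# for a logistic mixed model. All inputs are hypothetical illustration values.

# Subject-specific (conditional) prediction: random effects set to zero.
pred_subject_specific <- function(eta) {
  plogis(eta)
}

# Population-averaged prediction for a single random intercept
# b ~ N(0, sigma^2): integrate plogis(eta + b) against the normal density.
pred_pa_intercept <- function(eta, sigma) {
  vapply(eta, function(e) {
    integrate(function(b) plogis(e + b) * dnorm(b, mean = 0, sd = sigma),
              lower = -Inf, upper = Inf)$value
  }, numeric(1))
}

# Population-averaged prediction for a random intercept plus a random slope
# on x, by Monte Carlo: draw (b0, b1) ~ N(0, Sigma) and average the
# probabilities. Note: set.seed() resets the global RNG state.
pred_pa_intercept_slope <- function(eta, x, Sigma, n_draws = 1e5, seed = 1) {
  set.seed(seed)
  z <- matrix(rnorm(2 * n_draws), nrow = 2)
  b <- t(chol(Sigma)) %*% z  # each column is one draw of (b0, b1)
  vapply(seq_along(eta), function(i) {
    mean(plogis(eta[i] + b[1, ] + b[2, ] * x[i]))
  }, numeric(1))
}

# Hypothetical fixed-effect linear predictor (logit scale) on a grid of x
x <- c(-2, -1, 0, 1, 2)
eta <- -0.5 + 0.8 * x
sigma <- 1.5                                  # random-intercept SD
Sigma <- matrix(c(sigma^2, 0.2, 0.2, 0.4), 2) # intercept/slope covariance

ss <- pred_subject_specific(eta)
pa <- pred_pa_intercept(eta, sigma)
pa_slope <- pred_pa_intercept_slope(eta, x, Sigma)
round(cbind(x, eta, ss, pa, pa_slope), 3)
```

Monte Carlo is used for the two-dimensional case only because it extends to any number of correlated random effects without nested integrals; Gauss-Hermite quadrature would be a more precise alternative.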
I think this vignette should clarify the available options:
Hi Daniel,
First of all, thanks for your nice work! For a long time I generated plots of predicted probabilities from logistic regression manually, and only recently discovered your package. It is great for my students, and means much less R programming for me to explain.
I have a question about ggpredict for a random-intercept logistic regression model (I posted the question on Cross Validated a week or so ago, but got no response). For such a model, ggpredict offers two options, type = "fixed" and type = "random". About the default, type = "fixed", the manual says:
Predicted values are conditioned on the fixed effects or conditional model only (for mixed models: predicted values are on the population-level and confidence intervals are returned).
My question is about the last part of that sentence, in brackets: "on the population-level". This term is often used in the context of "population averaged predictions", which are typically produced by GEE models, and I assumed this was what your description meant. Estimating my logistic model with GEE (with an exchangeable correlation structure) gives different predicted probabilities from those obtained with ggpredict. Furthermore, the predicted probabilities from ggpredict agree (to 4 or 5 decimals) with those I calculated "by hand" from the glmer results in R for a respondent with an average intercept. So the predicted probabilities from ggpredict seem to be subject specific rather than "on the population level". Could you explain what ggpredict actually predicts for mixed models?
Thanks for any help!
Ben Pelzer.
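The gap between the GEE results and the "by hand" glmer calculation can be sketched with the approximation of Zeger, Liang and Albert (1988): for a random-intercept logistic model, the marginal (population-averaged) coefficients are roughly the conditional ones divided by $\sqrt{1 + c^2\sigma^2}$, with $c = 16\sqrt{3}/(15\pi)$. The coefficients, the SD and the helper name `attenuate()` below are hypothetical; GEE estimates the marginal coefficients directly, not through this formula, so it only indicates the expected size of the difference.

```r
# Sketch: why a GEE (marginal) fit and the "by hand" mixed-model calculation
# with the random intercept at zero give different probabilities.
# All coefficient and variance values are hypothetical.

# Constant from approximating the logistic curve by a scaled normal CDF
c2 <- (16 * sqrt(3) / (15 * pi))^2   # about 0.346

# Approximate marginal (population-averaged) coefficients implied by
# conditional coefficients beta and a random-intercept SD sigma
attenuate <- function(beta, sigma) {
  beta / sqrt(1 + c2 * sigma^2)
}

beta_c <- c(intercept = -0.5, x = 1.2)  # hypothetical glmer fixed effects
sigma  <- 1.2                           # hypothetical random-intercept SD
beta_m <- attenuate(beta_c, sigma)      # approximate marginal coefficients

x_new <- 1
eta_c <- sum(beta_c * c(1, x_new))

# "By hand" subject-specific probability: random intercept fixed at zero
p_ss <- plogis(eta_c)

# Population-averaged probability: exact (numerical integration) and approximate
p_pa_exact <- integrate(
  function(b) plogis(eta_c + b) * dnorm(b, mean = 0, sd = sigma),
  lower = -Inf, upper = Inf
)$value
p_pa_approx <- plogis(sum(beta_m * c(1, x_new)))

round(c(subject_specific = p_ss, pa_exact = p_pa_exact, pa_approx = p_pa_approx), 3)
```

The subject-specific probability is always further from 0.5 than the population-averaged one, which is consistent with GEE and ggpredict's default predictions disagreeing.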