Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

population averaged vs. subject specific predictions for merMod #215

Closed
benpelz opened this issue May 17, 2021 · 5 comments
Closed

population averaged vs. subject specific predictions for merMod #215

benpelz opened this issue May 17, 2021 · 5 comments
Labels
3 investigators ❔❓ enhancement 💥 Implemented features can be improved or revised

Comments

@benpelz
Copy link

benpelz commented May 17, 2021

Hi Daniel,

Thanks for your nice work first! I used to manually generate plots with logistic-regression-predicted p-values a long time and only recently discovered your package, Great for my students, and much less programming in R to explain for me.

I have a question about ggpredict for a random intercept logistic regression model (I posted the question an Crossvalidated a week ago or so, but no response). For such model, ggpredict offers two possibilities, type="fixed" and type="random". About the default type="fixed", the manual says:

Predicted values are conditioned on the fixed effects or conditional model only (for mixed models: predicted values are on the population-level and confidence intervals are returned).

My question is about the last part of the above sentence, between the brackets: "on the population-level". This term is often used in the framework of "population averaged predictions", which are typically produced by gee-models. I thought this was meant with your description. Using gee (with an exchangeable correlation structure) to estimate my logistic model, leads to different predicted probabilities than those obtained from ggpredict. Furthermore, the predicted probabilities obtained with ggpredict are highly similar (up to 4 or 5 decimals) to those calculated "by hand" (based on results of glmer in R), for a respondent having an average intercept. So the predicted probabilities from ggpredict seem to be subject-specific rather than "on the population level". Could you explain what ggpredict actually predicts, in case of mixed-models?

Thanks for any help!

Ben Pelzer.

@strengejacke
Copy link
Owner

Do you have a reproducible example? In the following example, it works as intended.

"on the population-level"

This terminology is from the glmmTMB-documentation. This may be related to #137.

library(ggeffects)
library(lme4)
#> Loading required package: Matrix

set.seed(123)

dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE)
)

dat$var_cont <- sjmisc::std(dat$var_cont)

m1 <- glmer(
  outcome ~ var_binom + var_cont + (1 | group), 
  data = dat, 
  family = binomial(link = "logit")
)

ggpredict(m1, "var_cont") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.
#> Loading required namespace: ggplot2

ggpredict(m1, "var_cont", type = "random") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.

Created on 2021-05-26 by the reprex package (v2.0.0)

@strengejacke strengejacke added the reprex 💾 An example (and data) to reproduce the issue is needed to label May 26, 2021
@benpelz
Copy link
Author

benpelz commented May 31, 2021

Hi Daniel,

Thanks for your reply! I immediately looked at your code instead of reading your remark about glmmTMB, which I only saw after making the reproducible example below. Still I think this way of saying it is a bit confusing in case of logistic and other nonlinear models, where averaging of the random effect leads to another prediction than setting the random effect to zero. Since I now made the code to show this, I may as well add it here. Thank again for your explanation,

Ben.


Here is an example with so called "population averaged predictions". This wording is typically used for gee models, but also for mixed models in case the predictions are made by averaging over all possible (normally distributed) random-intercept values.

I adapted your code and added comments. At the end the population averaged predictions shown alongside those of the mixed glmer model. What was confusing for me was the "on the population level" remark in the ggpredict manual, and I now, after reading your explanation about glmmTMB see what it means. However, in the gee wording, what ggpredict actually predicts are the "subject specific" predictions, i.e. for the average individual, which has value zero for the random intercept.

library(ggeffects)
library(lme4)
#> Loading required package: Matrix

set.seed(123)

dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE)
)


dat$var_cont <- sjmisc::std(dat$var_cont)

m1 <- glmer(
  outcome ~ var_binom + var_cont + (1 | group), 
  data = dat, 
  family = binomial(link = "logit")
)

# Notice that the predictions of the fixed and random model are equal. 
predfixed <- ggpredict(m1, "var_cont") 
predrandom <- ggpredict(m1, "var_cont", type = "random") 
cbind(predfixed$predicted, predrandom$predicted)

> cbind(predfixed$predicted, predrandom$predicted)
           [,1]      [,2]
 [1,] 0.3367196 0.3367196
 [2,] 0.3430753 0.3430753
 [3,] 0.3494878 0.3494878
 [4,] 0.3559552 0.3559552
 [5,] 0.3624756 0.3624756
 [6,] 0.3690470 0.3690470
 [7,] 0.3756673 0.3756673
 [8,] 0.3823344 0.3823344
 [9,] 0.3890461 0.3890461
[10,] 0.3958002 0.3958002
[11,] 0.4025942 0.4025942
[12,] 0.4094258 0.4094258
[13,] 0.4162926 0.4162926


# Verify that predictions by ggpredict are based on fixed effects only.
xxx <- cbind(1,0,seq(-2.5, 3.5, by=0.5))
plogis(xxx %*% fixef(m1))

> plogis(xxx %*% fixef(m1))
           [,1]
 [1,] 0.3367196
 [2,] 0.3430753
 [3,] 0.3494878
 [4,] 0.3559552
 [5,] 0.3624756
 [6,] 0.3690470
 [7,] 0.3756673
 [8,] 0.3823344
 [9,] 0.3890461
[10,] 0.3958002
[11,] 0.4025942
[12,] 0.4094258
[13,] 0.4162926


# Initialize package for running gee. 
library(geepack)

# Sort data and convert the grouping variable into a factor, before running gee. 
dat <- dat[order(dat$group),]
dat$groupf <- as.factor(dat$group)

# Estimate the gee model.
geemodel <- geeglm(outcome ~ var_binom + var_cont,
                   family=binomial(link=logit), data=dat, id=groupf, 
                   corstr="exchangeable")

# Using ggpredict to obtain population averaged predictions from gee model m2. 
predgee <- ggpredict(geemodel, "var_cont")

# Verify how ggpredict works for the gee model.
xxx <- cbind(1,0,seq(-2.5, 3.5, by=0.5))
predxxx <- plogis(xxx %*% coef(geemodel))
cbind(predgee$predicted, predxxx)

> cbind(predgee$predicted, predxxx)
           [,1]      [,2]
 [1,] 0.3411360 0.3411360
 [2,] 0.3473059 0.3473059
 [3,] 0.3535275 0.3535275
 [4,] 0.3597991 0.3597991
 [5,] 0.3661190 0.3661190
 [6,] 0.3724852 0.3724852
 [7,] 0.3788961 0.3788961
 [8,] 0.3853494 0.3853494
 [9,] 0.3918434 0.3918434
[10,] 0.3983759 0.3983759
[11,] 0.4049447 0.4049447
[12,] 0.4115478 0.4115478
[13,] 0.4181829 0.4181829

# Show glmer and gee predictions alongside.
# Notice that the gee predictions are closer to 0.5 as should be.
cbind(predgee$predicted, predrandom$predicted)
 
> cbind(predgee$predicted, predrandom$predicted)
           [,1]      [,2]
 [1,] 0.3411360 0.3367196
 [2,] 0.3473059 0.3430753
 [3,] 0.3535275 0.3494878
 [4,] 0.3597991 0.3559552
 [5,] 0.3661190 0.3624756
 [6,] 0.3724852 0.3690470
 [7,] 0.3788961 0.3756673
 [8,] 0.3853494 0.3823344
 [9,] 0.3918434 0.3890461
[10,] 0.3983759 0.3958002
[11,] 0.4049447 0.4025942
[12,] 0.4115478 0.4094258
[13,] 0.4181829 0.4162926

@strengejacke strengejacke changed the title predictions for a mixed logistic model in glmer population averaged vs. subject specific predictions for merMod Feb 6, 2022
@strengejacke
Copy link
Owner

for reference: easystats/modelbased#57

@strengejacke strengejacke added 3 investigators ❔❓ enhancement 💥 Implemented features can be improved or revised and removed reprex 💾 An example (and data) to reproduce the issue is needed to labels Feb 6, 2022
@JWiley
Copy link

JWiley commented Feb 6, 2022

Not sure if this would be any help, but here is code I used to calculate population averaged values out of brms.

It is fairly straight forward with a single, random intercept only, but becomes harder with random intercepts & slopes or multiple random effects. brms has "blocks" for each random effect set, so I loop through those and add them all up and relied on monte carlo integration rather than any analytic solution to keep the code as simple as possible regardless of the random effect structure.

@strengejacke
Copy link
Owner

I think this vignette should clarify the available options:
https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 investigators ❔❓ enhancement 💥 Implemented features can be improved or revised
Projects
None yet
Development

No branches or pull requests

3 participants