population averaged vs. subject specific predictions for merMod #215

benpelz · 2021-05-17T09:01:20Z

Hi Daniel,

Thanks for your nice work first! I used to manually generate plots with logistic-regression-predicted p-values a long time and only recently discovered your package, Great for my students, and much less programming in R to explain for me.

I have a question about ggpredict for a random intercept logistic regression model (I posted the question an Crossvalidated a week ago or so, but no response). For such model, ggpredict offers two possibilities, type="fixed" and type="random". About the default type="fixed", the manual says:

Predicted values are conditioned on the fixed effects or conditional model only (for mixed models: predicted values are on the population-level and confidence intervals are returned).

My question is about the last part of the above sentence, between the brackets: "on the population-level". This term is often used in the framework of "population averaged predictions", which are typically produced by gee-models. I thought this was meant with your description. Using gee (with an exchangeable correlation structure) to estimate my logistic model, leads to different predicted probabilities than those obtained from ggpredict. Furthermore, the predicted probabilities obtained with ggpredict are highly similar (up to 4 or 5 decimals) to those calculated "by hand" (based on results of glmer in R), for a respondent having an average intercept. So the predicted probabilities from ggpredict seem to be subject-specific rather than "on the population level". Could you explain what ggpredict actually predicts, in case of mixed-models?

Thanks for any help!

Ben Pelzer.

strengejacke · 2021-05-26T10:03:23Z

Do you have a reproducible example? In the following example, it works as intended.

"on the population-level"

This terminology is from the glmmTMB-documentation. This may be related to #137.

library(ggeffects)
library(lme4)
#> Loading required package: Matrix

set.seed(123)

dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE)
)

dat$var_cont <- sjmisc::std(dat$var_cont)

m1 <- glmer(
  outcome ~ var_binom + var_cont + (1 | group), 
  data = dat, 
  family = binomial(link = "logit")
)

ggpredict(m1, "var_cont") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.
#> Loading required namespace: ggplot2

ggpredict(m1, "var_cont", type = "random") |> plot()
#> Data were 'prettified'. Consider using `terms="var_cont [all]"` to get smooth plots.

^{Created on 2021-05-26 by the reprex package (v2.0.0)}

benpelz · 2021-05-31T11:20:00Z

Hi Daniel,

Thanks for your reply! I immediately looked at your code instead of reading your remark about glmmTMB, which I only saw after making the reproducible example below. Still I think this way of saying it is a bit confusing in case of logistic and other nonlinear models, where averaging of the random effect leads to another prediction than setting the random effect to zero. Since I now made the code to show this, I may as well add it here. Thank again for your explanation,

Ben.

Here is an example with so called "population averaged predictions". This wording is typically used for gee models, but also for mixed models in case the predictions are made by averaging over all possible (normally distributed) random-intercept values.

I adapted your code and added comments. At the end the population averaged predictions shown alongside those of the mixed glmer model. What was confusing for me was the "on the population level" remark in the ggpredict manual, and I now, after reading your explanation about glmmTMB see what it means. However, in the gee wording, what ggpredict actually predicts are the "subject specific" predictions, i.e. for the average individual, which has value zero for the random intercept.

library(ggeffects)
library(lme4)
#> Loading required package: Matrix

set.seed(123)

dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE)
)


dat$var_cont <- sjmisc::std(dat$var_cont)

m1 <- glmer(
  outcome ~ var_binom + var_cont + (1 | group), 
  data = dat, 
  family = binomial(link = "logit")
)

# Notice that the predictions of the fixed and random model are equal. 
predfixed <- ggpredict(m1, "var_cont") 
predrandom <- ggpredict(m1, "var_cont", type = "random") 
cbind(predfixed$predicted, predrandom$predicted)

> cbind(predfixed$predicted, predrandom$predicted)
           [,1]      [,2]
 [1,] 0.3367196 0.3367196
 [2,] 0.3430753 0.3430753
 [3,] 0.3494878 0.3494878
 [4,] 0.3559552 0.3559552
 [5,] 0.3624756 0.3624756
 [6,] 0.3690470 0.3690470
 [7,] 0.3756673 0.3756673
 [8,] 0.3823344 0.3823344
 [9,] 0.3890461 0.3890461
[10,] 0.3958002 0.3958002
[11,] 0.4025942 0.4025942
[12,] 0.4094258 0.4094258
[13,] 0.4162926 0.4162926


# Verify that predictions by ggpredict are based on fixed effects only.
xxx <- cbind(1,0,seq(-2.5, 3.5, by=0.5))
plogis(xxx %*% fixef(m1))

> plogis(xxx %*% fixef(m1))
           [,1]
 [1,] 0.3367196
 [2,] 0.3430753
 [3,] 0.3494878
 [4,] 0.3559552
 [5,] 0.3624756
 [6,] 0.3690470
 [7,] 0.3756673
 [8,] 0.3823344
 [9,] 0.3890461
[10,] 0.3958002
[11,] 0.4025942
[12,] 0.4094258
[13,] 0.4162926


# Initialize package for running gee. 
library(geepack)

# Sort data and convert the grouping variable into a factor, before running gee. 
dat <- dat[order(dat$group),]
dat$groupf <- as.factor(dat$group)

# Estimate the gee model.
geemodel <- geeglm(outcome ~ var_binom + var_cont,
                   family=binomial(link=logit), data=dat, id=groupf, 
                   corstr="exchangeable")

# Using ggpredict to obtain population averaged predictions from gee model m2. 
predgee <- ggpredict(geemodel, "var_cont")

# Verify how ggpredict works for the gee model.
xxx <- cbind(1,0,seq(-2.5, 3.5, by=0.5))
predxxx <- plogis(xxx %*% coef(geemodel))
cbind(predgee$predicted, predxxx)

> cbind(predgee$predicted, predxxx)
           [,1]      [,2]
 [1,] 0.3411360 0.3411360
 [2,] 0.3473059 0.3473059
 [3,] 0.3535275 0.3535275
 [4,] 0.3597991 0.3597991
 [5,] 0.3661190 0.3661190
 [6,] 0.3724852 0.3724852
 [7,] 0.3788961 0.3788961
 [8,] 0.3853494 0.3853494
 [9,] 0.3918434 0.3918434
[10,] 0.3983759 0.3983759
[11,] 0.4049447 0.4049447
[12,] 0.4115478 0.4115478
[13,] 0.4181829 0.4181829

# Show glmer and gee predictions alongside.
# Notice that the gee predictions are closer to 0.5 as should be.
cbind(predgee$predicted, predrandom$predicted)
 
> cbind(predgee$predicted, predrandom$predicted)
           [,1]      [,2]
 [1,] 0.3411360 0.3367196
 [2,] 0.3473059 0.3430753
 [3,] 0.3535275 0.3494878
 [4,] 0.3597991 0.3559552
 [5,] 0.3661190 0.3624756
 [6,] 0.3724852 0.3690470
 [7,] 0.3788961 0.3756673
 [8,] 0.3853494 0.3823344
 [9,] 0.3918434 0.3890461
[10,] 0.3983759 0.3958002
[11,] 0.4049447 0.4025942
[12,] 0.4115478 0.4094258
[13,] 0.4181829 0.4162926

strengejacke · 2022-02-06T10:55:34Z

for reference: easystats/modelbased#57

JWiley · 2022-02-06T23:03:42Z

Not sure if this would be any help, but here is code I used to calculate population averaged values out of brms.

It is fairly straight forward with a single, random intercept only, but becomes harder with random intercepts & slopes or multiple random effects. brms has "blocks" for each random effect set, so I loop through those and add them all up and relied on monte carlo integration rather than any analytic solution to keep the code as simple as possible regardless of the random effect structure.

strengejacke · 2024-05-01T20:30:17Z

I think this vignette should clarify the available options:
https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html

strengejacke added the reprex 💾 An example (and data) to reproduce the issue is needed to label May 26, 2021

strengejacke mentioned this issue Feb 6, 2022

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects easystats/modelbased#57

Open

strengejacke changed the title ~~predictions for a mixed logistic model in glmer~~ population averaged vs. subject specific predictions for merMod Feb 6, 2022

strengejacke mentioned this issue Feb 6, 2022

type = "re" could also adjust predicted values #137

Closed

strengejacke added 3 investigators ❔❓ enhancement 💥 Implemented features can be improved or revised and removed reprex 💾 An example (and data) to reproduce the issue is needed to labels Feb 6, 2022

strengejacke closed this as completed May 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

population averaged vs. subject specific predictions for merMod #215

population averaged vs. subject specific predictions for merMod #215

benpelz commented May 17, 2021

strengejacke commented May 26, 2021

benpelz commented May 31, 2021 •

edited

Loading

strengejacke commented Feb 6, 2022

JWiley commented Feb 6, 2022

strengejacke commented May 1, 2024

population averaged vs. subject specific predictions for merMod #215

population averaged vs. subject specific predictions for merMod #215

Comments

benpelz commented May 17, 2021

strengejacke commented May 26, 2021

benpelz commented May 31, 2021 • edited Loading

strengejacke commented Feb 6, 2022

JWiley commented Feb 6, 2022

strengejacke commented May 1, 2024

benpelz commented May 31, 2021 •

edited

Loading