'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57

nahorp · 2020-04-30T21:50:05Z

Hello,

I came across bias adjustment for frequentist models involving response transformations and/or random effects while reading the detailed vignette in the emmeans package and wondered if the same applied to Bayesian models as well. Indeed, emmeans have a small section on Bayesian models.

It would be great if these could be ported over to the relevant functions (estimate_means and estimate_contrasts) here. Going a step further, it would be even better if the sigma argument could be calculated automatically for common model objects from popular Bayesian R packages (such as rstanarm, brms, BayesFactor)

Thank you

The text was updated successfully, but these errors were encountered:

DominiqueMakowski · 2020-05-21T02:41:55Z

Sorry for the delay @nahorp in replying to this.

Let me look into that!

DominiqueMakowski · 2020-05-21T08:59:06Z

If I understand correctly, this 'bias adjustment' argument only affect GLM(M), i.e., non-linear (mixed) models?

LMER: same

      library(lme4)
      library(emmeans)
      
      data <- iris
      data$Petal.Length_factor <- as.factor(ifelse(data$Petal.Length < 4.2, "A", "B"))

      model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data)
      emmeans::emmeans(model, "Species", bias.adj = FALSE)
#>  Species    emmean    SE   df lower.CL upper.CL
#>  setosa       3.60 0.191 1.11    1.676     5.52
#>  versicolor   2.73 0.184 1.00    0.387     5.07
#>  virginica    2.80 0.191 1.11    0.878     4.73
#> 
#> Degrees-of-freedom method: kenward-roger 
#> Confidence level used: 0.95
      emmeans::emmeans(model, "Species", bias.adj = TRUE)
#>  Species    emmean    SE   df lower.CL upper.CL
#>  setosa       3.60 0.191 1.11    1.676     5.52
#>  versicolor   2.73 0.184 1.00    0.387     5.07
#>  virginica    2.80 0.191 1.11    0.878     4.73
#> 
#> Degrees-of-freedom method: kenward-roger 
#> Confidence level used: 0.95

^{Created on 2020-05-21 by the reprex package (v0.3.0)}

However, it changes for general models when the predictions are made on the response scale.

GLM

    library(emmeans)

    data <- iris
    data$Petal.Length_factor <- as.factor(ifelse(data$Petal.Length < 4.2, "A", "B"))
    
    model <- glm(Petal.Length_factor ~ Species, data = data, family="binomial")
    
    emmeans::emmeans(model, "Species", type = "response", bias.adj = FALSE)
#>  Species    prob         SE  df asymp.LCL asymp.UCL
#>  setosa     0.00 0.00000293 Inf  0.000000  1.000000
#>  versicolor 0.62 0.06864401 Inf  0.479635  0.742805
#>  virginica  1.00 0.00000293 Inf  0.000000  1.000000
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale
    emmeans::emmeans(model, "Species", type = "response", bias.adj = TRUE)
#>  Species        prob        SE  df asymp.LCL asymp.UCL
#>  setosa     0.000000 0.0000036 Inf  0.000000   1.00000
#>  versicolor 0.607228 0.0622312 Inf  0.481932   0.72185
#>  virginica  1.000000 0.0000036 Inf  0.000000   1.00000
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale 
#> Bias adjustment applied based on sigma = 0.67212

^{Created on 2020-05-21 by the reprex package (v0.3.0)}

As modelbased by default returns the means/contrasts on the response scale, my naive question is, should we bias-adjust by default? 🤔 Pinging also @mattansb and @strengejacke

mattansb · 2020-05-21T09:01:20Z

No idea... I've never used this option, and not sure what it really means...

DominiqueMakowski · 2020-05-21T09:04:55Z

I must say the explanation is quite obscure (truth I was hoping for your light 😁)

strengejacke · 2020-05-21T10:34:15Z

I'll look into it the next days, no time today.

nahorp · 2020-05-21T21:06:20Z

If I understand correctly, this 'bias adjustment' argument only affect GLM(M), i.e., non-linear (mixed) models?

That's one aspect of it @DominiqueMakowski. So, the inference I made from reading those links is,

Anything with 'random' effects (i.e., mixed models) are affected (i.e., LMMs and GLMMs)
Anything where the DV is transformed (i.e., response transformation) outside of the use of the link function is affected.
Thus, LMs and GLMs without response transformations are the only models not affected.

strengejacke · 2020-05-22T06:40:06Z

In an ordinary GLM, no bias adjustment is needed, nor is it appropriate, because the link function is just used to define a nonlinear relationship between the actual response mean η and the linear predictor.

Note that, in a generalized linear mixed model, including generalized estimating equations and such, there are additive random components involved, and then bias adjustment becomes appropriate

strengejacke · 2020-05-22T07:23:45Z

I think, this is the code to modify the link-inverse function to apply bias-adjustment:

function (link, sigma) 
{
  if (is.null(sigma)) 
    stop("Must specify 'sigma' to obtain bias-adjusted back transformations", 
      call. = FALSE)
  link$inv = link$linkinv
  link$der = link$mu.eta
  link$sigma22 = sigma^2/2
  link$der2 = function(eta) with(link, 1000 * (der(eta + 5e-04) - 
    der(eta - 5e-04)))
  link$linkinv = function(eta) with(link, inv(eta) + sigma22 * 
    der2(eta))
  link$mu.eta = function(eta) with(link, der(eta) + 1000 * 
    sigma22 * (der2(eta + 5e-04) - der2(eta - 5e-04)))
  link
}

strengejacke · 2020-05-22T07:46:30Z

Taking the mixed models example, I'm not sure whether it is more appropriate to rely on insight::get_variance()?

library(emmeans)
#> Warning: package 'emmeans' was built under R version 3.6.3
require(lme4)
#> Loading required package: lme4
#> Warning: package 'lme4' was built under R version 3.6.3
#> Loading required package: Matrix
cbpp <- transform(cbpp, unit = 1:nrow(cbpp))
cbpp.glmer <- glmer(cbind(incidence, size - incidence) ~ period + 
                      (1 | herd) +  (1 | unit),
                    family = binomial, data = cbpp)

emm <- emmeans(cbpp.glmer, "period")
summary(emm, type = "response")
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.1824 0.0442 Inf    0.1109    0.2852
#>  2      0.0614 0.0230 Inf    0.0290    0.1252
#>  3      0.0558 0.0220 Inf    0.0254    0.1182
#>  4      0.0334 0.0172 Inf    0.0120    0.0894
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale

total_sd <- sqrt(sqrt(0.89107^2 + 0.18396^2))
total_sd2 <- sqrt(insight::get_variance_residual(cbpp.glmer))

summary(emm, type = "response", bias.adjust = TRUE, sigma = total_sd)
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.2255 0.0464 Inf    0.1458     0.325
#>  2      0.0844 0.0299 Inf    0.0411     0.163
#>  3      0.0771 0.0289 Inf    0.0361     0.154
#>  4      0.0470 0.0235 Inf    0.0172     0.120
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale 
#> Bias adjustment applied based on sigma = 0.95387
summary(emm, type = "response", bias.adjust = TRUE, sigma = total_sd2)
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.3758 0.0538 Inf    0.2675     0.464
#>  2      0.1647 0.0538 Inf    0.0833     0.293
#>  3      0.1513 0.0528 Inf    0.0733     0.281
#>  4      0.0948 0.0456 Inf    0.0356     0.226
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale 
#> Bias adjustment applied based on sigma = 2.0209

^{Created on 2020-05-22 by the reprex package (v0.3.0)}

DominiqueMakowski · 2020-05-22T08:04:09Z

Just to clarify, so this bias correction is only useful for non-linear mixed models right?

And also, I'm still having a hard time understanding what does it do intuitively

strengejacke · 2020-05-22T09:14:30Z

Just to clarify, so this bias correction is only useful for non-linear mixed models right?

No, if I understood right, you need it for mixed models in general, and for linear models with transformed response variable. It is about the residual variance. Predictions are affected when the response is transformed, because residual variance then is different from the same models without transformation (say, log) of the response. This is quite clear. However, it sems that simple back-transformation of the predicted values (say, exp) is not exact, because the residual error in the predictions of the log-transformed response cannot be easily "re-calculated" when back-transforming the predicted values.

In short: computation of estimated marginal means takes residual variance into account, but you cannot just "exp" the error term when back-transforming log-response, resulting in "biased" back-transformed estimated marginal means. This requires "bias adjustment".

For mixed models, we have residual variance on different levels (because multilevel and so, you know...), and that's why we need some bias correction here as well if we want to take the random effects uncertainty into account for predictions. You can see the differences in this vignette when you look at the results from type = "fe" and type = "re". In ggeffects, only the uncertainty intervals are adjusted, because I did not know how to adjust the estimates (predicted values). Now reading this emmeans vignette shows how to adjust estimates as well, though it looks more like a "hack" than a "trustworthy" approach (due to the multiplication with 1000)...

DominiqueMakowski · 2020-05-22T09:28:51Z

all in all, it sounds like a good thing, so do you think we should set bias.adj to TRUE by default? or should the fact that it's not the default in emmeans be a hint and rise caution?

strengejacke · 2020-05-22T09:33:45Z

sigma() is not well defined for "most" mixed models, e.g. models with more complex random effects structures (like in the emmeans-vignette-example), thus I would suggest using insight::get_variance() to get the residual variance. This, however, may take some time for larger models.

Therefor, I would set bias adjustment to FALSE by default, and add a message like 'consider bias adjustment' whenever we have a transformed response or a mixed model.

DominiqueMakowski · 2020-05-22T09:59:27Z

okay, and should we (re)suggest to Russel to consider using get_variance()?

strengejacke · 2020-05-22T10:38:08Z

No, I already did (see https://cran.r-project.org/web/packages/emmeans/vignettes/predictions.html#sd-estimate). He was open to the suggestion, but I think he is hesitant because he's not certain what get_variance() internally does... I share this concern, because even I myself am not certain 😬 The initial code was written by Ben Bolker, and I asked him if I could use this in my package, and I only added functionality for more packages. The main computation of variances (except for zero-inflated negative binomial and zero-inflated poisson models) was all written by Ben.

mattansb · 2020-05-22T11:26:49Z

@strengejacke your explanation is excellent! Thanks!

strengejacke · 2020-05-23T07:31:43Z

Some references, not sure if these are useful
https://www.jstor.org/stable/2669622?seq=1
https://www.semanticscholar.org/paper/Correction-of-Retransformation-Bias-in-Nonlinear-on-Liu-Mc/69a23b65647adfa635e18d4fafd2145ba51f2f5b

DominiqueMakowski · 2021-05-25T03:04:22Z

Any changes in the status of this?

strengejacke · 2021-05-25T06:40:48Z

no, not yet.

JWiley · 2021-12-16T06:58:04Z

If it is any help, I'm not sure 'bias' is the optimal description here. In general:
mean(Y) != ginv(mean(g(y)))
as a specific example:
mean(Y) != exp(mean(log(Y)))
this is not really 'bias' per se --- exp(mean(log(Y))) is a meaningful and accurate quantity, it is just not the arithmetic mean of Y.

Anytime there are covariates and even without covariates in mixed models because of the random effects, predictions are always conditional, unless you go out of your way to get population average estimates marginalising over the conditioning factors, see for example:
https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-015-0046-6

strengejacke · 2022-02-06T10:54:09Z

See also the example of Ben between gee and glmer (i.e. between population averaged and subject specific) here:
strengejacke/ggeffects#215

sven-stodtmann · 2022-02-11T12:07:36Z

The problem of averaging on the link scale being the default in most packages has led me to implement my own routines for predicting representative marginal responses from (especially logistic regression) models.
My typical example/problem with the default behavior of averaging on the link scale is best described with a pratctical example

Assume we have a dataset with two groups of the same size, group A has 50-50 chance to experience an outcome, group B has a near 0 chance to experience that. A back of the envelope computation for the results in the group gives~25% of that outcome.
However if we predict the marginal mean response for that group whihle averaging on link scale we may get much much lower values, because on the link scale 0 maps to -Infinity and 50% maps to 0. so the mean will be -Infinity ("arbitrarily small"). Applying the link to that will give me a perdiction of 0% outcome in the combined population, which is obviously false.

Here is an example (overlaid with proportions of the binned observed data, see #120 )

test <- tibble(
    cat  = factor(sample(c('A','B'),400,replace=TRUE)),
    con  = exp(rnorm(400,0,1)),
  )%>%
  mutate(
    lp = 0.4*con-if_else(cat=='A',10,0),
    pr = pmax(0.02,plogis(lp))
  )%>%
  mutate(
    resp=map_int(pr,~rbernoulli(1,p=.x))
  )
mod <- glm(resp~con+cat,data=test,family='binomial')
plot_logreg(mod,terms=c('con [n=40]'),transform_then_average=TRUE)+
  geom_line(data=tibble(ggeffects::ggemmeans(mod,terms='con [n=40]')),aes(col='ggeffects',x=x,y=predicted,ymin=conf.low,ymax=conf.high))

One solution is of course to separate the predictions, but sometimes we may want to see the effect of another predictor on the total population marginalzed over the two groups.

I am wondering if "bias adjustment" adresses that. I think the "regrid" function in emmeans was supposed to adress this, but I found it difficult to make that work consistently.

vincentarelbundock · 2022-06-22T13:17:30Z

I don't have much to add to the discussion but was led here from the other thread, and wanted to react to:

all in all, it sounds like a good thing, so do you think we should set bias.adj to TRUE by default? or should the fact that it's not the default in emmeans be a hint and rise caution?

I would definitely not make something the default unless we are 100% sure it is correct and we understand it very well.

mattansb · 2022-06-25T08:22:41Z

I agree.

From Russ Lenth's docs:

In an ordinary GLM, no bias adjustment is needed, nor is it appropriate, because the link function is just used to define a nonlinear relationship between the actual response mean η and the linear predictor. That is, the back-transformed parameter is already the mean.
[...]
Note that, in a generalized linear mixed model, including generalized estimating equations and such, there are additive random components involved, and then bias adjustment becomes appropriate.

So it is only appropriate with transformed outcomes or with generalized mixed models (but not with linear mixed or with genialized fixed only).

strengejacke · 2022-06-25T08:32:48Z

For linear mixed is required as well, I'd say, because you want to adjust for RE variance as well. See my answer here: #57 (comment)

mattansb · 2022-06-25T08:54:24Z

Accounting for the random effect's uncertainty in a linear model does not affect the prediction (only it's uncertainty). And so that is not a bias adjustment.

library(lme4)
#> Loading required package: Matrix
library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.1.3

m_mixed <- lmer(Reaction ~ Days + (1|Subject),
                 data = sleepstudy)

emmeans(m_mixed, ~ Days, cov.red = range)
#>  Days emmean   SE   df lower.CL upper.CL
#>     0    251 9.75 22.8      231      272
#>     9    346 9.75 22.8      325      366
#> 
#> Degrees-of-freedom method: kenward-roger 
#> Confidence level used: 0.95

# Same predictions (also CIs...)
Xrtra_disp <- sqrt(insight::get_variance(m_mixed)[["var.intercept"]])
emmeans(m_mixed, ~ Days, cov.red = range, bias.adjust = TRUE, sigma = Xrtra_disp)
#>  Days emmean   SE   df lower.CL upper.CL
#>     0    251 9.75 22.8      231      272
#>     9    346 9.75 22.8      325      366
#> 
#> Degrees-of-freedom method: kenward-roger 
#> Confidence level used: 0.95

^{Created on 2022-06-25 by the reprex package (v2.0.1)}

In a generalized linear mixed model, you do need to bias adjust in order to account for the bias from the fact that the mean random effect is not the population random effect (inv.link(mean(eta)) != E[mu]):

library(lme4)
#> Loading required package: Matrix
library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.1.3

m_fixed <- glm(cbind(incidence, size - incidence) ~ period,
               family = binomial, data = cbpp)

cbpp <- transform(cbpp, unit = 1:nrow(cbpp))
m_mixed <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) + (1|unit),
                 family = binomial, data = cbpp)

emmeans(m_mixed, ~ period, regrid = "response")
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.1824 0.0442 Inf  0.095664    0.2691
#>  2      0.0614 0.0230 Inf  0.016308    0.1065
#>  3      0.0558 0.0220 Inf  0.012628    0.0989
#>  4      0.0334 0.0172 Inf -0.000374    0.0671
#> 
#> Confidence level used: 0.95

Xrtra_disp <- sqrt(sum(insight::get_variance(m_mixed)[["var.intercept"]]))
emmeans(m_mixed, ~ period, regrid = "response", bias.adjust = TRUE, sigma = Xrtra_disp)
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.2216 0.0462 Inf  0.131094    0.3121
#>  2      0.0823 0.0292 Inf  0.025024    0.1397
#>  3      0.0751 0.0282 Inf  0.019779    0.1305
#>  4      0.0458 0.0230 Inf  0.000822    0.0908
#> 
#> Confidence level used: 0.95

# These (true unbiased predictions) are closer to the bias adjusted predictions
emmeans(m_fixed, ~ period, regrid = "response")
#>  period   prob     SE  df asymp.LCL asymp.UCL
#>  1      0.2194 0.0248 Inf    0.1708    0.2681
#>  2      0.0802 0.0187 Inf    0.0436    0.1167
#>  3      0.0711 0.0183 Inf    0.0352    0.1069
#>  4      0.0452 0.0167 Inf    0.0125    0.0779
#> 
#> Confidence level used: 0.95

^{Created on 2022-06-25 by the reprex package (v2.0.1)}

rvlenth · 2022-08-04T21:50:42Z

Maybe the answer I contributed to this question in the CrossValidated forum will address some of these concerns -- or maybe not...

strengejacke mentioned this issue May 22, 2020

type = "re" could also adjust predicted values strengejacke/ggeffects#137

Closed

strengejacke mentioned this issue Jul 3, 2020

Deal seriously with the random effects for freq models #74

Closed

jacob-long mentioned this issue Jul 5, 2020

Look into bias-adjusted means for mixed/non-linear models jacob-long/jtools#86

Open

DominiqueMakowski added the help wanted label May 25, 2021

strengejacke mentioned this issue Feb 6, 2022

population averaged vs. subject specific predictions for merMod strengejacke/ggeffects#215

Closed

strengejacke mentioned this issue Jun 22, 2022

Bias adjustment vincentarelbundock/marginaleffects#388

Closed

strengejacke mentioned this issue May 12, 2023

get_predicted confidence intervals in mixed-effects models easystats/insight#677

Open

strengejacke added Help us 👀 Help is needed to implement something and removed help wanted labels Jan 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57

nahorp commented Apr 30, 2020

DominiqueMakowski commented May 21, 2020

DominiqueMakowski commented May 21, 2020

mattansb commented May 21, 2020

DominiqueMakowski commented May 21, 2020

strengejacke commented May 21, 2020

nahorp commented May 21, 2020

strengejacke commented May 22, 2020

strengejacke commented May 22, 2020

strengejacke commented May 22, 2020

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020 •

edited

Loading

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020

mattansb commented May 22, 2020

strengejacke commented May 23, 2020

DominiqueMakowski commented May 25, 2021

strengejacke commented May 25, 2021

JWiley commented Dec 16, 2021

strengejacke commented Feb 6, 2022

sven-stodtmann commented Feb 11, 2022

vincentarelbundock commented Jun 22, 2022

mattansb commented Jun 25, 2022

strengejacke commented Jun 25, 2022

mattansb commented Jun 25, 2022

rvlenth commented Aug 4, 2022

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57

Comments

nahorp commented Apr 30, 2020

DominiqueMakowski commented May 21, 2020

DominiqueMakowski commented May 21, 2020

LMER: same

GLM

mattansb commented May 21, 2020

DominiqueMakowski commented May 21, 2020

strengejacke commented May 21, 2020

nahorp commented May 21, 2020

strengejacke commented May 22, 2020

strengejacke commented May 22, 2020

strengejacke commented May 22, 2020

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020 • edited Loading

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020

DominiqueMakowski commented May 22, 2020

strengejacke commented May 22, 2020

mattansb commented May 22, 2020

strengejacke commented May 23, 2020

DominiqueMakowski commented May 25, 2021

strengejacke commented May 25, 2021

JWiley commented Dec 16, 2021

strengejacke commented Feb 6, 2022

sven-stodtmann commented Feb 11, 2022

vincentarelbundock commented Jun 22, 2022

mattansb commented Jun 25, 2022

strengejacke commented Jun 25, 2022

mattansb commented Jun 25, 2022

rvlenth commented Aug 4, 2022

strengejacke commented May 22, 2020 •

edited

Loading