-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects #57
Comments
Sorry for the delay @nahorp in replying to this. Let me look into that! |
If I understand correctly, this 'bias adjustment' argument only affect GLM(M), i.e., non-linear (mixed) models? LMER: same library(lme4)
library(emmeans)
data <- iris
data$Petal.Length_factor <- as.factor(ifelse(data$Petal.Length < 4.2, "A", "B"))
model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data)
emmeans::emmeans(model, "Species", bias.adj = FALSE)
#> Species emmean SE df lower.CL upper.CL
#> setosa 3.60 0.191 1.11 1.676 5.52
#> versicolor 2.73 0.184 1.00 0.387 5.07
#> virginica 2.80 0.191 1.11 0.878 4.73
#>
#> Degrees-of-freedom method: kenward-roger
#> Confidence level used: 0.95
emmeans::emmeans(model, "Species", bias.adj = TRUE)
#> Species emmean SE df lower.CL upper.CL
#> setosa 3.60 0.191 1.11 1.676 5.52
#> versicolor 2.73 0.184 1.00 0.387 5.07
#> virginica 2.80 0.191 1.11 0.878 4.73
#>
#> Degrees-of-freedom method: kenward-roger
#> Confidence level used: 0.95 Created on 2020-05-21 by the reprex package (v0.3.0) However, it changes for general models when the predictions are made on the response scale. GLM library(emmeans)
data <- iris
data$Petal.Length_factor <- as.factor(ifelse(data$Petal.Length < 4.2, "A", "B"))
model <- glm(Petal.Length_factor ~ Species, data = data, family="binomial")
emmeans::emmeans(model, "Species", type = "response", bias.adj = FALSE)
#> Species prob SE df asymp.LCL asymp.UCL
#> setosa 0.00 0.00000293 Inf 0.000000 1.000000
#> versicolor 0.62 0.06864401 Inf 0.479635 0.742805
#> virginica 1.00 0.00000293 Inf 0.000000 1.000000
#>
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
emmeans::emmeans(model, "Species", type = "response", bias.adj = TRUE)
#> Species prob SE df asymp.LCL asymp.UCL
#> setosa 0.000000 0.0000036 Inf 0.000000 1.00000
#> versicolor 0.607228 0.0622312 Inf 0.481932 0.72185
#> virginica 1.000000 0.0000036 Inf 0.000000 1.00000
#>
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
#> Bias adjustment applied based on sigma = 0.67212 Created on 2020-05-21 by the reprex package (v0.3.0) As |
No idea... I've never used this option, and not sure what it really means... |
I must say the explanation is quite obscure (truth I was hoping for your light 😁) |
I'll look into it the next days, no time today. |
That's one aspect of it @DominiqueMakowski. So, the inference I made from reading those links is,
|
|
I think, this is the code to modify the link-inverse function to apply bias-adjustment: function (link, sigma)
{
if (is.null(sigma))
stop("Must specify 'sigma' to obtain bias-adjusted back transformations",
call. = FALSE)
link$inv = link$linkinv
link$der = link$mu.eta
link$sigma22 = sigma^2/2
link$der2 = function(eta) with(link, 1000 * (der(eta + 5e-04) -
der(eta - 5e-04)))
link$linkinv = function(eta) with(link, inv(eta) + sigma22 *
der2(eta))
link$mu.eta = function(eta) with(link, der(eta) + 1000 *
sigma22 * (der2(eta + 5e-04) - der2(eta - 5e-04)))
link
} |
Taking the mixed models example, I'm not sure whether it is more appropriate to rely on library(emmeans)
#> Warning: package 'emmeans' was built under R version 3.6.3
require(lme4)
#> Loading required package: lme4
#> Warning: package 'lme4' was built under R version 3.6.3
#> Loading required package: Matrix
cbpp <- transform(cbpp, unit = 1:nrow(cbpp))
cbpp.glmer <- glmer(cbind(incidence, size - incidence) ~ period +
(1 | herd) + (1 | unit),
family = binomial, data = cbpp)
emm <- emmeans(cbpp.glmer, "period")
summary(emm, type = "response")
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.1824 0.0442 Inf 0.1109 0.2852
#> 2 0.0614 0.0230 Inf 0.0290 0.1252
#> 3 0.0558 0.0220 Inf 0.0254 0.1182
#> 4 0.0334 0.0172 Inf 0.0120 0.0894
#>
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
total_sd <- sqrt(sqrt(0.89107^2 + 0.18396^2))
total_sd2 <- sqrt(insight::get_variance_residual(cbpp.glmer))
summary(emm, type = "response", bias.adjust = TRUE, sigma = total_sd)
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.2255 0.0464 Inf 0.1458 0.325
#> 2 0.0844 0.0299 Inf 0.0411 0.163
#> 3 0.0771 0.0289 Inf 0.0361 0.154
#> 4 0.0470 0.0235 Inf 0.0172 0.120
#>
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
#> Bias adjustment applied based on sigma = 0.95387
summary(emm, type = "response", bias.adjust = TRUE, sigma = total_sd2)
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.3758 0.0538 Inf 0.2675 0.464
#> 2 0.1647 0.0538 Inf 0.0833 0.293
#> 3 0.1513 0.0528 Inf 0.0733 0.281
#> 4 0.0948 0.0456 Inf 0.0356 0.226
#>
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
#> Bias adjustment applied based on sigma = 2.0209 Created on 2020-05-22 by the reprex package (v0.3.0) |
Just to clarify, so this bias correction is only useful for non-linear mixed models right? And also, I'm still having a hard time understanding what does it do intuitively |
No, if I understood right, you need it for mixed models in general, and for linear models with transformed response variable. It is about the residual variance. Predictions are affected when the response is transformed, because residual variance then is different from the same models without transformation (say, log) of the response. This is quite clear. However, it sems that simple back-transformation of the predicted values (say, exp) is not exact, because the residual error in the predictions of the log-transformed response cannot be easily "re-calculated" when back-transforming the predicted values. In short: computation of estimated marginal means takes residual variance into account, but you cannot just "exp" the error term when back-transforming log-response, resulting in "biased" back-transformed estimated marginal means. This requires "bias adjustment". For mixed models, we have residual variance on different levels (because multilevel and so, you know...), and that's why we need some bias correction here as well if we want to take the random effects uncertainty into account for predictions. You can see the differences in this vignette when you look at the results from |
all in all, it sounds like a good thing, so do you think we should set |
Therefor, I would set bias adjustment to FALSE by default, and add a message like 'consider bias adjustment' whenever we have a transformed response or a mixed model. |
okay, and should we (re)suggest to Russel to consider using get_variance()? |
No, I already did (see https://cran.r-project.org/web/packages/emmeans/vignettes/predictions.html#sd-estimate). He was open to the suggestion, but I think he is hesitant because he's not certain what |
@strengejacke your explanation is excellent! Thanks! |
Some references, not sure if these are useful |
Any changes in the status of this? |
no, not yet. |
If it is any help, I'm not sure 'bias' is the optimal description here. In general: Anytime there are covariates and even without covariates in mixed models because of the random effects, predictions are always conditional, unless you go out of your way to get population average estimates marginalising over the conditioning factors, see for example: |
See also the example of Ben between gee and glmer (i.e. between population averaged and subject specific) here: |
The problem of averaging on the link scale being the default in most packages has led me to implement my own routines for predicting representative marginal responses from (especially logistic regression) models. Assume we have a dataset with two groups of the same size, group A has 50-50 chance to experience an outcome, group B has a near 0 chance to experience that. A back of the envelope computation for the results in the group gives~25% of that outcome. Here is an example (overlaid with proportions of the binned observed data, see #120 )
One solution is of course to separate the predictions, but sometimes we may want to see the effect of another predictor on the total population marginalzed over the two groups. I am wondering if "bias adjustment" adresses that. I think the "regrid" function in emmeans was supposed to adress this, but I found it difficult to make that work consistently. |
I don't have much to add to the discussion but was led here from the other thread, and wanted to react to:
I would definitely not make something the default unless we are 100% sure it is correct and we understand it very well. |
I agree. From Russ Lenth's docs:
So it is only appropriate with transformed outcomes or with generalized mixed models (but not with linear mixed or with genialized fixed only). |
For linear mixed is required as well, I'd say, because you want to adjust for RE variance as well. See my answer here: #57 (comment) |
Accounting for the random effect's uncertainty in a linear model does not affect the prediction (only it's uncertainty). And so that is not a bias adjustment. library(lme4)
#> Loading required package: Matrix
library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.1.3
m_mixed <- lmer(Reaction ~ Days + (1|Subject),
data = sleepstudy)
emmeans(m_mixed, ~ Days, cov.red = range)
#> Days emmean SE df lower.CL upper.CL
#> 0 251 9.75 22.8 231 272
#> 9 346 9.75 22.8 325 366
#>
#> Degrees-of-freedom method: kenward-roger
#> Confidence level used: 0.95
# Same predictions (also CIs...)
Xrtra_disp <- sqrt(insight::get_variance(m_mixed)[["var.intercept"]])
emmeans(m_mixed, ~ Days, cov.red = range, bias.adjust = TRUE, sigma = Xrtra_disp)
#> Days emmean SE df lower.CL upper.CL
#> 0 251 9.75 22.8 231 272
#> 9 346 9.75 22.8 325 366
#>
#> Degrees-of-freedom method: kenward-roger
#> Confidence level used: 0.95 Created on 2022-06-25 by the reprex package (v2.0.1) In a generalized linear mixed model, you do need to bias adjust in order to account for the bias from the fact that the mean random effect is not the population random effect (inv.link(mean(eta)) != E[mu]): library(lme4)
#> Loading required package: Matrix
library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.1.3
m_fixed <- glm(cbind(incidence, size - incidence) ~ period,
family = binomial, data = cbpp)
cbpp <- transform(cbpp, unit = 1:nrow(cbpp))
m_mixed <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) + (1|unit),
family = binomial, data = cbpp)
emmeans(m_mixed, ~ period, regrid = "response")
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.1824 0.0442 Inf 0.095664 0.2691
#> 2 0.0614 0.0230 Inf 0.016308 0.1065
#> 3 0.0558 0.0220 Inf 0.012628 0.0989
#> 4 0.0334 0.0172 Inf -0.000374 0.0671
#>
#> Confidence level used: 0.95
Xrtra_disp <- sqrt(sum(insight::get_variance(m_mixed)[["var.intercept"]]))
emmeans(m_mixed, ~ period, regrid = "response", bias.adjust = TRUE, sigma = Xrtra_disp)
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.2216 0.0462 Inf 0.131094 0.3121
#> 2 0.0823 0.0292 Inf 0.025024 0.1397
#> 3 0.0751 0.0282 Inf 0.019779 0.1305
#> 4 0.0458 0.0230 Inf 0.000822 0.0908
#>
#> Confidence level used: 0.95
# These (true unbiased predictions) are closer to the bias adjusted predictions
emmeans(m_fixed, ~ period, regrid = "response")
#> period prob SE df asymp.LCL asymp.UCL
#> 1 0.2194 0.0248 Inf 0.1708 0.2681
#> 2 0.0802 0.0187 Inf 0.0436 0.1167
#> 3 0.0711 0.0183 Inf 0.0352 0.1069
#> 4 0.0452 0.0167 Inf 0.0125 0.0779
#>
#> Confidence level used: 0.95 Created on 2022-06-25 by the reprex package (v2.0.1) |
Maybe the answer I contributed to this question in the CrossValidated forum will address some of these concerns -- or maybe not... |
Hello,
I came across bias adjustment for frequentist models involving response transformations and/or random effects while reading the detailed vignette in the emmeans package and wondered if the same applied to Bayesian models as well. Indeed, emmeans have a small section on Bayesian models.
It would be great if these could be ported over to the relevant functions (
estimate_means
andestimate_contrasts
) here. Going a step further, it would be even better if thesigma
argument could be calculated automatically for common model objects from popular Bayesian R packages (such asrstanarm
,brms
,BayesFactor
)Thank you
The text was updated successfully, but these errors were encountered: