Complex comparisons #1329
-
Ooooh, this looks super cool! Life is crazy right now, but I look forward to checking this out carefully.
-
That's interesting. I'll take a close look when I have some time. But just to fix expectations, the design I currently have in mind is to expand the formula interface to cover most of the usual contrasts, and to steer other users toward custom functions. I think the interface for …
-
Updating this example - it works great! (still missing #1344) @vincentarelbundock I can probably cook up a better example that actually makes sense, which I would be happy to contribute as a "case study". Let me know what you think (:

This is essentially a show-and-tell of solutions to #319, #479 & …

**Complex contrasts**

Currently, `comparisons()` supports comparing two distinct values of focal factor predictors. However, it is often of interest to compare groups of levels, or to compute otherwise arbitrary contrasts that are based on multiple levels of a factor.

The solution is to do this via the `predictions()` function with clever uses of the `hypothesis=` and `newdata=` arguments.

Here are a few examples.

# pak::pak("vincentarelbundock/marginaleffects")
library(marginaleffects)
packageVersion("marginaleffects")
## [1] '0.24.0.15'
library(emmeans)
data("mtcars")
mtcars$carb <- factor(mtcars$carb)
mod <- glm(am ~ carb + mpg,
           family = binomial("logit"),
           data = mtcars)
plot_predictions(mod, condition = "carb") +
  ggplot2::geom_line(ggplot2::aes(carb, estimate), group = 1)

**Unit-level (conditional) estimates**

We need two things. The first is a data grid that has all the (relevant) levels of the focal predictor for each row. This is the default behavior when supplying the `variables=` argument, but let's do this manually with `datagrid()`:

grid <- datagrid(model = mod, carb = levels(mtcars$carb), grid_type = "counterfactual")
subset(grid, rowidcf %in% 1:2) # each rowid has all the levels of "carb"
## rowidcf am mpg carb
## 1 1 1 21 1
## 2 2 1 21 1
## 33 1 1 21 2
## 34 2 1 21 2
## 65 1 1 21 3
## 66 2 1 21 3
## 97 1 1 21 4
## 98 2 1 21 4
## 129 1 1 21 6
## 130 2 1 21 6
## 161 1 1 21 8
## 162 2 1 21 8

Next we need a function that can, for each `rowidcf` level, compute the contrasts of interest. In this case, we are interested in the linear and quadratic trends:

w <- contr.poly(6)[,1:2] # weights
colnames(w) <- c("linear", "quadratic")
rownames(w) <- levels(mtcars$carb)
w
## linear quadratic
## 1 -0.5976143 0.5455447
## 2 -0.3585686 -0.1091089
## 3 -0.1195229 -0.4364358
## 4 0.1195229 -0.4364358
## 6 0.3585686 -0.1091089
## 8 0.5976143 0.5455447
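# Illustrative check with made-up numbers (not model output): multiplying a vector of
# six predictions by `w` returns one value per column, i.e. the linear and the quadratic
# contrast. A perfectly linear sequence of predictions yields a quadratic contrast of
# (numerically) zero.
drop(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6) %*% w)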
custom_contrast <- function(x) {
  setNames(
    as.vector(x %*% w),
    nm = colnames(w)
  )
}

Bring it all together:

h1 <- predictions(mod, variables = list("carb" = levels),
                  type = "response",
                  hypothesis = ~ I(custom_contrast(x)) | rowidcf)
nrow(mtcars)
## [1] 32
nrow(h1) # two contrasts per obs.
## [1] 64
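# In `~ I(custom_contrast(x)) | rowidcf`, the part after `|` splits the predictions by
# counterfactual observation, and `x` is the vector of six predicted values within each
# group, so custom_contrast() returns a linear and a quadratic estimate for every one of
# the 32 original rows (hence the 64 rows).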
h1
##
## Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## linear 0.865 0.1486 5.82 <0.001 27.3 0.5736 1.156
## quadratic 0.185 0.1432 1.29 0.197 2.3 -0.0959 0.466
## linear 0.865 0.1486 5.82 <0.001 27.3 0.5736 1.156
## quadratic 0.185 0.1432 1.29 0.197 2.3 -0.0959 0.466
## linear 0.662 0.1908 3.47 <0.001 10.9 0.2876 1.035
## --- 54 rows omitted. See ?print.marginaleffects ---
## quadratic 0.198 0.1308 1.51 0.131 2.9 -0.0586 0.454
## linear 0.963 0.0120 80.03 <0.001 Inf 0.9390 0.986
## quadratic 0.395 0.0456 8.65 <0.001 57.5 0.3056 0.485
## linear 0.829 0.1605 5.16 <0.001 22.0 0.5144 1.144
## quadratic 0.191 0.1477 1.30 0.195 2.4 -0.0981 0.481
## Type: response

**Average (marginal) estimates**

We can't use `avg_predictions()` or `predictions(by=)` here, because the `hypothesis=` argument only gets applied after the averaging. But we can pass the results for further processing to `hypotheses()`:

h2 <- hypotheses(h1, hypothesis = ~ I(mean(x)) | hypothesis)
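# Grouping by `hypothesis` means mean(x) is taken separately over the 32 unit-level
# "linear" estimates and the 32 "quadratic" ones, giving average (marginal) contrasts.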
h2
##
## Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## linear 0.758 0.0594 12.76 <0.001 121.5 0.642 0.874
## quadratic 0.326 0.0728 4.48 <0.001 17.0 0.183 0.468

**Marginal effects at the mean**

See also this amazingness. This should closely match `{emmeans}`:

predictions(mod, variables = list("carb" = levels),
            type = "response",
            newdata = "mean",
            hypothesis = ~ I(custom_contrast(x)))
##
## Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## linear 0.924 0.119 7.74 <0.001 46.6 0.6901 1.158
## quadratic 0.188 0.135 1.40 0.161 2.6 -0.0752 0.452
##
## Type: response
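# With newdata = "mean", the predictions are made for a single synthetic observation with
# mpg held at its mean, so custom_contrast() is applied only once and no `| rowidcf`
# grouping is needed; the emmeans call below runs the analogous tests on its own scale.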
emmeans(mod, ~ carb, regrid = TRUE) |>
  contrast(method = "poly", max.deg = 2)
## contrast estimate SE df z.ratio p.value
## linear 7.73 0.998 Inf 7.745 <.0001
## quadratic 1.73 1.230 Inf 1.401 0.1612

Note that `{emmeans}` scales the estimate differently, but the z- and p-values are the same.

**More…**

*Conditional contrasts*

We can further expand these ideas to get conditional contrasts…

h3 <- predictions(mod, variables = list("carb" = levels, "mpg" = "sd"),
                  type = "response",
                  hypothesis = ~ I(custom_contrast(x)) | rowidcf + mpg)
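# `variables = list(..., "mpg" = "sd")` evaluates each counterfactual observation at two
# mpg values one standard deviation apart (centered on the observed mean), and
# `| rowidcf + mpg` computes the linear and quadratic contrasts separately for each
# observation at each of those mpg values.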
h3
##
## mpg Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## 17.1 linear 0.970 0.0367 26.43 < 0.001 509.1 0.8981 1.042
## 17.1 quadratic 0.322 0.0876 3.67 < 0.001 12.0 0.1499 0.493
## 23.1 linear 0.619 0.1940 3.19 0.00141 9.5 0.2391 0.999
## 23.1 quadratic 0.263 0.1601 1.64 0.10086 3.3 -0.0511 0.576
## 17.1 linear 0.970 0.0367 26.43 < 0.001 509.1 0.8981 1.042
## --- 118 rows omitted. See ?print.marginaleffects ---
## 23.1 quadratic 0.263 0.1601 1.64 0.10086 3.3 -0.0511 0.576
## 17.1 linear 0.970 0.0367 26.43 < 0.001 509.1 0.8981 1.042
## 17.1 quadratic 0.322 0.0876 3.67 < 0.001 12.0 0.1499 0.493
## 23.1 linear 0.619 0.1940 3.19 0.00141 9.5 0.2391 0.999
## 23.1 quadratic 0.263 0.1601 1.64 0.10086 3.3 -0.0511 0.576
## Type: response

h4 <- hypotheses(h3, hypothesis = ~ I(mean(x)) | hypothesis + mpg)
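# Averaging within each combination of contrast label and mpg value collapses the 128
# unit-level rows of h3 into four average conditional contrasts.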
h4
##
## Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## linear 0.970 0.0367 26.43 < 0.001 509.1 0.8981 1.042
## linear 0.619 0.1940 3.19 0.00141 9.5 0.2391 0.999
## quadratic 0.322 0.0876 3.67 < 0.001 12.0 0.1499 0.493
## quadratic 0.263 0.1601 1.64 0.10086 3.3 -0.0511 0.576

*Interaction contrasts*

… and interaction contrasts:

custom_interaction_contrast <- function(x) {
  setNames(
    diff(x),
    "mpg: hi - lo"
  )
}
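# Quick sanity check with made-up numbers: diff() returns (second element) - (first
# element), which here is the high-mpg estimate minus the low-mpg one.
custom_interaction_contrast(c(0.2, 0.5)) # named "mpg: hi - lo", value 0.3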
hypotheses(h4, hypothesis = ~ I(custom_interaction_contrast(x)) | term)
##
## Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## -0.3508 0.181 -1.942 0.0522 4.3 -0.705 0.00329
## -0.0589 0.157 -0.376 0.7066 0.5 -0.366 0.24793
##
## Term: mpg: hi - lo

etc… This code gives only differences between aggregated means, but the `hypothesis=<function>` functionality can truly be expanded to arbitrary estimates (for example, computing harmonic means and then comparing their ratios).

**Joint tests**

*Conditional*

For the effect of `carb`:

w <- contr.helmert(6)
avg_custom_contrast <- function(x) {
  setNames(
    as.vector(x %*% w),
    nm = 1:5
  )
}
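# contr.helmert(6) is a 6 x 5 matrix, so avg_custom_contrast() returns five Helmert
# contrasts (named 1 to 5) for each value of mpg; these are the estimates that the
# joint tests below operate on.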
h5 <- predictions(mod, variables = list("carb" = levels, mpg = c(14, 26)),
                  type = "response", newdata = "mean",
                  hypothesis = ~ I(avg_custom_contrast(x)) | mpg)
h5
##
## mpg Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## 14 1 -0.00151 0.00567 -0.266 0.790 0.3 -0.0126 0.0096
## 14 2 -0.00635 0.01449 -0.438 0.662 0.6 -0.0348 0.0221
## 14 3 0.16852 0.21821 0.772 0.440 1.2 -0.2592 0.5962
## 14 4 3.93536 0.08672 45.380 <0.001 Inf 3.7654 4.1053
## 14 5 3.93537 0.08642 45.539 <0.001 Inf 3.7660 4.1047
## 26 1 -0.06812 0.21718 -0.314 0.754 0.4 -0.4938 0.3575
## 26 2 -1.65814 0.30747 -5.393 <0.001 23.8 -2.2608 -1.0555
## 26 3 1.31185 0.27035 4.852 <0.001 19.6 0.7820 1.8417
## 26 4 1.35184 0.32103 4.211 <0.001 15.3 0.7226 1.9811
## 26 5 1.35184 0.32103 4.211 <0.001 15.3 0.7226 1.9811
##
## Type: response
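# h5 has ten rows: the five Helmert contrasts at mpg = 14, followed by the five at
# mpg = 26. The joint tests further down use row indices 1:5 and 6:10 to test each
# block against zero.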
emc <- emmeans(mod, ~ carb + mpg, regrid = TRUE, at = list(mpg = c(14, 26))) |>
  contrast(method = as.data.frame(w), by = "mpg")
emc
## mpg = 14:
## contrast estimate SE df z.ratio p.value
## V1 -0.00151 0.00567 Inf -0.266 0.7901
## V2 -0.00634 0.01450 Inf -0.438 0.6616
## V3 0.16852 0.21800 Inf 0.772 0.4399
## V4 3.93536 0.08670 Inf 45.381 <.0001
## V5 3.93537 0.08640 Inf 45.539 <.0001
##
## mpg = 26:
## contrast estimate SE df z.ratio p.value
## V1 -0.06813 0.21700 Inf -0.314 0.7538
## V2 -1.65814 0.30700 Inf -5.393 <.0001
## V3 1.31185 0.27000 Inf 4.852 <.0001
## V4 1.35184 0.32100 Inf 4.211 <.0001
## V5 1.35184 0.32100 Inf 4.211 <.0001

hypotheses(h5, joint = 1:5)
##
##
## Joint hypothesis test:
## b1 = 0
## b2 = 0
## b3 = 0
## b4 = 0
## b5 = 0
##
## F Pr(>|F|) Df 1 Df 2
## 1.9e+07 <0.001 5 25
hypotheses(h5, joint = 6:10)
##
##
## Joint hypothesis test:
## b6 = 0
## b7 = 0
## b8 = 0
## b9 = 0
## b10 = 0
##
## F Pr(>|F|) Df 1 Df 2
## 1559 <0.001 5 25
joint_tests(mod, by = "mpg", regrid = TRUE, at = list(mpg = c(14, 26)))
## mpg = 14:
## model term df1 df2 F.ratio Chisq p.value
## carb 5 Inf 18356583.000 91782913 <.0001
##
## mpg = 26:
## model term df1 df2 F.ratio Chisq p.value
## carb 5 Inf 1559.000 7795 <.0001
-
Yep, I think it's a good idea to add a couple of examples of custom contrast functions via the formula interface. Note that we already have a bunch of examples of grouping with formulas here: https://marginaleffects.com/bonus/hypothesis.html#formulas-group-wise

The website code is in a private repo because I'm working on the book there, but I paste the relevant section code (folded below) in case you want to play with some options. Two minor style things:
Thanks for engaging! I'm excited about the new features we ended up with.
````r
# Formulas (Group-wise)
Since version 0.20.1.5 of `marginaleffects`, the `hypothesis` argument accepts formulas.
We can make "sequential", "reference", or "meandev" comparisons, either as differences (the default) or ratios:
We can also test hypotheses by subgroup. For example, in this case, we want to compare every estimate to the reference category (first estimate) in each subset of
The grouping component of the formula is particularly useful when conducting tests on contrasts:
It is also a powerful tool in categorical outcome models, to compare estimates for different outcome levels:
Is the "8 vs. 4" contrast equal to the "6 vs. 4" contrast, within each outcome level?
Is the "8 vs. 4" contrast for outcome level 3 equal to the "8 vs. 4" contrast for outcome level 4?
# Functions
Hypothesis tests can also be conducted using arbitrary functions.
## Predictions
When supplying a function to the `hypothesis` argument, … In this example, we test if the mean predicted value is different from 2:
In this ordinal logit model, the
We can use a function in the
And we can compare the two categories by doing:
````
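For quick reference, here is a rough sketch of the kind of grouped formula calls that section describes (the model and grouping variable are just illustrative, and output is omitted):

```r
library(marginaleffects)

# toy model with a factor predictor and a grouping variable
mod2 <- lm(mpg ~ factor(cyl) * am, data = mtcars)

# compare every average prediction to the reference (first) estimate
avg_predictions(mod2, by = "cyl", hypothesis = ~ reference)

# the same comparisons expressed as ratios, computed separately within each level of am
avg_predictions(mod2, by = c("am", "cyl"), hypothesis = ratio ~ reference | am)
```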
-
Let's close this discussion now, but feel free to open others to discuss more specific points.