
Incorrect standard errors in predictions() of clm() with scale effects #718

Closed
DrJerryTAO opened this issue Mar 9, 2023 · 12 comments
DrJerryTAO commented Mar 9, 2023

Hi @vincentarelbundock, I examined {marginaleffects} with ordinal::clm objects and found incorrect SE estimation when clm() has both location and scale effects from the same variable. See the mtcars example below:

library(ordinal)
library(dplyr)
summary(Model <- clm(
  cyl ~ hp + vs,  # vs has a location effect
  scale = ~ vs,   # vs also has a scale effect
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model))
"       Estimate Std. Error   z value   Pr(>|z|)
4|6  7.80828861 4.54424125  1.718282 0.08574516
6|8 12.08049359 6.44711009  1.873784 0.06096016
hp   0.08356783 0.04132201  2.022356 0.04313955
vs  -1.34575405 1.34186188 -1.002901 0.31590883
vs  -0.79680133 0.79581889 -1.001235 0.31671345"
sapply(predict( # SE of Pr{cyl == 4} for the first five cases, using predict.clm()
  Model, newdata = mtcars[ , -2], se.fit = T, interval = T), 
  function(x) x[1:5, 1])
"          fit      se.fit          lwr       upr
1 0.200339726 0.205575732 1.985535e-02 0.7560016
2 0.200339726 0.205575732 1.985535e-02 0.7560016
3 0.955485126 0.086301642 2.869090e-01 0.9991275
4 0.478705739 0.225074420 1.355227e-01 0.8432391
5 0.001094808 0.003430555 2.342441e-06 0.3389789"
deltamethod(Model) # NA in scale coef SE if also in location coef
" Term Estimate Std. Error     z Pr(>|z|)    2.5 % 97.5 %
   b1   7.8083     4.5442  1.72   0.0857 -1.09826 16.715
   b2  12.0805     6.4471  1.87   0.0610 -0.55561 24.717
   b3   0.0836     0.0413  2.02   0.0431  0.00258  0.165
   b4  -1.3458     0.7958 -1.69   0.0908 -2.90553  0.214
   b5  -0.7968         NA    NA       NA       NA     NA"
predictions(Model)[1:5, ] # drastically different SE on row 3, 4
" Group Estimate Std. Error     z Pr(>|z|)    2.5 %  97.5 %
     4  0.20034    0.20555 0.975   0.3297 -0.20253 0.60321
     4  0.20034    0.20555 0.975   0.3297 -0.20253 0.60321
     4  0.95549    0.37260 2.564   0.0103  0.22520 1.68577
     4  0.47871    0.58861 0.813   0.4161 -0.67495 1.63237
     4  0.00109    0.00343 0.320   0.7493 -0.00562 0.00781"
predictions(Model, newdata = mtcars)[1:5, ] # same as above
predictions(Model, newdata = mtcars[ , -2])[1:5, ] # same as above
mtcars[1:5, -2] # Those SE differences correspond to vs = 1 (vs. vs = 0) rows
"                   mpg disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7  360 175 3.15 3.440 17.02  0  0    3    2"

This shows that predictions() produces very different SEs for the probability mass than predict() does whenever the scale effect differs from exp(0) = 1. Differences this large cannot be explained by the delta-method approximation or by SEs of different estimands. I suspect that the SE extractor in predictions() cannot retrieve the vcov() information correctly when one or more scale-effect variables share a term name with a location-effect variable. The following update shows that variable naming is exactly the cause.
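My suspicion about name collisions can be illustrated with a minimal base-R sketch (this shows R's name-matching rules, not the actual marginaleffects internals; the numbers are hypothetical, loosely mimicking the fit above): when two coefficients share a name, character indexing silently returns only the first match.

```r
# Hypothetical coefficient vector: "vs" appears twice, once for the
# location effect and once for the scale effect, as in the clm() fit.
beta <- c("4|6" = 7.808, "6|8" = 12.080, hp = 0.084, vs = -1.346, vs = -0.797)

# Indexing by name returns only the FIRST element called "vs",
# so the scale coefficient is silently shadowed by the location one:
beta["vs"]   # -1.346, never -0.797

# The same logic applied to a named vcov() matrix would fetch the wrong
# variance; only positional indexing reaches the second "vs":
beta[[5]]    # -0.797
```

If an SE extractor matches coefficients to rows and columns of vcov() by name, the duplicated "vs" entries make that lookup ambiguous, which is consistent with the disparity shown above.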

Model <- update(
  Model, scale = ~ vs1, data = mtcars %>% 
    mutate(cyl = factor(cyl), vs1 = vs))
predictions(Model)[1:5, ]
"Error: Unable to compute predicted values with this model. You can try
  to supply a different dataset to the `newdata` argument. This
  error was also raised:
  
  object 'vs1' not found"
predictions( # This time SE is close to those from predict.clm()
  Model, newdata = mtcars %>% mutate(vs1 = vs))[1:5, ]
" Group Estimate Std. Error      z Pr(>|z|)    2.5 %  97.5 %  mpg disp
     4  0.20034    0.20555  0.975   0.3297 -0.20253 0.60321 21.0  160
     4  0.20034    0.20555  0.975   0.3297 -0.20253 0.60321 21.0  160
     4  0.95549    0.08630 11.071   <0.001  0.78634 1.12463 22.8  108
     4  0.47871    0.22507  2.127   0.0334  0.03757 0.91984 21.4  258
     4  0.00109    0.00343  0.320   0.7493 -0.00562 0.00781 18.7  360"

Once the location- and scale-effect terms are named uniquely, the disparity between predictions() and predict() disappears. An easy fix on the user's side is to copy and rename the variable, so that the variable in the scale effect has a different name from its counterpart in the location effect, even though both hold identical values.

The example above also reveals a data-frame-building issue: by default, predictions() cannot retrieve variables used in scale = ~ ... or nominal = ~ ... if they are absent from the regular formula. Explicitly specifying newdata = ... resolves this "object not found" error. If a term appears in both the location and the scale formula, default data-frame building works, but the incorrect-SE problem in predictions() emerges. I checked and confirmed that common term names in nominal = ~ ... and formula = ~ ... do not cause a problem in predictions(), because clm() prefixes variable names in nominal = ~ ... with the threshold names.

Will you fix the variable-naming scheme in predictions() so that it treats a variable appearing in formula = ~ ..., scale = ~ ..., and nominal = ~ ... separately despite the common term name? Maybe add a suffix or prefix, or throw an error when formula term labels conflict. The current predictions() output in such a scenario, with no warning or error, can mislead users who take it at face value. As the example above shows, the SE disparity and its inferential consequences are not trivial.

Furthermore, the interval estimates from predictions() differ greatly from those of predict(): the former uses a normal approximation on the probability scale, while the latter derives an SE on the linear predictor and back-transforms to probability intervals. As a result, some probability CIs produced by predictions() fall far outside [0, 1]. Could predictions() offer asymmetric confidence intervals based on the model link and the linear predictor? The current transform = function() argument in predictions() cannot salvage the situation: it transforms the already-built normal CI (which can lie outside 0 to 1) instead of transforming the estimate before the CI is built.
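For intuition, here is a minimal base-R sketch of the two interval constructions, using hypothetical numbers for the linear predictor and its SE (not values from the model above): a Wald interval built directly on the probability scale can escape [0, 1], while an interval built on the logit scale and mapped back through plogis() cannot.

```r
eta <- -2.5      # hypothetical linear predictor (logit scale)
se_eta <- 1.2    # hypothetical SE of the linear predictor
p <- plogis(eta) # predicted probability

# Delta-method SE on the probability scale: dp/deta = p * (1 - p)
se_p <- p * (1 - p) * se_eta

# 1) Normal approximation on the probability scale (can leave [0, 1]):
ci_prob <- p + c(-1, 1) * qnorm(0.975) * se_p

# 2) Wald interval on the logit scale, then back-transformed:
ci_link <- plogis(eta + c(-1, 1) * qnorm(0.975) * se_eta)

ci_prob  # lower bound is negative
ci_link  # both bounds are valid probabilities, asymmetric around p
```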

@DrJerryTAO DrJerryTAO changed the title Incorrect standard errors in predictions of clm() with scale effects Incorrect standard errors in predictions() of clm() with scale effects Mar 10, 2023
vincentarelbundock added a commit that referenced this issue Mar 10, 2023

vincentarelbundock commented Mar 10, 2023

Thanks a lot for the detailed example. I really appreciate it.

There are two issues. First, insight does not return the names of variables in the scale argument:

library(ordinal)
library(marginaleffects)

dat <- transform(mtcars, cyl = factor(cyl), vs2 = vs)
mod <- clm(cyl ~ hp, scale = ~ vs, data = dat)

insight::find_variables(mod)
# $response
# [1] "cyl"
# 
# $conditional
# [1] "hp"

This is a bug which should probably be reported to the insight repository.

The workaround in marginaleffects is to feed the data explicitly to the newdata argument. Using that trick and the latest development version from GitHub:

packageVersion("marginaleffects")
# [1] '0.11.0.9000'

We now get:

# removing the response from the prediction dataset forces `predict.clm()` to return all levels
nd <- subset(dat, select = -cyl)

mod1 <- clm(cyl ~ hp + vs, scale = ~ vs, data = dat)
mod2 <- clm(cyl ~ hp + vs, scale = ~ vs2, data = dat)

pre1 <- predictions(mod1, newdata = nd)
pre2 <- predictions(mod2, newdata = nd)
pre3 <- predict(mod1, newdata = nd, type = "prob", se.fit = TRUE)

# identical results
subset(pre1, rowid %in% 1:2)
# 
#  Group Estimate Std. Error     z Pr(>|z|) CI low CI high
#      4   0.2003      0.206 0.975  0.32973 -0.203   0.603
#      4   0.2003      0.206 0.975  0.32973 -0.203   0.603
#      6   0.7469      0.239 3.130  0.00175  0.279   1.215
#      6   0.7469      0.239 3.130  0.00175  0.279   1.215
#      8   0.0527      0.104 0.507  0.61214 -0.151   0.257
#      8   0.0527      0.104 0.507  0.61214 -0.151   0.257
# 
# Columns: rowid, group, estimate, std.error, statistic, p.value, conf.low, conf.high, mpg, disp, hp, drat, wt, qsec, vs, am, gear, carb, vs2, cyl

subset(pre2, rowid %in% 1:2)
# 
#  Group Estimate Std. Error     z Pr(>|z|) CI low CI high
#      4   0.2003      0.206 0.975  0.32973 -0.203   0.603
#      4   0.2003      0.206 0.975  0.32973 -0.203   0.603
#      6   0.7469      0.239 3.130  0.00175  0.279   1.215
#      6   0.7469      0.239 3.130  0.00175  0.279   1.215
#      8   0.0527      0.104 0.507  0.61214 -0.151   0.257
#      8   0.0527      0.104 0.507  0.61214 -0.151   0.257
# 
# Columns: rowid, group, estimate, std.error, statistic, p.value, conf.low, conf.high, mpg, disp, hp, drat, wt, qsec, vs, am, gear, carb, vs2, cyl

head(pre3$fit, 2)
#           4         6          8
# 1 0.2003397 0.7469119 0.05274834
# 2 0.2003397 0.7469119 0.05274834

head(pre3$se.fit, 2)
#           4         6         8
# 1 0.2055757 0.2387547 0.1041193
# 2 0.2055757 0.2387547 0.1041193


DrJerryTAO commented Mar 10, 2023

Hi @vincentarelbundock, using your updated development version, I can confirm that the SE calculation is now correct even when scale = ~ ... shares terms with the regular formula = ~ ....

However, the associated hypotheses() could also use an update to its variable-naming scheme. Currently, whenever scale = ~ ... shares a term name with formula = ~ ... AND nominal = ~ ... shares a term with formula = ~ ..., the SE and all inferential statistics for the scale = ~ ... coefficients are NA. See the example below:

summary(Model <- clm( # nominal + scale (also in location) effects
  cyl ~ vs + hp, scale = ~ vs, nominal = ~ hp, 
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model))
"                     Estimate   Std. Error     z value   Pr(>|z|)
4|6.(Intercept)  1.111073e+05 1.841641e+06  0.06033059 0.95189234
6|8.(Intercept)  9.176839e+00 5.674996e+00  1.61706527 0.10586419
4|6.hp          -1.220842e+03 2.023763e+04 -0.06032534 0.95189652
6|8.hp          -6.522225e-02 3.608747e-02 -1.80733775 0.07070963
vs              -2.279935e+04 3.780999e+05 -0.06029979 0.95191687
hp                         NA           NA          NA         NA
vs               8.617176e+00 1.657530e+01  0.51988047 0.60314689"
hypotheses(Model) # Note the last row has only point est
" Term  Estimate Std. Error       z Pr(>|z|)     2.5 %   97.5 %
   b1  1.11e+05   1.84e+06  0.0603   0.9519 -3.50e+06 3.72e+06
   b2  9.18e+00   5.67e+00  1.6171   0.1059 -1.95e+00 2.03e+01
   b3 -1.22e+03   2.02e+04 -0.0603   0.9519 -4.09e+04 3.84e+04
   b4 -6.52e-02   3.61e-02 -1.8073   0.0707 -1.36e-01 5.51e-03
   b5 -2.28e+04   3.78e+05 -0.0603   0.9519 -7.64e+05 7.18e+05
   b6        NA         NA      NA       NA        NA       NA
   b7  8.62e+00         NA      NA       NA        NA       NA"
hypotheses(Model, hypothesis = "b7 = 8")
"   Term Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
 b7 = 8    0.617         NA NA       NA    NA     NA"

The error disappears once the redundant term is removed from the regular formula = ~ ..., given that it is already in nominal = ~ ...:

summary(Model <- clm( # identical model as above; hp removed from regular formula
  cyl ~ vs, scale = ~ vs, nominal = ~ hp, 
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model)) # None changed, only the aliased hp row removed
"                     Estimate   Std. Error     z value   Pr(>|z|)
4|6.(Intercept)  1.111073e+05 1.841641e+06  0.06033059 0.95189234
6|8.(Intercept)  9.176839e+00 5.674996e+00  1.61706527 0.10586419
4|6.hp          -1.220842e+03 2.023763e+04 -0.06032534 0.95189652
6|8.hp          -6.522225e-02 3.608747e-02 -1.80733775 0.07070963
vs              -2.279935e+04 3.780999e+05 -0.06029979 0.95191687
vs               8.617176e+00 1.657530e+01  0.51988047 0.60314689"
hypotheses(Model) # Now SE of scale vs is fine
" Term  Estimate Std. Error       z Pr(>|z|)     2.5 %   97.5 %
   b1  1.11e+05   1.84e+06  0.0603   0.9519 -3.50e+06 3.72e+06
   b2  9.18e+00   5.67e+00  1.6171   0.1059 -1.95e+00 2.03e+01
   b3 -1.22e+03   2.02e+04 -0.0603   0.9519 -4.09e+04 3.84e+04
   b4 -6.52e-02   3.61e-02 -1.8073   0.0707 -1.36e-01 5.51e-03
   b5 -2.28e+04   3.78e+05 -0.0603   0.9519 -7.64e+05 7.18e+05
   b6  8.62e+00   1.66e+01  0.5199   0.6031 -2.39e+01 4.11e+01"
hypotheses(Model, hypothesis = "b6 = 8")
"   Term Estimate Std. Error      z Pr(>|z|) 2.5 % 97.5 %
 b6 = 8    0.617       16.6 0.0372     0.97 -31.9   33.1"

@DrJerryTAO

#99 Just a thought: could this variable-naming issue be the cause of the malfunction of predictions() with lme() models? Supplying a data frame to the newdata = ... argument of predictions() might resolve some of those issues.

@DrJerryTAO

I opened an issue reporting the left-out variables in {insight}: easystats/insight#727


vincentarelbundock commented Mar 10, 2023

#99 Just a thought: could this variable-naming issue be the cause of the malfunction of predictions() with lme() models? Supplying a data frame to the newdata = ... argument of predictions() might resolve some of those issues.

In principle this could work, but I do not want to add support for a model if it requires hacks on the user's part. When I add a model like lme, I want "full" support.

Re: hypotheses() missing values

I believe that this is because your model is ill-defined. See the warning and message about “1 not defined because of singularities”. For internal reasons, it would be very tricky for me to work around this problem. It is much easier for the user to fit a proper model instead.

library(ordinal)
mod <- clm(
  cyl ~ vs + hp, scale = ~ vs, nominal = ~ hp, 
  data = transform(mtcars, cyl = factor(cyl)))
# Warning: (-3) not all thresholds are increasing: fit is invalid 
# In addition: Absolute and relative convergence criteria were met
summary(mod)
# formula: cyl ~ vs + hp
# scale:   ~vs
# nominal: ~hp
# data:    transform(mtcars, cyl = factor(cyl))
# 
#  link  threshold nobs logLik AIC   niter  max.grad cond.H 
#  logit flexible  32   -8.01  28.02 95(20) 1.10e-13 1.0e+17
# 
# Coefficients: (1 not defined because of singularities)
#    Estimate Std. Error z value Pr(>|z|)
# vs   -22799     378100   -0.06    0.952
# hp       NA         NA      NA       NA
# 
# log-scale coefficients:
#    Estimate Std. Error z value Pr(>|z|)
# vs    8.617     16.575    0.52    0.603
# 
# Threshold coefficients:
#                   Estimate Std. Error z value
# 4|6.(Intercept)  1.111e+05  1.842e+06   0.060
# 6|8.(Intercept)  9.177e+00  5.675e+00   1.617
# 4|6.hp          -1.221e+03  2.024e+04  -0.060
# 6|8.hp          -6.522e-02  3.609e-02  -1.807

@DrJerryTAO

I believe that this is because your model is ill-defined. See the warning and message about “1 not defined because of singularities”. For internal reasons, it would be very tricky for me to work around this problem. It is much easier for the user to fit a proper model instead.

No, the missing SEs in hypotheses() are due exactly to a variable-naming issue. Even for a model without the "(-3) not all thresholds are increasing" warning, hypotheses() will still report missing SEs for the scale = ~ ... coefficients whenever nominal = ~ ... contains variables that are already in the regular formula.

@vincentarelbundock

Please show me an example of a model where coef(model) does not return any NA value, and which hypotheses() fails on. So far, the failures are in models with NA coefficients.


DrJerryTAO commented Mar 12, 2023

Will you fix the variable naming scheme in predictions(), so it treats a variable in formula = ~ ..., scale = ~ ..., and nominal = ~ ... separately despite common term names? Maybe add a suffix or prefix.

I take back my suggestion above: adding "a suffix or prefix" to variable names in a data frame would be a dangerous move when formula = ~ ..., scale = ~ ..., and nominal = ~ ... share some variables, as in

summary(Model <- clm( # nominal + scale (also in location) effects
  cyl ~ vs + hp, scale = ~ vs, nominal = ~ hp, 
  data = mtcars %>% mutate(cyl = factor(cyl))))

When making adjusted predictions, we want a change from vs = 0 to vs = 1 in the regular formula = ~ ... to apply equally in scale = ~ ... and nominal = ~ .... Therefore, assigning different names to the same variable in different formula components during datagrid() building (e.g. renaming vs to s.vs for scale = ~ ...) is unwise and would produce mistakes.

Could you confirm that the latest version of {marginaleffects} builds counterfactual data correctly, so that an incremental change in a variable takes effect in every term of every formula component that uses it? (This behavior is undesirable for some other packages like mlogit, where an incremental change in a variable should follow different schemes in different formula components, as described in #719.)
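The propagation I am asking about can be sketched in base R (a toy illustration, not datagrid() internals): as long as every formula component reads the same data column, a counterfactual edit to that column reaches all of them automatically.

```r
d <- data.frame(vs = c(0, 1), hp = c(110, 93))

# Counterfactual: set vs = 1 for every row, in the single shared column
d_cf <- transform(d, vs = 1)

# Model matrices for a "location" and a "scale" component are built
# from the same column, so both see the identical change:
model.matrix(~ vs + hp, d_cf)[, "vs"]  # all 1
model.matrix(~ vs, d_cf)[, "vs"]       # all 1
```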


DrJerryTAO commented Mar 12, 2023

Hi @vincentarelbundock,

Please show me an example of a model where coef(model) does not return any NA value, and which hypotheses() fails on. So far, the failures are in models with NA coefficients.

There will be no example matching the case you describe, because coef() on a clm() fit returns NA for the point estimate, SE, and so on of an "aliased" location-effect coefficient (a term in the regular formula = ~ ...) whenever nominal = ~ ... contains the same term. These NAs are not due to convergence or singularity problems; the aliased coefficient is simply redundant, i.e. necessarily 0. This is analogous to using K dummy variables to represent a K-category factor in lm(): coef() returns NA for the last dummy variable because it is redundant.

summary(Model <- lm( # okay to have a rank-deficient model matrix
  mpg ~ I(vs == 0) + I(vs == 1), data = mtcars))
"Coefficients: (1 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      24.557      1.224  20.058  < 2e-16 ***
I(vs == 0)TRUE   -7.940      1.632  -4.864 3.42e-05 ***
I(vs == 1)TRUE       NA         NA      NA       NA    "
hypotheses(Model, "b3 = 0") # correct
"   Term Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
 b3 = 0       NA         NA NA       NA    NA     NA"

The problem with hypotheses() is NOT in the coef() rows that are NA. In a specific scenario (nominal = ~ ... shares a term with formula = ~ ..., so some formula = ~ ... coefficients are NA, AND scale = ~ ... shares a term with formula = ~ ...), all inferential statistics such as the SE are NA (though the point estimate is retained) for the scale = ~ ... coefficients ONLY, even though coef(summary(clm())) has no such problem. My earlier examples in #718 (comment) show that this strange behavior of hypotheses() is simply due to variable naming.

The bad news is that, with the latest development version, I discovered that this strange SE behavior is not merely a display issue in hypotheses(): it spills over into predictions() (see the test example below) and, I suspect, comparisons().

summary(Model <- clm( # nominal + scale (also in location) effects
  cyl ~ vs + carb, scale = ~ vs, nominal = ~ carb, 
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model))
"                  Estimate Std. Error    z value    Pr(>|z|)
4|6.(Intercept) -2.5494136  1.0727656 -2.3764871 0.017478373
6|8.(Intercept) -2.5571012  1.7069284 -1.4980717 0.134114621
4|6.carb        -0.0981885  0.3095422 -0.3172055 0.751087646
6|8.carb         0.3369959  0.3950073  0.8531385 0.393582510
vs              -2.8264950  1.0428324 -2.7104020 0.006720171
carb                    NA         NA         NA          NA
vs              -2.3381336  3.0079436 -0.7773196 0.436970217"
hypotheses(Model) # Note the last row has only point est
" Term Estimate Std. Error      z Pr(>|z|)  2.5 % 97.5 %
   b1  -2.5494      1.073 -2.376  0.01748 -4.652 -0.447
   b2  -2.5571      1.707 -1.498  0.13411 -5.903  0.788
   b3  -0.0982      0.310 -0.317  0.75109 -0.705  0.509
   b4   0.3370      0.395  0.853  0.39358 -0.437  1.111
   b5  -2.8265      1.043 -2.710  0.00672 -4.870 -0.783
   b6       NA         NA     NA       NA     NA     NA
   b7  -2.3381         NA     NA       NA     NA     NA"
hypotheses(Model, hypothesis = "b7 = -2") # SE is NA, but should be 3.01
"    Term Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
 b7 = -2   -0.338         NA NA       NA    NA     NA"
sapply(predict( # Pr{cyl == 4} for first five cars, using predict.clm()
  Model, newdata = mtcars[ , -2], se.fit = T, interval = T), 
  function(x) x[1:5, 1])
"         fit     se.fit         lwr       upr
1 0.05010800 0.05846027 0.004728286 0.3693789
2 0.05010800 0.05846027 0.004728286 0.3693789
3 0.86455708 0.10809045 0.511116106 0.9749829
4 0.86455708 0.10809045 0.511116106 0.9749829
5 0.06032482 0.05514626 0.009447379 0.3017337"
predictions(Model, newdata = mtcars)[1:5, ] # Very different SE on row 3, 4
" Group Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %  mpg disp  hp
     4   0.0501     0.0585 0.857    0.391 -0.0645  0.165 21.0  160 110
     4   0.0501     0.0585 0.857    0.391 -0.0645  0.165 21.0  160 110
     4   0.8646     0.6713 1.288    0.198 -0.4511  2.180 22.8  108  93
     4   0.8646     0.6713 1.288    0.198 -0.4511  2.180 21.4  258 110
     4   0.0603     0.0551 1.094    0.274 -0.0478  0.168 18.7  360 175"

Like the case in #718 (comment), my current resolution is to remove the redundant term from the regular formula = ~ ... when it is already in nominal = ~ .... Once that is done, coef() has no aliased rows (no NA), and predictions() and hypotheses() work correctly.

summary(Model <- clm( # identical model as above
  cyl ~ vs, scale = ~ vs, nominal = ~ carb, 
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model)) # None changed, only the aliased carb row removed
"                  Estimate Std. Error    z value    Pr(>|z|)
4|6.(Intercept) -2.5494136  1.0727656 -2.3764871 0.017478373
6|8.(Intercept) -2.5571012  1.7069284 -1.4980717 0.134114621
4|6.carb        -0.0981885  0.3095422 -0.3172055 0.751087646
6|8.carb         0.3369959  0.3950073  0.8531385 0.393582510
vs              -2.8264950  1.0428324 -2.7104020 0.006720171
vs              -2.3381336  3.0079436 -0.7773196 0.436970217"
hypotheses(Model) # Now SE of scale = ~ vs is fine
" Term Estimate Std. Error      z Pr(>|z|)  2.5 % 97.5 %
   b1  -2.5494      1.073 -2.376  0.01748 -4.652 -0.447
   b2  -2.5571      1.707 -1.498  0.13411 -5.903  0.788
   b3  -0.0982      0.310 -0.317  0.75109 -0.705  0.509
   b4   0.3370      0.395  0.853  0.39358 -0.437  1.111
   b5  -2.8265      1.043 -2.710  0.00672 -4.870 -0.783
   b6  -2.3381      3.008 -0.777  0.43697 -8.234  3.557"
hypotheses(Model, hypothesis = "b6 = -2") # Now SE = 3.01 as expected
"    Term Estimate Std. Error      z Pr(>|z|) 2.5 % 97.5 %
 b6 = -2   -0.338       3.01 -0.112     0.91 -6.23   5.56"
sapply(predict(
  Model, newdata = mtcars[ , -2], se.fit = T, interval = T), 
  function(x) x[1:5, 1]) # same as before
"         fit     se.fit         lwr       upr
1 0.05010800 0.05846027 0.004728286 0.3693789
2 0.05010800 0.05846027 0.004728286 0.3693789
3 0.86455708 0.10809045 0.511116106 0.9749829
4 0.86455708 0.10809045 0.511116106 0.9749829
5 0.06032482 0.05514626 0.009447379 0.3017337"
predictions(Model, newdata = mtcars)[1:5, ] # Same as predict()
" Group Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %  mpg disp  hp
     4   0.0501     0.0585 0.857    0.391 -0.0645  0.165 21.0  160 110
     4   0.0501     0.0585 0.857    0.391 -0.0645  0.165 21.0  160 110
     4   0.8646     0.1081 7.999   <0.001  0.6527  1.076 22.8  108  93
     4   0.8646     0.1081 7.999   <0.001  0.6527  1.076 21.4  258 110
     4   0.0603     0.0551 1.094    0.274 -0.0478  0.168 18.7  360 175"

Would you update the SE calculation to account for the possibility that formula = ~ ... holds redundant NA coefficients when the same term is already in nominal = ~ ...? As this is also a variable-naming issue, the fix should resemble what you did in #718 (comment).

vincentarelbundock added a commit that referenced this issue Mar 12, 2023
@vincentarelbundock

@DrJerryTAO can you try the latest version from GitHub?


DrJerryTAO commented Mar 12, 2023

I just did, using the examples in #718 (comment). In the latest version of marginaleffects, the SEs of the scale effects look correct, even with redundant formula terms whose coefficients are aliased to NA. Thanks, @vincentarelbundock. Now I feel safe describing the marginal effects in my dissertation :)

@vincentarelbundock

Yay!!

Tristan-Siegfried pushed a commit to Tristan-Siegfried/marginaleffects that referenced this issue Sep 21, 2023