Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for models from rstpm2 #1041

Closed
aghaynes opened this issue Mar 8, 2024 · 11 comments
Closed

support for models from rstpm2 #1041

aghaynes opened this issue Mar 8, 2024 · 11 comments

Comments

@aghaynes
Copy link

aghaynes commented Mar 8, 2024

Hey, great package!

Would it be possible to add support for models from the rstpm2 package? This allows things like RMST from flexible parametric survival models...

library(rstpm2)

data(brcancer)

m <- stpm2(Surv(rectime, censrec == 1) ~ hormon * x3, data = brcancer, df = 3)

predict(m, type = "rmst",
        newdata = expand.grid(rectime = 1000,
                              hormon = c(0,1,2),
                              x3 = 50),
        var = "hormon", se.fit = TRUE)
predict(m, type = "rmstdiff",
        newdata = expand.grid(rectime = 1000,
                              hormon = c(0,1,2),
                              x3 = 50),
        var = "hormon", se.fit = TRUE)

It would be great to be able to marginalise over e.g. x3...

@vincentarelbundock
Copy link
Owner

I gave this a shot but can't get the standard errors to work. I'm optimistic that this is possible, but don't have more time to invest at the moment.

If anyone wants to give this a shot, the tutorial on extensions is here: https://marginaleffects.com/vignettes/extensions.html#support-a-new-model-type

Below, I paste my initial attempts. I think that everything works, except that the output of get_predict() is not altered when we change the coefficients in set_coef(). So the problem is likely in that latter function.

options("marginaleffects_model_classes" = "stpm2")


#' @rdname set_coef
#' @export
set_coef.stpm2 <- function(model, coefs, ...) {
    insight::check_if_installed("rstpm2")
    model@coef <- coefs
    model
}


#' @rdname get_vcov
#' @export
get_vcov.stpm2 <- function(model, ...) {
    insight::check_if_installed("rstpm2")
    rstpm2::vcov(model)
}


#' @rdname get_predict
#' @export
get_predict.stpm2 <- function(model, newdata = NULL, ...) {
    insight::check_if_installed("rstpm2")
    pred <- rstpm2::predict(model, newdata = newdata)
    sanity_predict_vector(pred = pred, model = model, newdata = newdata)
    sanity_predict_numeric(pred = pred, model = model, newdata = newdata)
    out <- data.frame(
        rowid = 1:nrow(newdata),
        estimate = pred)
    return(out)
}

@vincentarelbundock
Copy link
Owner

I am very interested in supporting this. But just to keep the repo clean, I close individual model request to consolidate them in a list here: #49

@vincentarelbundock
Copy link
Owner

@aghaynes writes

I'm not sure whether conversation on the rstpm2 implementation should be here or in the other issue... I guess here, as this is the open one?

In stpm2 models, the coefficients seem to be stored in many places...

Using the following at least produces the SEs etc in the predictions output... (i dont know which are important so i changed them all)

set_coef.stpm2 <- function(model, coefs, ...) {
  insight::check_if_installed("rstpm2")
  model@coef <- coefs
  model@fullcoef <- coefs
  model@lm$coefficients <- coefs
  model@args$init <- unlist(coefs)
  model@details$par <- coefs
  model
}

> predictions(m2, newdata = nd)

 Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 % rectime hormon x3
    0.535     0.0387 13.8   <0.001 142.1 0.459  0.611    1000      0 50
    0.647     0.0411 15.8   <0.001 183.3 0.567  0.728    1000      1 50
    0.739     0.0689 10.7   <0.001  86.7 0.604  0.874    1000      2 50

Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, rectime, hormon, x3 
Type:  response 

avg_predictions doesnt work though for some reason... if the missing variables were the spline terms, I might have kinda understood, but that hormon is missing confuses me... it seems to give the same error whether newdata is passed or not

avg_predictions(m2, var = "hormon")
Error: Unable to compute predicted values with this model. You can try to supply a different
  dataset to the `newdata` argument. This error was also raised:
  
  undefined columns selected
  
  Bug Tracker: https://github.com/vincentarelbundock/marginaleffects/issues
In addition: Warning message:
Some of the variable names are missing from the model data: hormon 

to note: rstpm2 can predict a whole load of effect measures... so the type_dictionary would need updating to include some of them (i would be most interested in rmst and rmstdiff, but others are certainly also of interest)

@vincentarelbundock
Copy link
Owner

This is interesting, but I worry about substituting all those coef values. Ideally, we'd want a better understanding of the model object before starting to change things willy-nilly. Also, can this package use other than lm objects? What happens in those cases?

My guess is that get_variables() from the insight package does not extract all the predictors from this model when there are splines. They are usually very responsive, so this could be reported to them.

@aghaynes
Copy link
Author

Sure.. i just did it to see if it worked. most likely only one or two are actually relevant (perhaps reaching out to the author is worthwhile). I dont know the inner workings of the package unfortunately. I had a quick look through the helpfiles and dont see anything about using model types besides lms. It could be that there's something else used in the AFT or MSM machinery though i guess.

@vincentarelbundock
Copy link
Owner

Another way would be to look at the code for the predict() method in the package to see what it uses.

@aghaynes
Copy link
Author

@vincentarelbundock
Copy link
Owner

the next approx 700 lines

lol

@aghaynes
Copy link
Author

to be fair, it's because there are so many things it can predict...

there are multiple places where it uses coef(object) and an S3 method coef.stpm2 <- coef where coef seems to be imported from stats. but it can't be as simple as that, otherwise your original code should have worked, right?

@mclements mclements mentioned this issue Aug 9, 2024
@vincentarelbundock
Copy link
Owner

@aghaynes thanks to the great work of @mclements this is now available in the Github version of marginaleffects. I plan a release to CRAN in the next few weeks.

Thanks again for the feature request.

@aghaynes
Copy link
Author

That is great news! Thanks to both of you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants