visualisation_recipe with transformed responses #151

Open
bwiernik opened this issue Oct 8, 2021 · 13 comments
Labels
Enhancement 💥 Implemented features can be improved or revised Feature idea 🔥 New feature or request

Comments

@bwiernik
Contributor

bwiernik commented Oct 8, 2021

Currently, the default plot produced by estimate_expectation() for a model with a transformed response variable is incorrect: the data points are plotted in the raw metric, but the predicted line is in the transformed metric.

We should either:

  1. apply the same transformation to the data when plotting as was used in the model specification, or
  2. include a transform option in estimate_expectation() to allow back-transforming of predictions

library(modelbased)
library(ggplot2)

m_inline_trans <- lm(log(dist) ~ speed, data = cars)

estimate_expectation(m_inline_trans) |> 
  plot()

estimate_expectation(m_inline_trans) |> 
  dplyr::mutate(
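    # delta method: compute SE before Predicted is overwritten,
    # because mutate() evaluates its arguments sequentially
    SE = SE * exp(Predicted),
    Predicted = exp(Predicted),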
    CI_low = exp(CI_low),
    CI_high = exp(CI_high)
  ) |> 
  plot()

cars$log_dist <- log(cars$dist)
m_pre_trans <- lm(log_dist ~ speed, data = cars)

estimate_expectation(m_pre_trans) |>
  plot()

estimate_expectation(m_pre_trans) |>
  plot() +
  scale_y_continuous(trans = "exp")
#> Warning in self$trans$inverse(limits): NaNs produced
#> Error in if (zero_range(as.numeric(limits))) {: missing value where TRUE/FALSE needed

estimate_expectation(m_pre_trans) |>
  ggplot() +
  aes(x = speed) +
  geom_point(
    aes(y = log_dist), data = insight::get_data(m_pre_trans)
  ) +
  geom_ribbon(
    aes(ymin = CI_low, ymax = CI_high),
    alpha = .5
  ) +
  geom_line(
    aes(y = Predicted)
  ) 

Created on 2021-10-08 by the reprex package (v2.0.1)
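
For reference, the back-transformation in option 2 can be sketched in base R alone, without relying on modelbased. This is only an illustration under stated assumptions: a 95% t-based interval built on the log scale, and `exp` as the inverse of the in-formula `log()`. The column names mirror the ones used in the reprex but are otherwise arbitrary.

```r
# Base-R sketch: back-transform predictions from a log-response model.
m <- lm(log(dist) ~ speed, data = cars)
pred <- predict(m, newdata = cars, se.fit = TRUE)

crit <- qt(0.975, df = pred$df)   # critical t value for a 95% interval
fit_log <- pred$fit               # predictions on the log scale

out <- data.frame(
  speed     = cars$speed,
  Predicted = exp(fit_log),
  # Delta method: SE on the raw scale = SE on the log scale times the
  # derivative of exp() evaluated at the log-scale prediction
  SE        = pred$se.fit * exp(fit_log),
  # Interval endpoints are back-transformed directly
  CI_low    = exp(fit_log - crit * pred$se.fit),
  CI_high   = exp(fit_log + crit * pred$se.fit)
)

head(out)
```

Because the endpoints are transformed rather than rebuilt from the new SE, the interval is asymmetric around `Predicted` on the raw scale, which is expected for a log-response model.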

@bwiernik
Contributor Author

bwiernik commented Oct 9, 2021

We should add a transformation argument to estimate_*() functions, e.g.:

library(modelbased)
library(ggplot2)

m_inline_trans <- lm(log(dist) ~ speed, data = cars)

estimate_expectation(m_inline_trans, transformation = exp) |> 
  plot()

Then internally, if transformation is given, we do, e.g., for .estimate_predicted():

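# Jacobian of the transformation, evaluated at the untransformed predictions
# (so it has to be computed before Predicted is overwritten)
trans_env <- as.environment(out)
trans_env$transformation <- transformation
# numericDeriv() stores the derivatives in the "gradient" attribute
jacob <- diag(attr(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env), "gradient"))
out$SE <- out$SE * jacob

out$Predicted <- transformation(out$Predicted)
out$CI_low <- transformation(out$CI_low)
out$CI_high <- transformation(out$CI_high)
out$Residuals <- transformation(out$Residuals)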
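
One detail of `stats::numericDeriv()` is worth checking in isolation: the object it returns is the value of the expression, and the derivatives sit in its `"gradient"` attribute (an n x n matrix when the variable is a vector of length n). The small sketch below uses a made-up transformation `x^2`, chosen only because its value and its derivative differ, unlike `exp`.

```r
# Value vs. derivative in the result of numericDeriv()
transformation <- function(x) x^2   # illustrative only; derivative is 2 * x

env <- new.env()
env$transformation <- transformation
env$Predicted <- c(1, 2, 3)         # must be double, not integer

nd <- stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = env)

value <- as.numeric(nd)             # 1 4 9: the transformed values
jacob <- diag(attr(nd, "gradient")) # approximately 2 4 6: elementwise derivatives
```

With `exp` the two coincide, so taking `as.numeric()` of the result happens to give the right multiplier there, but for any other transformation the diagonal of the gradient is what the delta method needs.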

@bwiernik
Contributor Author

@strengejacke Can you write a find_transformation() function for insight that extracts the functions used to transform a response variable in a formula (basically the opposite of insight:::.remove_pattern_from_names())? Ideally, it would be really cool if it would extract as an executable function and return identity if there is no transformation.

@strengejacke
Member

Should this function only refer to the response variable, or also to a possible link-function?
(i.e. should log(y) and family = xy("log") both return the same thing, namely "log"?)

@bwiernik
Contributor Author

I think we already handle link/inverse link plotting fine, so we only need such a function to handle in-formula transformations.

@strengejacke
Member

And the return value should be a function, right? Should that function do the same transformation, or the inverse-transformation?

i.e. if formula = log(y), should find_transformation() return log() or exp()?

Furthermore, I suggest using get_transformation() to return a function, and find_transformation() to return the string representation. Does that make sense?

@bwiernik
Contributor Author

Get/find makes sense.

How about get_transformation() returns a list with $transformation and $inverse slots?
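
A rough sketch of what such a return value could look like, to make the proposal concrete. The helper name `sketch_get_transformation()` is hypothetical and this is not insight's implementation; it only recognises a bare response or a plain `log(y)` / `sqrt(y)` on the left-hand side and stops on anything else.

```r
# Hypothetical helper, for illustration only (not part of insight)
sketch_get_transformation <- function(model) {
  lhs <- stats::formula(model)[[2]]   # left-hand side of the model formula
  if (!is.call(lhs)) {
    fun <- "identity"                 # bare response such as `dist`
  } else if (length(lhs) == 2 && is.name(lhs[[2]])) {
    fun <- deparse(lhs[[1]])          # plain f(y), e.g. "log"
  } else {
    fun <- deparse(lhs)               # anything else, e.g. log(y + 1)
  }
  switch(fun,
    identity = list(transformation = identity, inverse = identity),
    log      = list(transformation = log, inverse = exp),
    sqrt     = list(transformation = sqrt, inverse = function(x) x^2),
    stop("Unsupported response transformation: ", fun)
  )
}

tr <- sketch_get_transformation(lm(log(dist) ~ speed, data = cars))
tr$inverse(tr$transformation(10))
```

Returning both directions in one list means the caller can pick `$inverse` for back-transforming predictions or `$transformation` for putting raw data on the model scale, which covers both options from the first comment.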

@strengejacke
Member

I'm not that familiar with the code, maybe @DominiqueMakowski or @bwiernik can work on this? It would be great to have this resolved before an update is submitted to CRAN.

@DominiqueMakowski
Member

I don't know how to "fix" that really, shouldn't all this transfo stuff be done at insight's level?

@strengejacke
Member

strengejacke commented Nov 8, 2021

We could probably do this:

if (insight::find_transformation(model) != "identity") {
  transformation <- insight::get_transformation(model)$inverse
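  # Jacobian of the transformation, evaluated at the untransformed predictions
  # (so it has to be computed before Predicted is overwritten)
  trans_env <- as.environment(out)
  trans_env$transformation <- transformation
  # numericDeriv() stores the derivatives in the "gradient" attribute
  jacob <- diag(attr(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env), "gradient"))
  out$SE <- out$SE * jacob

  out$Predicted <- transformation(out$Predicted)
  out$CI_low <- transformation(out$CI_low)
  out$CI_high <- transformation(out$CI_high)
  out$Residuals <- transformation(out$Residuals)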
}

However, I'm not sure about the numericDeriv part; it looks like the code above can be used directly.

@strengejacke
Member

And I'm not sure where (i.e. in which function) to put this code.

@bwiernik
Contributor Author

bwiernik commented Nov 8, 2021

We should add the transformation argument to get_predicted() and add the call to find a model's inverse transformation inside .estimate_predicted(), which we pass to get_predicted()

@strengejacke
Member

Ok, so get_predicted() gets a transform argument, which could be NULL or a function, and then in insight:::.get_predicted_out() we would apply that transformation?

@bwiernik
Contributor Author

bwiernik commented Nov 8, 2021

Yep
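
A minimal sketch of that plan, assuming a `transform` argument that is either NULL or a function. The helper name `apply_transform()` and the column names are hypothetical stand-ins, not insight's internals; the input data frame is built with base `predict()` so the example runs on its own.

```r
# Hypothetical helper; names and columns are assumptions for illustration
apply_transform <- function(out, transform = NULL) {
  if (is.null(transform)) {
    return(out)                        # no transformation requested
  }
  # Derivative of `transform` at the untransformed predictions, taken
  # before Predicted is overwritten
  env <- new.env()
  env$transform <- transform
  env$x <- as.numeric(out$Predicted)
  grad <- attr(stats::numericDeriv(quote(transform(x)), "x", rho = env), "gradient")

  out$SE <- out$SE * abs(diag(grad))   # delta-method standard errors
  for (col in c("Predicted", "CI_low", "CI_high")) {
    out[[col]] <- transform(out[[col]])
  }
  out
}

m <- lm(log(dist) ~ speed, data = cars)
p <- predict(m, se.fit = TRUE, interval = "confidence")
out <- data.frame(
  Predicted = p$fit[, "fit"], SE = p$se.fit,
  CI_low = p$fit[, "lwr"], CI_high = p$fit[, "upr"]
)
head(apply_transform(out, transform = exp))
```

Keeping NULL as the default leaves existing behaviour untouched, and the caller (here, whatever looks up the model's inverse transformation) decides which function to pass in.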

@strengejacke strengejacke added Feature idea 🔥 New feature or request Enhancement 💥 Implemented features can be improved or revised labels Jan 16, 2025