Feature importance p-values #10

lang-benjamin · 2023-12-15T09:04:41Z

In addition to permutation-based feature importance, there are permutation-based p-values for feature importance (Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347). Essentially only the ranger package implements this, via its importance_pvalues function. Do you think such a function would be helpful? I could imagine it helping to judge whether a feature is relevant.

brian-j-smith · 2023-12-16T19:58:14Z

Thanks for asking about permutation p-values. I hadn't thought about those before, but they can be calculated with the varimp() function. Below is an example that computes permutation-based variable importance and then permutation p-values. This method may differ from the one in the Altmann et al. paper but is permutation-based nonetheless. Like variable importance, these p-values can be computed for any model and with any appropriate performance metric supplied by the package.

# Load analytic packages
library(MachineShop)
library(ggplot2)

# Set up a parallel backend for faster permutations
library(doParallel)
registerDoParallel()

# Fit any MachineShop model
mdl_fit <- fit(sale_amount ~ ., data = ICHomes, model = GLMModel)

# Permutation variable importance

vi <- varimp(mdl_fit, samples = 1000)
plot(vi)

# Permutation p-values

## Custom varimp() stats function to compute permutation p-values
## Argument x is the difference between permuted and observed model performances
## for a variable
pval <- function(x) {
  c("pvalue" = min(2 * mean(x <= 0), 1))
}

## Call varimp() with the p-value function
permpval <- varimp(
  mdl_fit,
  scale = FALSE,
  samples = 1000,
  stats = pval
)
plot(permpval) + labs(y = "Permutation p-value")
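
To make the arithmetic inside pval() concrete, here is a minimal base-R sketch that applies the same helper to two made-up vectors of differences. The vectors and their names (diff_informative, diff_noise) are hypothetical illustrations, not output from varimp(); the sketch only shows how the returned value depends on the share of differences at or below zero.

```r
# Same helper as in the example above: twice the proportion of differences
# at or below zero, capped at 1
pval <- function(x) {
  c("pvalue" = min(2 * mean(x <= 0), 1))
}

# Hypothetical differences for a variable whose permutation almost always
# shifts performance in one direction: 1 of 100 values is <= 0
diff_informative <- c(-0.1, seq(0.5, 5, length.out = 99))

# Hypothetical differences spread evenly around zero: 50 of 100 values are <= 0
diff_noise <- seq(-1, 1, length.out = 100)

p_informative <- pval(diff_informative)  # 2 * 0.01 = 0.02
p_noise <- pval(diff_noise)              # min(2 * 0.50, 1) = 1

print(p_informative)
print(p_noise)
```

With these inputs the first vector gives a value of 0.02 and the second gives 1, so a variable whose permuted differences straddle zero ends up with a large p-value.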

lang-benjamin · 2023-12-22T19:11:07Z

Thank you for the comment. I really like the flexibility of the package.
