Feature importance p-values #10
Thanks for asking about permutation p-values. I hadn't thought about those before, but they can be calculated with varimp() by supplying a custom stats function, as follows.
# Load analytic packages
library(MachineShop)
library(ggplot2)
# Set up a parallel backend for faster permutations
library(doParallel)
registerDoParallel()
# Fit any MachineShop model
mdl_fit <- fit(sale_amount ~ ., data = ICHomes, model = GLMModel)
# Permutation variable importance
vi <- varimp(mdl_fit, samples = 1000)
plot(vi)
# Permutation p-values
## Custom varimp() stats function to compute permutation p-values
## Argument x is the difference between permuted and observed model performances
## for a variable
pval <- function(x) {
  c("pvalue" = min(2 * mean(x <= 0), 1))
}
## Call varimp() with the p-value function
permpval <- varimp(
  mdl_fit,
  scale = FALSE,
  samples = 1000,
  stats = pval
)
plot(permpval) + labs(y = "Permutation p-value")
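To make the behaviour of the pval() function above concrete, here is a minimal base-R sketch that applies it to two made-up vectors of differences. The vectors diffs_strong and diffs_weak are hypothetical illustrations, not output from varimp(), and the sketch assumes that a positive difference means permuting the variable made performance worse.

```r
# Same two-sided permutation p-value function as in the comment above
pval <- function(x) {
  c("pvalue" = min(2 * mean(x <= 0), 1))
}

# Hypothetical differences (permuted minus observed performance).
# Every permutation hurt performance, suggesting an important variable.
diffs_strong <- c(0.8, 1.2, 0.5, 0.9, 1.1, 0.7, 1.3, 0.6, 1.0, 0.4)

# Two of ten permutations did not hurt performance (differences <= 0).
diffs_weak <- c(0.3, -0.2, 0.1, 0.4, 0.2, 0.1, 0.5, -0.3, 0.6, 0.4)

p_strong <- pval(diffs_strong)  # no differences <= 0, so the estimate is 0
p_weak <- pval(diffs_weak)      # 2 * (2 / 10) = 0.4

print(p_strong)
print(p_weak)
```

With a finite number of permutations, an estimate of exactly 0 is probably best read as "smaller than roughly 1 / samples" rather than as a true zero.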
Thank you for the comment. I really like the flexibility of the package.
In addition to permutation-based feature importance, there are permutation-based p-values for the feature importance (Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347). Essentially only the ranger package implements this, via its importance_pvalues function. Do you think such a function would be helpful? I could imagine that it would aid in judging whether a feature is relevant or not.
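For readers unfamiliar with the ranger function mentioned in the question, the following is a rough sketch of how it is typically called with the Altmann method. The iris data set, the seed, and the small number of permutations are assumptions chosen only to keep the example quick. This is not MachineShop code.

```r
# Sketch only: requires the ranger package to be installed
library(ranger)

set.seed(123)

# Fit a random forest with permutation importance enabled
rf <- ranger(Species ~ ., data = iris, importance = "permutation")

# Altmann et al. (2010) p-values: the response is permuted repeatedly,
# the forest is refit each time, and the observed importances are
# compared against the resulting null distribution
pvals <- importance_pvalues(
  rf,
  method = "altmann",
  num.permutations = 50,
  formula = Species ~ .,
  data = iris
)

# Matrix with one row per predictor and columns "importance" and "pvalue"
print(pvals)
```

Note that this approach permutes the response and refits the model, whereas the varimp() approach in the reply permutes predictors of an already fitted model, so the two p-values are not expected to coincide.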