Using EvoTrees.jl for multi-target regression problems #217

Open
fipelle opened this issue Mar 27, 2023 · 3 comments

fipelle commented Mar 27, 2023

Hi,

Q1: I have seen that this package supports multi-class problems. I was wondering whether it can also be used for multi-target regression problems, for instance to predict two variables jointly under some multivariate squared error loss (e.g., the average MSE over the targets). I have tried setting y_train to a Vector{Vector{Float64}}, but the following errors out at fit.jl:53:

config = EvoTreeRegressor(
    loss=:linear, 
    nrounds=100, 
    nbins=100,
    lambda=0.5, 
    gamma=0.1, 
    eta=0.1, 
    max_depth=6, 
    min_weight=1.0, 
    rowsample=0.5, 
    colsample=1.0);

m = fit_evotree(config; x_train=predictors, y_train=targets)

Q2: Is it possible to use custom loss functions for multi-target regression problems provided that they are twice differentiable?

Thanks!

fipelle changed the title from "Using EvoTrees.jl APIs for Bagging and Random Forests" to "Using EvoTrees.jl for multi-target regression problems" on Mar 27, 2023

fipelle (Author) commented Mar 27, 2023

This is an example with random forests that shows something similar in ScikitLearn.

jeremiedb (Member) commented Mar 27, 2023

Hello!

  1. Multi-output: unfortunately there's no multi-output support in place at the moment, so it's expected that passing a vector of y targets fails (although it also signals that improved assertions could help guide users). I wouldn't consider this trivial, but I think such support could reasonably be implemented for simple "single target" loss functions such as "linear", "logistic" and the like. I haven't encountered the need for multi-target regression so far; could you elaborate a bit on the feature requirements? For instance, is there a need for a weighted loss over the targets, or is a straight average / sum sufficient?

  2. Custom loss function: there's no direct support for this, at least not in the form of an API like the ones XGBoost or LightGBM provide, even for the regular single-target objective. However, part of the appeal of working in Julia is that the codebase is quite lightweight, and adding a custom loss function is fairly trivial. If there's a loss function you'd like to see added, it's likely possible to integrate it into the library. Having it available for multi-target would then depend on the development of Q1.
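Until multi-output support exists, one possible workaround is to fit an independent single-target model per target column. The sketch below is only an illustration of that idea: `fit_per_target`, `predict_per_target`, `mean_fit` and `mean_predict` are hypothetical names, not part of EvoTrees.jl, and the learner is a stand-in so the example runs without any package. Note that this does not share trees across targets, so it is not the joint multi-target model asked about in Q1.

```julia
# Hypothetical helpers (not part of EvoTrees.jl).
# `fit` is any function (X, y) -> model; `predict` is any function (model, X) -> Vector.

# Fit one single-target model per column of Y.
function fit_per_target(fit, X::AbstractMatrix, Y::AbstractMatrix)
    size(X, 1) == size(Y, 1) ||
        throw(DimensionMismatch("X and Y must have the same number of rows"))
    return [fit(X, Y[:, k]) for k in axes(Y, 2)]
end

# Collect one prediction vector per model into an n × K matrix.
function predict_per_target(predict, models, X::AbstractMatrix)
    return reduce(hcat, [predict(m, X) for m in models])
end

# Stand-in learner so the sketch is self-contained: it predicts the training mean.
mean_fit(X, y) = sum(y) / length(y)
mean_predict(m, X) = fill(m, size(X, 1))

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]      # 3 observations, 2 features
Y = [1.0 10.0; 2.0 20.0; 3.0 30.0]   # 3 observations, 2 targets

models = fit_per_target(mean_fit, X, Y)
P = predict_per_target(mean_predict, models, X)   # 3 × 2 matrix of predictions
```

With EvoTrees.jl, `fit` could presumably wrap the call from the first comment, e.g. `(X, y) -> fit_evotree(config; x_train=X, y_train=y)`, and `predict` could be `EvoTrees.predict`; both are assumptions about the installed version rather than something confirmed in this thread.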

fipelle (Author) commented Mar 28, 2023

Hi,

I think that in general allowing for weighted losses would be better. In my case, I would need a simple average. In terms of use cases, there may be situations in which you'd like to predict a series of targets from the same set of features and model. For instance, this is somewhat common in economics and finance.
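To make the weighted versus simple-average distinction concrete, here is a minimal sketch of a weighted multi-target squared error together with the first and second derivatives a gradient-boosting objective would need. The function names `weighted_mse` and `grad_hess` are hypothetical, and nothing here reflects EvoTrees.jl internals; the simple average is just the special case where every target weight is 1/K.

```julia
# Hypothetical sketch, not EvoTrees.jl internals.
# Per-observation loss: l_i = sum_k w[k] * (P[i, k] - Y[i, k])^2

# Weighted squared error, averaged over the n observations.
function weighted_mse(P::AbstractMatrix, Y::AbstractMatrix, w::AbstractVector)
    n = size(Y, 1)
    return sum(((P .- Y) .^ 2) .* reshape(w, 1, :)) / n
end

# Derivatives of the per-observation loss l_i with respect to P[i, k].
function grad_hess(P::AbstractMatrix, Y::AbstractMatrix, w::AbstractVector)
    wrow = reshape(w, 1, :)                  # 1 × K row of target weights
    grad = 2 .* (P .- Y) .* wrow             # n × K first derivatives
    hess = repeat(2 .* wrow, size(Y, 1), 1)  # n × K second derivatives, constant in P
    return grad, hess
end

Y = [1.0 2.0; 3.0 4.0]   # 2 observations, 2 targets
P = [0.0 2.0; 3.0 6.0]   # current predictions
K = size(Y, 2)
w = fill(1 / K, K)       # simple average over the targets

loss = weighted_mse(P, Y, w)
grad, hess = grad_hess(P, Y, w)
```

Unequal weights only change `w`; the loss remains twice differentiable, which is the condition raised in Q2.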
