Update tests after feature selection change #1213
The sklearnex/preview/ensemble/forest wrappers should be updated too.
daal4py/sklearn/ensemble/_forest.py
maxLeafNodes=0 if self.max_leaf_nodes is None else self.max_leaf_nodes,
maxBins=self.maxBins,
minBinSize=self.minBinSize,
useConstFeatures=self.useConstFeatures,
daal4py wrappers for previous oneDAL versions don't have the useConstFeatures argument. Use daal_check_version to branch by oneDAL version.
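The version branch suggested above could look roughly like the sketch below. It is a minimal, self-contained illustration, not daal4py's actual wrapper code: version_at_least is a hypothetical stand-in for daal_check_version, version tuples are assumed to follow the (year, 'P', minor * 100 + patch) convention seen in this PR's tests, and the (2023, 'P', 101) threshold for useConstFeatures is an assumption borrowed from those tests.

```python
# Minimal sketch (not daal4py's real code): only pass useConstFeatures to the
# training call when the installed oneDAL is new enough to accept it.
# `version_at_least` is a hypothetical stand-in for daal_check_version; the
# (2023, 'P', 101) threshold is an assumption taken from this PR's tests.


def version_at_least(installed, required):
    """Compare (year, 'P', minor * 100 + patch) tuples, ignoring the 'P' tag."""
    return (installed[0], installed[2]) >= (required[0], required[2])


def forest_train_kwargs(params, installed_version):
    """Build keyword arguments for a decision forest training call (hypothetical)."""
    kwargs = {
        # As in the original wrapper, 0 is passed when max_leaf_nodes is None
        "maxLeafNodes": 0 if params["max_leaf_nodes"] is None else params["max_leaf_nodes"],
        "maxBins": params["maxBins"],
        "minBinSize": params["minBinSize"],
    }
    # Wrappers for older oneDAL versions do not have this argument, so branch on version
    if version_at_least(installed_version, (2023, "P", 101)):
        kwargs["useConstFeatures"] = params["useConstFeatures"]
    return kwargs


# Example parameter values, chosen only for illustration
params = {"max_leaf_nodes": None, "maxBins": 256, "minBinSize": 1, "useConstFeatures": False}
print(forest_train_kwargs(params, (2023, "P", 100)))  # no useConstFeatures key
print(forest_train_kwargs(params, (2023, "P", 101)))  # includes useConstFeatures
```

Building the keyword dictionary first keeps the single training call unchanged while letting the version check decide which optional arguments are forwarded.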
@@ -1,127 +1,127 @@
36.70242652
Why does the data file need to change?
See the PR description. Node splitting will be changed with uxlfoundation/oneDAL#2292, resulting in different values. As far as I can tell, the algorithm is still producing correct results.
49b9f1f to 6ef0a5d
6ef0a5d to 15b1890
CI runs against the previous oneDAL release, so the unit test data file should be selected based on the oneDAL version to keep CI green.
Details: feature selection for node splitting was changed, which results in different numerics for the prediction. The mean and variance are still in good agreement:
- mean: old -> 22.088, new -> 22.104
- variance: old -> 49.4695, new -> 49.4311
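To put "good agreement" in numbers, the quoted summary statistics can be turned into relative changes. The short worked check below uses only the values reported above; the helper name relative_change is illustrative. Both shifts come out below 0.1 %.

```python
# Worked check of the summary statistics quoted above: signed relative change of
# the prediction mean and variance between the old and new node-splitting behaviour.


def relative_change(old, new):
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


mean_change = relative_change(22.088, 22.104)        # about +0.07 %
variance_change = relative_change(49.4695, 49.4311)  # about -0.08 %
print(f"mean: {mean_change:+.4%}, variance: {variance_change:+.4%}")
```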
629c9eb to 99ef708
@@ -33,7 +33,12 @@


ACCURACY_RATIO = 0.95 if daal_check_version((2021, 'P', 400)) else 0.85
MSE_RATIO = 1.07
LOG_LOSS_RATIO = 1.4 if daal_check_version((2021, 'P', 400)) else 1.55
if daal_check_version((2023, 'P', 101)):
    LOG_LOSS_RATIO = 1.5
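The version-dependent tolerance in this hunk can be read as the sketch below. It is hypothetical: version_at_least stands in for daal_check_version, and it assumes the test bounds the accelerated log loss by LOG_LOSS_RATIO times the stock scikit-learn log loss. The comparison direction is an assumption about how the test uses the ratio, not taken from the diff.

```python
# Hypothetical sketch: choose LOG_LOSS_RATIO for the installed oneDAL version and
# use it as a relative upper bound against a reference (e.g. stock scikit-learn)
# log loss. `version_at_least` stands in for daal_check_version.


def version_at_least(installed, required):
    """Compare (year, 'P', minor * 100 + patch) tuples, ignoring the 'P' tag."""
    return (installed[0], installed[2]) >= (required[0], required[2])


def log_loss_ratio(installed):
    """Mirror the thresholds from the diff above."""
    ratio = 1.4 if version_at_least(installed, (2021, "P", 400)) else 1.55
    if version_at_least(installed, (2023, "P", 101)):
        ratio = 1.5  # relaxed tolerance for the new node-splitting behaviour
    return ratio


def within_log_loss_bound(accelerated_loss, reference_loss, installed):
    """True if the accelerated log loss is at most ratio * reference log loss."""
    return accelerated_loss <= log_loss_ratio(installed) * reference_loss


print(log_loss_ratio((2022, "P", 0)), log_loss_ratio((2023, "P", 101)))
```

With this reading, a log loss 45 % above the reference passes on 2023.1.1 (ratio 1.5) but fails on 2022.x (ratio 1.4).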
What happened?
This is discussed in uxlfoundation/oneDAL#2292, and I am investigating the issue. I think my changes make an existing bug more pronounced for one particular test case. However, I would like to tackle speedup and accuracy in separate PRs.
99ef708 to cc22197
cc22197 to b6ba186
    'decision_forest_regression_batch.csv', lambda r: r[1].prediction, (2023, 'P', 1)),
('decision_forest_regression_hist_batch',
    'decision_forest_regression_batch.csv', lambda r: r[1].prediction, (2023, 'P', 1)),
('decision_forest_regression_default_dense_batch',
    'decision_forest_regression_batch_20230101.csv',
    lambda r: r[1].prediction, (2023, 'P', 101)),
('decision_forest_regression_hist_batch',
    'decision_forest_regression_batch_20230101.csv',
    lambda r: r[1].prediction, (2023, 'P', 101)),
This set of examples will fail when the version is updated from 2023.1.0 to 2023.1.1: each entry only specifies a minimum version, so both the (2023, 'P', 1) and the (2023, 'P', 101) sets of regression examples will run.
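One way to make exactly one set of regression examples run per oneDAL version is to give each expected-results file an exclusive upper bound as well as a minimum. The sketch below is hypothetical: the (name, csv, min, max) layout and select_files are illustrative, not the test harness's real structure, and version tuples are assumed to encode minor * 100 + patch, so (2023, 'P', 101) is 2023.1.1.

```python
# Hypothetical sketch: pair each expected-results file with a version range
# [min inclusive, max exclusive) so only one reference file is used per version.
# Version tuples are assumed to be (year, 'P', minor * 100 + patch).


def version_key(v):
    """Drop the 'P' tag so tuples compare numerically."""
    return (v[0], v[2])


EXPECTED_FILES = [
    # (example name, csv file, min version inclusive, max version exclusive or None)
    ("decision_forest_regression_hist_batch", "decision_forest_regression_batch.csv",
     (2023, "P", 1), (2023, "P", 101)),
    ("decision_forest_regression_hist_batch", "decision_forest_regression_batch_20230101.csv",
     (2023, "P", 101), None),
]


def select_files(installed):
    """Return the (example, csv) pairs whose version range contains `installed`."""
    selected = []
    for name, csv, low, high in EXPECTED_FILES:
        if version_key(installed) < version_key(low):
            continue  # installed oneDAL is older than this file's minimum
        if high is not None and version_key(installed) >= version_key(high):
            continue  # a newer reference file supersedes this one
        selected.append((name, csv))
    return selected


print(select_files((2023, "P", 100)))  # only the old reference file
print(select_files((2023, "P", 101)))  # only the 20230101 reference file
```

Compared with minimum-only bounds, the upper bound retires the old reference file at the version where the new numerics appear instead of running both.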
uxlfoundation/oneDAL#2292 introduces changes that need to be reflected on the scikit-learn-intelex side:
- useConstFeatures algorithm option: false
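To make the option concrete, the toy sampler below shows one way a flag like useConstFeatures could affect per-node feature selection. It is an illustration only, not oneDAL's implementation: the semantics are an assumption. Here the flag decides whether features that are constant within the node remain eligible split candidates, and sample_split_features and is_constant are hypothetical names.

```python
# Toy illustration (not oneDAL's code): per-node feature sampling where an
# assumed use_const_features flag decides whether features that are constant
# within the node can still be drawn as split candidates.
import random


def is_constant(column):
    """A feature is constant in this node if it takes at most one distinct value."""
    return len(set(column)) <= 1


def sample_split_features(columns, k, use_const_features, seed=0):
    """Draw up to k candidate feature indices for one node."""
    candidates = [
        j for j, col in enumerate(columns)
        if use_const_features or not is_constant(col)
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


# Feature values of the samples reaching one node (one list per feature)
cols = [[1, 1, 1], [0, 2, 5], [3, 3, 3], [4, 1, 9]]
print(sample_split_features(cols, 2, use_const_features=False))  # only from {1, 3}
```

Skipping constant features means the sampled candidates can all yield a useful split, which is one plausible reason such a change alters numerics while keeping predictions statistically similar.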
Edit:
During development I ran a RandomForestClassifier and I'm still producing similar results. However, the training time is greatly improved.