Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test SelectKBest + StandardScaler pipeline #1156

Merged
merged 5 commits into from
Jan 24, 2025

Conversation

Pierre-Bartet
Copy link
Contributor

@Pierre-Bartet Pierre-Bartet commented Jan 22, 2025

When SelectKBest is used before a StandardScaler inside a pipeline with $k$ smaller than the number of features, the conversion fails:

model = make_pipeline(SelectKBest(k=3), StandardScaler(), LogisticRegression())

This PR fixes this problem by using the concat shape calculator for selectors and adding a correct one instead.

[("input", FloatTensorType([None, X.shape[1]]))],
target_opset=TARGET_OPSET,
)
self.assertTrue(model_onnx is not None)

Check notice

Code scanning / CodeQL

Imprecise assert Note test

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
@Pierre-Bartet Pierre-Bartet marked this pull request as draft January 23, 2025 21:01
@Pierre-Bartet Pierre-Bartet force-pushed the add_select_k_best_test branch 2 times, most recently from 38d4e19 to 6bff77d Compare January 24, 2025 13:55
@Pierre-Bartet
Copy link
Contributor Author

@xadupre I fixed the issue (just for SelectKBest, not the other selectors), but before I go further I want to be sure I understand properly what was the initial intention of the code I changed.

All selectors such as SelectKBest use the concat shape calculator instead of something closer to the array_feature_extractor one. This look like a mistake to me, so I unregistered the selectors from the concat shape calculator, and use a slight variation of the array_feature_extractor one instead.

What bothers me is that in the end the calculate_sklearn_array_feature_extractor is never called during the test I added (and never was).

So my two questions are:

  1. Was the use of the concat shape calculator for the selectors just a mistake, or was there some deeper intention here?
  2. Is it intended that with or without my fix, calculate_sklearn_array_feature_extractor is not used when converting SelectKBest into an array_feature_extractor?

@xadupre
Copy link
Collaborator

xadupre commented Jan 24, 2025

You should remove the register and run tests/test_sklearn_feature_selection_converters.py to see which unit tests require it. shape_calculator is a kind of legacy but it is used to compute the output shape and type of a model without converting it so that the conversion of the next estimator can start before the consersion of the previous is done. In practice, this never happens.

@Pierre-Bartet
Copy link
Contributor Author

No tests/test_sklearn_feature_selection_converters.py test fail when doing that, but other tests fail elsewhere.
So from what I understand, the answer to question 2. is "Yes".
I guess this is also the case for question 1, so I'll just proceed and do the same for all selectors.

X = np.array(
[[1, 2, 3, 1], [0, 3, 1, 4], [3, 5, 6, 1], [1, 2, 1, 5]], dtype=np.int64
)
y = np.array([0, 1, 0, 1])
model.fit(X, y)

model_onnx = convert_sklearn(
model,

Check notice

Code scanning / CodeQL

Imprecise assert Note test

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
Pierre Bartet added 5 commits January 24, 2025 16:02
Signed-off-by: Pierre Bartet <[email protected]>
Signed-off-by: Pierre Bartet <[email protected]>
Signed-off-by: Pierre Bartet <[email protected]>
@Pierre-Bartet Pierre-Bartet force-pushed the add_select_k_best_test branch from a4af89e to 1dad533 Compare January 24, 2025 15:02
@Pierre-Bartet Pierre-Bartet marked this pull request as ready for review January 24, 2025 15:26
@Pierre-Bartet
Copy link
Contributor Author

@xadupre done

@xadupre xadupre merged commit 22a6a56 into onnx:main Jan 24, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants