-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore cudf's __dataframe__ deprecation. #6229
Ignore cudf's __dataframe__ deprecation. #6229
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
Help on this PR is welcome! Please feel free to push if you can fix any of the remaining test failures. |
@@ -218,6 +218,8 @@ def test_predict_large_n_classes(datatype): | |||
assert array_equal(y_hat.astype(np.int32), y_test.astype(np.int32)) | |||
|
|||
|
|||
# Ignore FutureWarning: Using `__dataframe__` is deprecated | |||
@pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sure we only ignore the dataframe warning and not all FutureWarnings
. Also remvose the need for the comment I'd say
@pytest.mark.filterwarnings("ignore::FutureWarning") | |
@pytest.mark.filterwarnings("ignore:Support for loading dataframes via the `__dataframe__` interchange protocol is deprecated") |
(same for the other occurrence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcrist What version of cudf are you using? Only recent 25.02
nightlies will show this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
25.02.00a273
. I see the warning at the other location, but not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reproduce this one either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #6239 as a follow-up with this filter removed. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.
Looking at the failed CI jobs I see a lot of:
Which makes me think that somewhere in |
rapidsai/rmm#1775 would cause this warning, but we searched the RAPIDS code base extensively to make sure there were no internal uses of this that would trigger deprecations... I am looking now to see what we might have missed. |
I still don't see anything and can't reproduce locally. I am trying to rerun. |
I'm taking a look too. |
Seems like rerunning CI has fixed the problem. I suspect there was some intermediate state where the RMM PR had been merged but not all artifacts / dependencies agreed on how it was supposed to be used until the dependency tree (notably cudf) was rebuilt? Not sure. |
@@ -218,6 +218,8 @@ def test_predict_large_n_classes(datatype): | |||
assert array_equal(y_hat.astype(np.int32), y_test.astype(np.int32)) | |||
|
|||
|
|||
# Ignore FutureWarning: Using `__dataframe__` is deprecated | |||
@pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.
@@ -163,6 +163,8 @@ def test_r2_score(datatype, use_handle): | |||
np.testing.assert_almost_equal(score, 0.98, decimal=7) | |||
|
|||
|
|||
# Ignore FutureWarning: Using `__dataframe__` is deprecated | |||
@pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this test checks that GridSearchCV
works with cudf, which is no longer true after __dataframe__
was deprecated. IMO we should delete the test (or ask the cudf team to reconsider). Filtering the warning only puts things off until __dataframe__
is removed when it'll just break again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to reproduce this warning locally? If so, can you please try commenting out the __dataframe__
implementation in dataframe.py in cudf and try again? Is cuml using __dataframe__
explicitly or is it an implicit path such that a different path would be taken if this implementation doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! From reading the sklearn code I think sklearn has a path that would be taken once the __dataframe__
path is fully removed, so maybe filtering out the warning for now is fine. I'll need to get a local cudf build setup to try, will do later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized dataframe.py
isn't in cython, so this was an easy quick check. I can confirm that once the __dataframe__
code is removed from cudf then things work again (though I can't say how efficiently). Using filterwarnings
here seems fine (though with the recommendation to a more specific filter that Tim made above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old __dataframe__
code path wasn't actually "efficient" in any meaningful way. There is no device data transport happening, only metadata. That's actually the core problem with this protocol: it doesn't specify who is responsible for actually transferring data across device boundaries, leading to consumers having to make per-library distinctions. That's discussed a bit more in rapidsai/cudf#17403.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing so quickly, I was just spinning up my own dev environment to be able to verify this claim myself!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer a more specific filter as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #6239 as a follow-up with the proposal above to make the filter more specific. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.
Thanks for all the reviews. If CI passes, I think we should merge this as-is so that CI is unblocked.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with follow-ups pushed to #6239 .
CUDA 11.8 ARM wheel tests are failing with a message that is similar to some test failures we've seen popping up in cuVS.
I do not know the root cause for this. Perhaps we can request an admin-merge on this PR or #6239 and handle the CUDA 11.8 ARM wheel tests separately. |
Currently CI is failing due to rapidsai/cudf#17736.
The
__dataframe__
protocol appears to be used internally by scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/311bf6badd74bb69081eb90e2643f15706d3473c/sklearn/utils/validation.py#L389Errors look like:
This PR ignores the
FutureWarning
to allow CI to pass.