Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore cudf's __dataframe__ deprecation. #6229

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 10 additions & 6 deletions python/cuml/cuml/tests/test_input_utils.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Copyright (c) 2019-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -35,6 +35,7 @@
np = cpu_only_import("numpy")

nbcuda = gpu_only_import_from("numba", "cuda")
cudf_pandas_active = gpu_only_import_from("cudf.pandas", "LOADED")
pdDF = cpu_only_import_from("pandas", "DataFrame")


Expand Down Expand Up @@ -446,11 +447,14 @@ def test_tocupy_missing_values_handling():
assert str(array.dtype) == "float64"
assert cp.isnan(array[1])

with pytest.raises(ValueError):
df = cudf.Series(data=[7, None, 3])
array, n_rows, n_cols, dtype = input_to_cupy_array(
df, fail_on_null=True
)
# cudf.pandas now mimics pandas better for handling None, so we don't
# need to fail and raise this error when cudf.pandas is active.
if not cudf_pandas_active:
with pytest.raises(ValueError):
df = cudf.Series(data=[7, None, 3])
array, n_rows, n_cols, dtype = input_to_cupy_array(
df, fail_on_null=True
)


@pytest.mark.cudf_pandas
Expand Down
4 changes: 3 additions & 1 deletion python/cuml/cuml/tests/test_kneighbors_classifier.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
# Copyright (c) 2019-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -218,6 +218,8 @@ def test_predict_large_n_classes(datatype):
assert array_equal(y_hat.astype(np.int32), y_test.astype(np.int32))


# Ignore FutureWarning: Using `__dataframe__` is deprecated
@pytest.mark.filterwarnings("ignore::FutureWarning")
@pytest.mark.parametrize("n_samples", [100])
Copy link
Member

@betatim betatim Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sure we only ignore the dataframe warning and not all FutureWarnings. Also remvose the need for the comment I'd say

Suggested change
@pytest.mark.filterwarnings("ignore::FutureWarning")
@pytest.mark.filterwarnings("ignore:Support for loading dataframes via the `__dataframe__` interchange protocol is deprecated")

(same for the other occurrence)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcrist What version of cudf are you using? Only recent 25.02 nightlies will show this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25.02.00a273. I see the warning at the other location, but not here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot reproduce this one either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #6239 as a follow-up with this filter removed. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.

@pytest.mark.parametrize("n_features", [40])
@pytest.mark.parametrize("n_neighbors", [4])
Expand Down
2 changes: 2 additions & 0 deletions python/cuml/cuml/tests/test_metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,8 @@ def test_r2_score(datatype, use_handle):
np.testing.assert_almost_equal(score, 0.98, decimal=7)


# Ignore FutureWarning: Using `__dataframe__` is deprecated
@pytest.mark.filterwarnings("ignore::FutureWarning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this test checks that GridSearchCV works with cudf, which is no longer true after __dataframe__ was deprecated. IMO we should delete the test (or ask the cudf team to reconsider). Filtering the warning only puts things off until __dataframe__ is removed when it'll just break again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to reproduce this warning locally? If so, can you please try commenting out the __dataframe__ implementation in dataframe.py in cudf and try again? Is cuml using __dataframe__ explicitly or is it an implicit path such that a different path would be taken if this implementation doesn't exist?

Copy link
Member

@jcrist jcrist Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! From reading the sklearn code I think sklearn has a path that would be taken once the __dataframe__ path is fully removed, so maybe filtering out the warning for now is fine. I'll need to get a local cudf build setup to try, will do later today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized dataframe.py isn't in cython, so this was an easy quick check. I can confirm that once the __dataframe__ code is removed from cudf then things work again (though I can't say how efficiently). Using filterwarnings here seems fine (though with the recommendation to a more specific filter that Tim made above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old __dataframe__ code path wasn't actually "efficient" in any meaningful way. There is no device data transport happening, only metadata. That's actually the core problem with this protocol: it doesn't specify who is responsible for actually transferring data across device boundaries, leading to consumers having to make per-library distinctions. That's discussed a bit more in rapidsai/cudf#17403.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing so quickly, I was just spinning up my own dev environment to be able to verify this claim myself!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer a more specific filter as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #6239 as a follow-up with the proposal above to make the filter more specific. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.

def test_sklearn_search():
"""Test ensures scoring function works with sklearn machinery"""
import numpy as np
Expand Down
Loading