Ignore cudf's __dataframe__ deprecation. #6229

Merged

bdice
Contributor

@bdice bdice commented Jan 16, 2025

Currently CI is failing due to rapidsai/cudf#17736.

The __dataframe__ protocol appears to be used internally by scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/311bf6badd74bb69081eb90e2643f15706d3473c/sklearn/utils/validation.py#L389

Errors look like:

FAILED test_metrics.py::test_sklearn_search - FutureWarning: Using `__dataframe__` is deprecated

This PR ignores the FutureWarning to allow CI to pass.
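A minimal sketch of why the warning surfaces in a cuML test that never calls `__dataframe__` directly: the consumer (a stand-in for the scikit-learn validation code linked above) probes for the protocol and calls it. `FakeCudfFrame`, `consume` and the warning text are hypothetical stand-ins, not cudf's or scikit-learn's actual code. The first block turns warnings into errors, as the CI failures suggest the test suite does; the second adds the same ignore action that the `ignore::FutureWarning` marker in this PR applies.

```python
import warnings


class FakeCudfFrame:
    """Hypothetical stand-in for a cudf.DataFrame with a deprecated __dataframe__."""

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        warnings.warn("Using `__dataframe__` is deprecated", FutureWarning, stacklevel=2)
        return {"columns": ["a", "b"]}  # placeholder for an interchange object


def consume(X):
    """Hypothetical consumer that, like scikit-learn, probes for the protocol."""
    if hasattr(X, "__dataframe__"):
        return X.__dataframe__()
    return X


def run_test_body():
    # The test only calls the consumer; the warning is raised inside it.
    return consume(FakeCudfFrame())


with warnings.catch_warnings():
    warnings.simplefilter("error")  # warnings become failures, as in CI
    try:
        run_test_body()
        failed = False
    except FutureWarning:
        failed = True

with warnings.catch_warnings():
    warnings.simplefilter("error")
    # Same action as @pytest.mark.filterwarnings("ignore::FutureWarning");
    # filterwarnings() inserts at the front, so it wins over "error".
    warnings.filterwarnings("ignore", category=FutureWarning)
    result = run_test_body()

print(failed, result)  # True {'columns': ['a', 'b']}
```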

@bdice bdice requested a review from a team as a code owner January 16, 2025 17:46
@bdice bdice requested review from cjnolet and csadorf January 16, 2025 17:46
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jan 16, 2025
@jakirkham jakirkham added bug Something isn't working non-breaking Non-breaking change labels Jan 16, 2025
@bdice bdice marked this pull request as draft January 16, 2025 20:44
Copy link

copy-pr-bot bot commented Jan 16, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@bdice
Contributor Author

bdice commented Jan 16, 2025

/ok to test

@bdice
Contributor Author

bdice commented Jan 16, 2025

Help on this PR is welcome! Please feel free to push if you can fix any of the remaining test failures.

@@ -218,6 +218,8 @@ def test_predict_large_n_classes(datatype):
assert array_equal(y_hat.astype(np.int32), y_test.astype(np.int32))


# Ignore FutureWarning: Using `__dataframe__` is deprecated
@pytest.mark.filterwarnings("ignore::FutureWarning")
Member

@betatim betatim Jan 17, 2025

This makes sure we only ignore the dataframe warning and not all FutureWarnings. It also removes the need for the comment, I'd say.

Suggested change
- @pytest.mark.filterwarnings("ignore::FutureWarning")
+ @pytest.mark.filterwarnings("ignore:Support for loading dataframes via the `__dataframe__` interchange protocol is deprecated")

(same for the other occurrence)
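pytest's `filterwarnings` marks take `action:message:category` strings that map onto the arguments of Python's `warnings.filterwarnings`, so the difference between the two markers can be checked with the standard library alone: a category-only filter (like `ignore::FutureWarning`) hides every `FutureWarning`, while a message filter hides only warnings whose text matches the given regular expression at its start. Note that the CI log in the PR description quotes the message as starting with "Using `__dataframe__` is deprecated", whereas the suggested filter starts with "Support for loading dataframes", so the filter text has to match whatever cudf actually emits. The unrelated message below is hypothetical.

```python
import warnings

# Message quoted in the CI log; copy the real text from the emitted warning.
DATAFRAME_MSG = "Using `__dataframe__` is deprecated"
OTHER_MSG = "Some unrelated API is deprecated"  # hypothetical unrelated warning


def emit(message):
    warnings.warn(message, FutureWarning)


def outcome(filter_kwargs, message):
    """Return 'ignored' or 'raised' for one warning under one ignore filter."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # unmatched warnings become failures
        warnings.filterwarnings("ignore", **filter_kwargs)
        try:
            emit(message)
            return "ignored"
        except FutureWarning:
            return "raised"


# Broad filter, like "ignore::FutureWarning": hides both warnings.
broad = {"category": FutureWarning}
# Specific filter: "message" is a regex matched against the start of the text.
specific = {"message": "Using `__dataframe__` is deprecated", "category": FutureWarning}

for name, kwargs in [("broad", broad), ("specific", specific)]:
    print(name, outcome(kwargs, DATAFRAME_MSG), outcome(kwargs, OTHER_MSG))
# broad ignored ignored
# specific ignored raised
```

With the specific filter, any new FutureWarning introduced elsewhere still fails the test, which is the point of the suggestion.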

Member

I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.

Contributor Author

@jcrist What version of cudf are you using? Only recent 25.02 nightlies will show this.

Member

25.02.00a273. I see the warning at the other location, but not here.

Contributor

I cannot reproduce this one either.

Contributor Author

I filed #6239 as a follow-up with this filter removed. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.

@betatim
Member

betatim commented Jan 17, 2025

Looking at the failed CI jobs I see a lot of:

ConftestImportFailure: FutureWarning: The `rmm._cuda.stream` module is deprecated in 25.02 and will be removed in a future release. Use `rmm.pylibrmm.stream` instead. (from /__w/cuml/cuml/python/cuml/cuml/tests/conftest.py)
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
  config = pluginmanager.hook.pytest_cmdline_parse(
ImportError while loading conftest '/__w/cuml/cuml/python/cuml/cuml/tests/conftest.py'.
conftest.py:17: in <module>
    from cuml.testing.utils import create_synthetic_dataset
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/__init__.py:17: in <module>
    from cuml.internals.base import Base, UniversalBase
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/__init__.py:18: in <module>
    from cuml.internals.base_helpers import BaseMetaClass, _tags_class_and_instance
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/base_helpers.py:20: in <module>
    from cuml.internals.api_decorators import (
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/api_decorators.py:24: in <module>
    from cuml.internals import input_utils as iu
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/input_utils.py:20: in <module>
    from cuml.internals.array import CumlArray
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/array.py:21: in <module>
    from cuml.internals.global_settings import GlobalSettings
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/global_settings.py:20: in <module>
    from cuml.internals.device_type import DeviceType
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/device_type.py:19: in <module>
    from cuml.internals.mem_type import MemoryType
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/mem_type.py:22: in <module>
    cudf = gpu_only_import("cudf")
/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/safe_imports.py:362: in gpu_only_import
    return importlib.import_module(module)
/opt/conda/envs/test/lib/python3.12/site-packages/cudf/__init__.py:19: in <module>
    _setup_numba()
/opt/conda/envs/test/lib/python3.12/site-packages/cudf/utils/_numba.py:124: in _setup_numba
    shim_ptx_cuda_version = _get_cuda_build_version()
/opt/conda/envs/test/lib/python3.12/site-packages/cudf/utils/_numba.py:19: in _get_cuda_build_version
    from cudf._lib import strings_udf
/opt/conda/envs/test/lib/python3.12/site-packages/cudf/_lib/__init__.py:2: in <module>
    from . import strings_udf
strings_udf.pyx:1: in init cudf._lib.strings_udf
    ???
/opt/conda/envs/test/lib/python3.12/site-packages/rmm/_cuda/stream.py:31: in <module>
    warnings.warn(
E   FutureWarning: The `rmm._cuda.stream` module is deprecated in 25.02 and will be removed in a future release. Use `rmm.pylibrmm.stream` instead.

This makes me think that there is an old import somewhere in strings_udf.pyx. However, looking at https://github.com/rapidsai/cudf/blob/a4bbd0930a0e4922f69586560b064a0bd9e6aedc/python/cudf/cudf/_lib/strings_udf.pyx I can't immediately see it, and the last edit was a few days ago. Maybe compiling locally with more debugging turned on, so we can see which line in strings_udf.pyx triggers this, would shed some light.
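For readers unfamiliar with this failure mode: a deprecated module that calls `warnings.warn(...)` at import time (as `rmm/_cuda/stream.py` does at line 31 in the traceback) fires as soon as anything imports it, even transitively, and if the suite escalates warnings to errors the import itself fails, which pytest then reports while loading `conftest.py`. The sketch below reproduces that with a throwaway module. `old_stream` and `new_stream` are hypothetical names, and the assumption that CI runs with warnings as errors is inferred from the failures above rather than from cuML's configuration.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical deprecated module that warns at import time
# (a real shim would usually also re-export names from the new location).
SHIM_SOURCE = textwrap.dedent(
    """
    import warnings

    warnings.warn(
        "The `old_stream` module is deprecated. Use `new_stream` instead.",
        FutureWarning,
    )
    """
)


def import_shim(as_errors):
    """Import the shim from scratch and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "old_stream.py").write_text(SHIM_SOURCE)
        importlib.invalidate_caches()  # make the new file visible to the import system
        sys.path.insert(0, tmp)
        sys.modules.pop("old_stream", None)  # force module-level code to run again
        try:
            with warnings.catch_warnings():
                # "error" mimics a suite that turns warnings into failures.
                warnings.simplefilter("error" if as_errors else "ignore")
                importlib.import_module("old_stream")
            return "imported"
        except FutureWarning as exc:
            return f"import failed: {exc}"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("old_stream", None)


print(import_shim(as_errors=False))  # imported
print(import_shim(as_errors=True))
# import failed: The `old_stream` module is deprecated. Use `new_stream` instead.
```

Because the warning is emitted by module-level code, the line that matters is whichever import first pulls the deprecated module in, which is why the traceback points at the `init cudf._lib.strings_udf` import chain rather than at a specific call.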

@bdice
Contributor Author

bdice commented Jan 17, 2025

rapidsai/rmm#1775 would cause this warning, but we searched the RAPIDS code base extensively to make sure there were no internal uses of this that would trigger deprecations... I am looking now to see what we might have missed.

@bdice
Contributor Author

bdice commented Jan 17, 2025

I still don't see anything and can't reproduce locally. I am trying to rerun.

@Matt711
Contributor

Matt711 commented Jan 17, 2025

I'm taking a look too.

@bdice
Contributor Author

bdice commented Jan 17, 2025

Seems like rerunning CI has fixed the problem. I suspect there was some intermediate state where the RMM PR had been merged but not all artifacts / dependencies agreed on how it was supposed to be used until the dependency tree (notably cudf) was rebuilt? Not sure.

@@ -163,6 +163,8 @@ def test_r2_score(datatype, use_handle):
np.testing.assert_almost_equal(score, 0.98, decimal=7)


# Ignore FutureWarning: Using `__dataframe__` is deprecated
@pytest.mark.filterwarnings("ignore::FutureWarning")
Member

AFAICT this test checks that GridSearchCV works with cudf, which is no longer true after __dataframe__ was deprecated. IMO we should delete the test (or ask the cudf team to reconsider). Filtering the warning only puts things off until __dataframe__ is removed when it'll just break again.

Contributor

Are you able to reproduce this warning locally? If so, can you please try commenting out the __dataframe__ implementation in dataframe.py in cudf and try again? Is cuml using __dataframe__ explicitly or is it an implicit path such that a different path would be taken if this implementation doesn't exist?

Member

@jcrist jcrist Jan 21, 2025

That's a good point! From reading the sklearn code I think sklearn has a path that would be taken once the __dataframe__ path is fully removed, so maybe filtering out the warning for now is fine. I'll need to get a local cudf build setup to try, will do later today.

Member

Just realized dataframe.py isn't Cython, so this was an easy, quick check. I can confirm that once the __dataframe__ code is removed from cudf, things work again (though I can't say how efficiently). Using filterwarnings here seems fine (with the more specific filter that Tim recommended above).
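The question raised above (whether the consumer would take a different path if cudf dropped `__dataframe__`) and the answer confirmed here follow a common probe-then-fallback pattern. The sketch illustrates that control flow only: `WithProtocol`, `WithoutProtocol`, `validate` and the iteration-based fallback are hypothetical and do not reproduce scikit-learn's actual logic or its performance characteristics.

```python
class WithProtocol:
    """Hypothetical GPU frame that still implements the deprecated method."""

    def __init__(self, rows):
        self.rows = rows

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # Placeholder for the object the protocol would return.
        return {"path": "interchange", "rows": self.rows}


class WithoutProtocol:
    """The same frame after __dataframe__ has been removed."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


def validate(X):
    """Hypothetical consumer: use the protocol if present, else fall back."""
    if hasattr(X, "__dataframe__"):
        return X.__dataframe__()
    # Fallback path: materialize rows through ordinary iteration.
    return {"path": "fallback", "rows": list(X)}


print(validate(WithProtocol([[1, 2]]))["path"])     # interchange
print(validate(WithoutProtocol([[1, 2]]))["path"])  # fallback
```

Because the choice is made with `hasattr` at call time, removing the method is enough to switch paths; no caller has to change.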

Contributor

The old __dataframe__ code path wasn't actually "efficient" in any meaningful way. There is no device data transport happening, only metadata. That's actually the core problem with this protocol: it doesn't specify who is responsible for actually transferring data across device boundaries, leading to consumers having to make per-library distinctions. That's discussed a bit more in rapidsai/cudf#17403.

Contributor

Thanks for testing so quickly, I was just spinning up my own dev environment to be able to verify this claim myself!

Contributor

Would prefer a more specific filter as well.

Contributor Author

I filed #6239 as a follow-up with the proposal above to make the filter more specific. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.

@dantegd dantegd marked this pull request as ready for review January 21, 2025 19:19
@bdice
Contributor Author

bdice commented Jan 21, 2025

Thanks for all the reviews. If CI passes, I think we should merge this as-is so that CI is unblocked.

I filed a follow-up PR, #6239, to make the warning filter more specific and to attempt to remove the one filter that may not be necessary.

@bdice bdice added bug Something isn't working and removed bug Something isn't working labels Jan 21, 2025
@bdice bdice self-assigned this Jan 21, 2025
Contributor

@csadorf csadorf left a comment

Approved, with follow-ups pushed to #6239.

@bdice
Contributor Author

bdice commented Jan 21, 2025

CUDA 11.8 ARM wheel tests are failing with a message that is similar to some test failures we've seen popping up in cuVS.

FAILED test_dask_serialization.py::test_serialize_before_training - RuntimeError: 1 of 1 worker jobs failed: cuBLAS error encountered at: file=/tmp/pip-build-env-yckqnvi0/normal/lib/python3.12/site-packages/libraft/include/raft/linalg/detail/cublaslt_wrappers.hpp line=261: call='cublasLtMatmul(resource::get_cublaslt_handle(res), mm_desc->desc, alpha, a_ptr, mm_desc->a, b_ptr, mm_desc->b, beta, c_ptr, mm_desc->c, c_ptr, mm_desc->c, &(mm_desc->heuristics.algo), nullptr, 0, stream)', Reason=13:CUBLAS_STATUS_EXECUTION_FAILED

I do not know the root cause for this. Perhaps we can request an admin-merge on this PR or #6239 and handle the CUDA 11.8 ARM wheel tests separately.

@raydouglass raydouglass merged commit 01e19bb into rapidsai:branch-25.02 Jan 21, 2025
61 of 63 checks passed
Labels
bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change

10 participants