FIX-#6822: Do not propagate NotImplementedError to a user on a 'set_columns()' with dupl labels #6823

Merged: 2 commits, Dec 13, 2023

@dchigarev dchigarev commented Dec 13, 2023

What do these changes do?

DtypesDescriptor describes partially known dtypes and doesn't support duplicated column labels. Everywhere in the project, we wrap the construction of new descriptors with try: ... except NotImplementedError: to avoid propagating the error to the user when duplicate labels occur. This PR fixes a missed case: explicitly setting duplicate labels via ._set_columns().
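
The wrapping pattern described above can be sketched as follows. This is a minimal illustration only, not Modin's actual code: `ToyDtypesDescriptor` and `rename_dtypes` are hypothetical names, and falling back to `None` (meaning "dtypes unknown") is an assumption about what the caller does after catching the error.

```python
class ToyDtypesDescriptor:
    """Hypothetical stand-in for a descriptor that rejects duplicate labels."""

    def __init__(self, known_dtypes):
        # Mapping of column label -> dtype name.
        self.known_dtypes = dict(known_dtypes)

    def set_index(self, new_index):
        # Duplicate check in the style of the diff: relies only on len() and set().
        if len(new_index) != len(set(new_index)):
            raise NotImplementedError("Duplicated column names are not supported")
        # Pair each new label with the dtype at the same position.
        renamed = dict(zip(new_index, self.known_dtypes.values()))
        return ToyDtypesDescriptor(renamed)


def rename_dtypes(descriptor, new_columns):
    """Return a renamed descriptor, or None on duplicate labels (assumed fallback)."""
    try:
        return descriptor.set_index(new_columns)
    except NotImplementedError:
        # Swallow the error so it never reaches the user; dtypes become unknown.
        return None


desc = ToyDtypesDescriptor({"a": "int64", "b": "float64"})
ok = rename_dtypes(desc, ["x", "y"])   # unique labels: renamed descriptor
dup = rename_dtypes(desc, ["x", "x"])  # duplicate labels: None, no exception
```

With unique labels the caller keeps the dtype information under the new names; with duplicate labels it loses that information but the operation itself still succeeds.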

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: NotImplementedError when trying to set duplicated column names #6822
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

…r on a 'set_columns()' with dupl labels

Signed-off-by: Dmitry Chigarev <[email protected]>
@dchigarev dchigarev changed the title FIX-#6822: Do not propagate NotImplementedError to a user on a 'set_c… FIX-#6822: Do not propagate NotImplementedError to a user on a 'set_columns()' with dupl labels Dec 13, 2023
Signed-off-by: Dmitry Chigarev <[email protected]>
@dchigarev dchigarev marked this pull request as ready for review December 13, 2023 12:43
@@ -294,6 +294,11 @@ def set_index(
Calling this method on a descriptor that returns ``None`` for ``.columns_order``
will result into information lose.
"""
if len(new_index) != len(set(new_index)):

Collaborator

Would it be better to use the nunique method?

Collaborator Author

@dchigarev dchigarev Dec 13, 2023

np.unique struggles to handle MultiIndex objects; that's why we use set() everywhere in this class

Collaborator Author

new_index.unique() might also not always work as sometimes it can be a python list here

Collaborator

> new_index.unique() might also not always work as sometimes it can be a python list here

Should we update the docstring for that?

Collaborator Author

yep, docstring seems to be incorrect, will update in some future PR
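
The reasoning in this thread can be illustrated with a small standalone sketch. `has_duplicates` is a hypothetical helper name, not part of Modin; it mirrors the `len(...) != len(set(...))` check from the diff. The tuple labels stand in for a MultiIndex, whose iteration yields one tuple per entry, so the sketch needs no pandas import.

```python
def has_duplicates(labels):
    """Return True if any label occurs more than once.

    Uses only len() and iteration, so it accepts plain lists as well as
    index-like objects, provided the elements are hashable.
    """
    return len(labels) != len(set(labels))


flat_labels = ["a", "b", "a"]                  # plain Python list with a repeat
multi_labels = [("x", 1), ("x", 2), ("y", 1)]  # tuple labels, all distinct

print(has_duplicates(flat_labels))     # True
print(has_duplicates(multi_labels))    # False
print(hasattr(flat_labels, "unique"))  # False: a list has no .unique() method
```

The last line shows why calling `.unique()` on the argument is not safe when a list may be passed, while the `set()`-based check handles both input kinds the same way.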

@YarShev YarShev merged commit acfcf34 into modin-project:master Dec 13, 2023
37 checks passed