-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#6822: Do not propagate NotImplementedError to a user on a 'set_columns()' with dupl labels #6823
Conversation
…r on a 'set_columns()' with dupl labels Signed-off-by: Dmitry Chigarev <[email protected]>
Signed-off-by: Dmitry Chigarev <[email protected]>
@@ -294,6 +294,11 @@ def set_index( | |||
Calling this method on a descriptor that returns ``None`` for ``.columns_order`` | |||
will result into information lose. | |||
""" | |||
if len(new_index) != len(set(new_index)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use method nunique
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.unique
struggles to handle MultiIndex objects, that's the reason we use set()
everywhere in this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_index.unique()
might also not always work as sometimes it can be a python list here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_index.unique()
might also not always work as sometimes it can be a python list here
Should we update the docstring for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, docstring seems to be incorrect, will update in some future PR
What do these changes do?
DtypesDescriptor
describes partially known dtypes and doesn't support duplicated column labels. Everywhere in the project, we wrap constructions of new descriptors withtry: ... catch NotImplementedError:
in order to avoid propagating the error to a user in cases where dupl labels occur. This PR fixes a missed case when explicitly setting dupl labels via._set_columns()
.flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
NotImplementedError
when trying to set duplicated column names #6822docs/development/architecture.rst
is up-to-date