DOC: Comparing .loc/.iloc to tuples and chained indexing #60632

Open
1 task done
johnasiano opened this issue Dec 31, 2024 · 3 comments
Labels
Docs · Indexing (Related to indexing on series/frames, not to indexes themselves) · Needs Discussion (Requires discussion from core team before further action)

Comments

@johnasiano

johnasiano commented Dec 31, 2024

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-view-versus-copy

Documentation problem

import pandas as pd

# Creating a DataFrame with some sample data
data = {
    'Name': ['Jason', 'Emma', 'Alex', 'Sarah'],
    'Age': [28, 24, 32, 27],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [75000, 65000, 85000, 70000]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

I want to update Jason’s age, and I do so with 

df['Age'][df['Name'] == 'Jason'] = 29


With code like the line above, df may or may not update Jason's age to 29, because the assignment goes through chained indexing.

The documentation mentions how .iloc/.loc is a better option. For example, something such as the following.

df.loc[df['Name'] == 'Jason', 'Age'] = 29
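
As a minimal sketch of the .loc form on a cut-down version of the sample data (the variable name `is_jason` is only illustrative), the row mask and the column label go into a single indexing call, so the assignment targets the original frame directly:

```python
import pandas as pd

# Reduced copy of the sample data from this issue.
df = pd.DataFrame({
    "Name": ["Jason", "Emma", "Alex", "Sarah"],
    "Age": [28, 24, 32, 27],
})

# Boolean mask selecting the row(s) to change.
is_jason = df["Name"] == "Jason"

# One indexing operation: rows chosen by the mask, column chosen by label.
df.loc[is_jason, "Age"] = 29

print(df)
```

Because there is only one indexing step, there is no intermediate object that could turn out to be a copy.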

However it is not clear about best practices regarding tuples, such as the following.

df[('Age', df['Name'] == 'Jason')] = 29

Suggested fix for documentation

The suggested fix is to explain how the use of tuples would compare to the use of .iloc/.loc and the use of chained indexing in the context of best practices in pandas. Considerations can include time complexity, space complexity, code readability, etc.

@johnasiano johnasiano added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 31, 2024
@rhshadrach
Member

rhshadrach commented Dec 31, 2024

Thanks for the report!

The documentation mentions how .iloc/.loc is a better option. For example, something such as the following.

df.loc[df['Name'] == 'Jason', 'Age'] = 29

However it is not clear about best practices regarding tuples, such as the following.

df[('Age', df['Name'] == 'Jason')] = 29

The following two lines are equivalent:

df[('Age', df['Name'] == 'Jason')] = 29 
df['Age', df['Name'] == 'Jason'] = 29

That is, the argument in the 2nd line above, which is passed to __setitem__ because this is an assignment (a plain lookup would go through __getitem__), is implicitly a tuple.
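
This is plain Python behaviour rather than anything pandas-specific. A small sketch, using a hypothetical `KeyRecorder` class that is not part of pandas, shows that a comma-separated subscript arrives as a single tuple key whether or not the parentheses are written:

```python
class KeyRecorder:
    """Hypothetical helper that records the key passed to item assignment."""

    def __init__(self):
        self.keys = []

    def __setitem__(self, key, value):
        # Store the key exactly as Python handed it over.
        self.keys.append(key)


rec = KeyRecorder()
rec["Age", "mask"] = 29      # no parentheses
rec[("Age", "mask")] = 29    # explicit tuple

print(rec.keys)
print(rec.keys[0] == rec.keys[1])
```

Both assignments record the same tuple key, so the two spellings are indistinguishable to the object being indexed. What the object then does with that tuple is up to its own __setitem__; .loc interprets it as (rows, columns), while plain df[...] does not.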

In general, I do not think pandas documentation should be documenting Python semantics, but perhaps this is a case where that general rule can be ignored.

@rhshadrach rhshadrach added Indexing Related to indexing on series/frames, not to indexes themselves Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 31, 2024
@johnasiano
Author

Gotcha, so anything with a comma is a tuple?

So even though both of my examples involve tuples, is the one that uses loc still preferred because it is faster than the one that doesn't?

@rhshadrach
Member

I think the example without loc:

 df[('Age', df['Name'] == 'Jason')] = 29 

just raises an error, no?
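
One way to check is to run it on a small frame. The sketch below assumes nothing about which exception is raised, since that may differ between pandas versions; it simply reports what happened:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jason", "Emma"], "Age": [28, 24]})
mask = df["Name"] == "Jason"

try:
    # Plain [] with a (label, boolean Series) tuple as the key.
    df[("Age", mask)] = 29
    outcome = "no exception"
except Exception as exc:
    # The exception type is reported rather than assumed.
    outcome = f"raised {type(exc).__name__}"

print(outcome)
print(df)
```

If this does raise, the tuple form is not really an alternative to .loc for this kind of update, and the comparison in the docs would only need to cover .loc/.iloc versus chained indexing.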
