Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize nth and nth_back for BoundListIterator #4810

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

Owen-CH-Leung
Copy link
Contributor

@Owen-CH-Leung Owen-CH-Leung commented Dec 21, 2024

See #4787

This PR optimizes nth and nth_back for BoundListIterator, and added unittest & benchmarks for these 2 APIs. Here are the benchmark of the optimized nth and nth_back:

On Stable toolchain

With Optimization

list_nth                time:   [351.21 ns 351.71 ns 352.28 ns]
list_nth_back           time:   [355.34 ns 356.09 ns 356.95 ns]

The default nth and nth_back implementation

list_nth                time:   [2.8422 µs 2.8502 µs 2.8592 µs]
list_nth_back           time:   [3.6509 µs 3.6557 µs 3.6611 µs]

On nightly toolchain

With Optimization

list_nth                time:   [293.11 ns 293.59 ns 294.15 ns]
list_nth_back           time:   [301.09 ns 301.51 ns 302.00 ns]

The default nth and nth_back implementation

list_nth                time:   [2.8488 µs 2.8525 µs 2.8568 µs]
list_nth_back           time:   [3.6524 µs 3.6600 µs 3.6683 µs]

@davidhewitt
Copy link
Member

Awesome, thanks! Will seek to review soon. I guess we can do the same thing for tuple iteration?

@Owen-CH-Leung Owen-CH-Leung marked this pull request as ready for review December 21, 2024 15:47
@Owen-CH-Leung
Copy link
Contributor Author

Owen-CH-Leung commented Dec 21, 2024

Awesome, thanks! Will seek to review soon. I guess we can do the same thing for tuple iteration?

Yup - I'll file a PR to optimise tuple iterator after the holiday =)

let length = self.length.min(self.list.len());
let target_index = self.index + n;
if self.index + n < length {
let item = unsafe { self.get_item(target_index) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, is there a time-of-check to time-of-use bug here on the length? Not one for this PR, but a follow up I will try not to forget...

Copy link
Contributor Author

@Owen-CH-Leung Owen-CH-Leung Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I think that's a potential TOCTOU bug here. This particular implementation assumes the user ensures proper synchronization if they intend to use the iterator in a multi-threaded or mutable environment.

Copy link
Contributor Author

@Owen-CH-Leung Owen-CH-Leung Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation of next also has a potential TOCTOU bug:

https://github.com/PyO3/pyo3/blob/main/src/types/list.rs#L495-L498

@ngoldbaum
Copy link
Contributor

Thanks for working on this! Hopefully my PR should be merged in the next few days and then you can rebase this.

@ngoldbaum
Copy link
Contributor

It might also make sense to implement advance_by on nightly. nth is implemented in terms of advance_by, so this improvement would fall out "for free" on nightly if you implement that function.

Copy link
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments, I think there are some issues in how this is set up for the limited API. I also think we should define nth and nth_back for all non-nightly builds. I also don't think it's necessary to call get_item in advance_by, per my reading of the advance_by docs.

src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
pyo3-benches/benches/bench_list.rs Show resolved Hide resolved
noxfile.py Outdated Show resolved Hide resolved
@Owen-CH-Leung Owen-CH-Leung marked this pull request as draft January 10, 2025 08:22
@Owen-CH-Leung Owen-CH-Leung marked this pull request as ready for review January 10, 2025 13:35
@Owen-CH-Leung
Copy link
Contributor Author

Owen-CH-Leung commented Jan 10, 2025

@ngoldbaum Can I seek your review again ? I've adopted your suggested changes above

the CI clippy/x86_64-unknown-linux-gnu/beta is failing as main btw

@ngoldbaum
Copy link
Contributor

the CI clippy/x86_64-unknown-linux-gnu/beta is failing as main btw

GitHub's interface doesn't help by putting a big red x next to it, but this is fine and we can merge PRs as long as the "conclusion" job finishes. Right now it's broken because of an upstream bug in clippy, so it will probably stay that way until the beta channel gets the fix.

I'll try to give this PR another once-over today but failing that will get to it next week.

@@ -750,6 +849,32 @@ impl<'py> Iterator for BoundListIterator<'py> {
None
})
}

#[inline]
#[cfg(all(not(Py_LIMITED_API), feature = "nightly"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(all(not(Py_LIMITED_API), feature = "nightly"))]
#[cfg(feature = "nightly")]

Unless I'm missing something...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we need the flag not(Py_LIMITED_API. Otherwise we will compile the code without access to with_critical_section

https://github.com/PyO3/pyo3/actions/runs/12769998628/job/35593851381?pr=4810

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can delete the #[cfg(not(Py_LIMITED_API))] on BoundListIterator::with_critical_section, it was only there to avoid a clippy lint about unused code. pyo3::sync::with_critical_section is unconditionally exposed in the PyO3 API, it's just a no-op on GIL-enabled builds. We wanted to allow people to write code that avoids conditional compilation, where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I deleted #[cfg(not(Py_LIMITED_API))] but was hitting the unused code lint warning, so I put in a allow lint dead_code. Lemme know your thoughts.

I also pushed a few commits and implemented advance_back_by.

@@ -547,6 +547,46 @@ impl<'py> BoundListIterator<'py> {
}
}

#[inline]
#[cfg(all(not(Py_LIMITED_API), feature = "nightly"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - shouldn't this only be defined on not(feature = "nightly")? Because on nightly this is defined in terms of advance_by in the standard library implementation, so we don't need to override nth. It'll just do the right thing on nightly by calling advance_by and then next(). On not nightly you need this override because there's no way to override advance_by.

Ditto for all the other BoundListIterator methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed - I spent hours reading the stdlib again to figure out that on stable toolchain, we should override nth, and override advance_by on nightly. Thanks for spotting this. I've made the changes to compile it on stable


#[inline]
#[cfg(all(Py_LIMITED_API, feature = "nightly"))]
#[deny(unsafe_op_in_unsafe_fn)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[deny(unsafe_op_in_unsafe_fn)]

there's no unsafe code in this function so this is unnecessary

) -> Option<Bound<'py, PyAny>> {
let length = length.0.min(list.len());
let target_index = index.0 + n;
if index.0 + n < length {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if index.0 + n < length {
if target_index < length {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks & revised

) -> Option<Bound<'py, PyAny>> {
let length = length.0.min(list.len());
let target_index = index.0 + n;
if index.0 + n < length {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if index.0 + n < length {
if target_index < length {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks & revised

src/types/list.rs Show resolved Hide resolved
@@ -625,6 +704,26 @@ impl<'py> Iterator for BoundListIterator<'py> {
}
}

#[inline]
#[cfg(feature = "nightly")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(feature = "nightly")]
#[cfg(not(feature = "nightly"))]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks & revised

Comment on lines 710 to 724
#[cfg(not(Py_LIMITED_API))]
{
self.with_critical_section(|index, length, list| unsafe {
Self::nth_unchecked(index, length, list, n)
})
}
#[cfg(Py_LIMITED_API)]
{
let Self {
index,
length,
list,
} = self;
Self::nth(index, length, list, n)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(not(Py_LIMITED_API))]
{
self.with_critical_section(|index, length, list| unsafe {
Self::nth_unchecked(index, length, list, n)
})
}
#[cfg(Py_LIMITED_API)]
{
let Self {
index,
length,
list,
} = self;
Self::nth(index, length, list, n)
}
self.with_critical_section(|index, length, list| unsafe {
Self::nth(index, length, list, n)
})

If you implement my suggestion to only have BoundListIterator::nth then you can simplify this a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you can do a similar refactor for the nth_back implementation for the impl DoubleEndedIterator for BoundListIterator block below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've adopted this idea for nth and nth_back

) -> Option<Bound<'py, PyAny>> {
let length_size = length.0.min(list.len());
if index.0 + n < length_size {
let target_index = length_size - n - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally this logic feels a little backwards to me, and I would probably prefer to do this using signed integers and testing if the index is less than zero. That said, this is totally equivalent and if it makes sense to you as-is then no need to change it.

@Owen-CH-Leung
Copy link
Contributor Author

Owen-CH-Leung commented Jan 14, 2025

Just realize that I forgot to implement advance_back_by. I'll implement it in coming days

Copy link
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, much simpler now!

I'm not going to merge immediately because I'd like someone with a little bit more experience maintaining PyO3 to give this a once-over.

@@ -0,0 +1 @@
Optimizes `nth` and `nth_back` for `BoundListIterator`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also mention that advance_by and advance_by_back are implemented on nightly

Comment on lines +562 to +565
#[cfg(Py_LIMITED_API)]
{
list.get_item(target_index).expect("get-item failed")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this case can never occur, because if Py_LIMITED_API is set, the whole function will be cfg out. Same for nth_back below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think that's wrong though. Maybe there's a cfg block preventing this code from being used for nth on the limited API. We definitely still want to override nth on the limited API to do the constant time algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think the cfg on the function needs adjustment. It currently says "unlimited api and not nightly" but it should probably say just "not nightly" then. (But I just skimmed though the code quickly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed putting just not nightly on function signature is more logical. I've made the change. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants