gh-126703: Add freelists for iterators and range, method and builtin_function_or_method objects #128368

eendebakpt · 2024-12-30T22:33:49Z

In this PR we add freelists for the most frequently allocated objects (measured using the pyperformance benchmark suite). Some frequently allocated objects have not been added yet: ints with 2 or 3 digits, exceptions (StopIteration, IndexError), and generators.

If the freelists increase performance, the PR should probably be split into multiple ones.
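For readers unfamiliar with the technique, here is a minimal Python sketch of the freelist idea: a bounded pool that hands back a recently released object instead of allocating a new one. It is only an illustration. The `Freelist` and `Node` classes, the `maxsize` parameter, and the hit/miss counters are hypothetical names made up for this example; the freelists in this PR are implemented in C inside the interpreter and work on raw object memory.

```python
class Freelist:
    """Bounded pool of recycled objects (illustrative, not CPython's code)."""

    def __init__(self, factory, maxsize=10):
        self._factory = factory    # called when the pool is empty
        self._maxsize = maxsize    # cap so the pool cannot grow without bound
        self._items = []           # released objects waiting to be reused
        self.hits = 0              # allocations served from the pool
        self.misses = 0            # allocations that needed a fresh object

    def alloc(self):
        """Return a recycled object if one is available, else a new one."""
        if self._items:
            self.hits += 1
            return self._items.pop()
        self.misses += 1
        return self._factory()

    def free(self, obj):
        """Keep the object for reuse unless the pool is already full."""
        if len(self._items) < self._maxsize:
            self._items.append(obj)
        # Otherwise drop the reference and let normal deallocation happen.


class Node:
    __slots__ = ("value",)


pool = Freelist(Node, maxsize=2)
first = pool.alloc()    # pool is empty: a fresh Node is created
pool.free(first)        # the Node is parked in the pool
second = pool.alloc()   # the parked Node is handed out again
print(second is first)  # True
```

The cap is where the trade-off lies: a larger pool avoids more allocations but keeps more memory alive while idle.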

Microbenchmarks:

bench_list: Mean +- std dev: [main] 14.9 us +- 0.2 us -> [pr] 14.5 us +- 0.3 us: 1.03x faster
bench_int: Mean +- std dev: [main] 383 us +- 5 us -> [pr] 386 us +- 7 us: 1.01x slower
bench_float: Mean +- std dev: [main] 113 us +- 2 us -> [pr] 111 us +- 3 us: 1.02x faster
bench_builtin_or_method: Mean +- std dev: [main] 6.14 us +- 0.45 us -> [pr] 4.17 us +- 0.07 us: 1.47x faster
bench_list_iter: Mean +- std dev: [main] 141 ns +- 4 ns -> [pr] 126 ns +- 4 ns: 1.12x faster
bench_tuple_iter: Mean +- std dev: [main] 140 ns +- 7 ns -> [pr] 125 ns +- 2 ns: 1.12x faster
bench_range_iter: Mean +- std dev: [main] 144 ns +- 5 ns -> [pr] 138 ns +- 3 ns: 1.04x faster
bench_property: Mean +- std dev: [main] 2.06 us +- 0.04 us -> [pr] 2.03 us +- 0.02 us: 1.01x faster
bench_class_method: Mean +- std dev: [main] 2.30 us +- 0.02 us -> [pr] 2.32 us +- 0.05 us: 1.01x slower

Geometric mean: 1.08x faster
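As a sanity check, the geometric mean can be recomputed from the rounded means in the table above. This is a sketch, not the way pyperf derives the figure: pyperf works from the unrounded measurements, so small differences in later digits are possible.

```python
import math

# (main, pr) mean timings copied from the table above.
# Each pair uses one unit (us or ns), so the ratio is unit-free.
results = {
    "bench_list": (14.9, 14.5),
    "bench_int": (383, 386),
    "bench_float": (113, 111),
    "bench_builtin_or_method": (6.14, 4.17),
    "bench_list_iter": (141, 126),
    "bench_tuple_iter": (140, 125),
    "bench_range_iter": (144, 138),
    "bench_property": (2.06, 2.03),
    "bench_class_method": (2.30, 2.32),
}

# Speedup > 1 means the PR is faster than main.
speedups = {name: main / pr for name, (main, pr) in results.items()}

# Geometric mean = exp of the arithmetic mean of the logs.
geomean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
print(f"Geometric mean: {geomean:.2f}x faster")  # Geometric mean: 1.08x faster
```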

The list, float and int freelists are already in main, so we don't expect an improvement there. The iterator benchmarks show a modest improvement. bench_builtin_or_method shows an improvement, but it is a somewhat artificial benchmark.
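To see why bench_builtin_or_method stresses object allocation at all: in CPython, looking up a method of a built-in object such as `lst.append` creates a new bound `builtin_function_or_method` object on each attribute access. The short check below shows this. It relies on a CPython implementation detail rather than a language guarantee, and the variable names are only for illustration.

```python
lst = []
it = iter({2, 3, 4})

# Each attribute access builds a new bound-method object.
first = lst.append
second = lst.append

print(type(first).__name__)               # builtin_function_or_method
print(type(it.__length_hint__).__name__)  # builtin_function_or_method
print(first is second)                    # False: two distinct objects
print(first == second)                    # True: same function, same self
```

The benchmark loop does nothing but these lookups, so almost all of its time goes to creating and destroying such objects, which is why it reacts so strongly to a freelist and why it says little about typical programs.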

Benchmark script

# Quick benchmark for cpython freelists

import pyperf


def bench_list(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(100))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        for ii in tpl:
            _ = [ii]
            _ = [ii, ii + 1]
            _ = [ii, ii + 1, ii]
    return pyperf.perf_counter() - t0


def collatz(a):
    while a > 1:
        if a % 2 == 0:
            a = a // 2
        else:
            a = 3 * a + 1


def bench_int(loops):
    range_it = range(loops)
    tpl = tuple(range(200, 300))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        for jj in tpl:
            collatz(jj)
    return pyperf.perf_counter() - t0


def bench_float(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(500))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for ii in tpl:
            x += float(ii + 1) ** 2 - float(ii + 1) ** 2
    return pyperf.perf_counter() - t0


def bench_builtin_or_method(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    lst = []
    it = iter(set([2, 3, 4]))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        for ii in tpl:
            lst.append
            it.__length_hint__
    return pyperf.perf_counter() - t0


class A:
    def __init__(self, value):
        self.value = value

    def x(self):
        return self.value

    @property
    def v(self):
        return self.value


def bench_property(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for ii in tpl:
            _ = a.v
    return pyperf.perf_counter() - t0


def bench_class_method(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for ii in tpl:
            _ = a.x()
    return pyperf.perf_counter() - t0


def bench_list_iter(loops):
    range_it = iter(range(loops))

    lst = list(range(5))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for ii in lst:
            x += ii
    return pyperf.perf_counter() - t0


def bench_tuple_iter(loops):
    range_it = iter(range(loops))

    tpl = tuple(range(5))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for ii in tpl:
            x += ii
    return pyperf.perf_counter() - t0


def bench_range_iter(loops):
    range_it = iter(range(loops))

    r = range(5)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for ii in r:
            x += ii
    return pyperf.perf_counter() - t0


# %timeit bench_list(1000)

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func("bench_list", bench_list)
    runner.bench_time_func("bench_int", bench_int)
    runner.bench_time_func("bench_float", bench_float)
    runner.bench_time_func("bench_builtin_or_method", bench_builtin_or_method)
    runner.bench_time_func("bench_list_iter", bench_list_iter)
    runner.bench_time_func("bench_tuple_iter", bench_tuple_iter)
    runner.bench_time_func("bench_range_iter", bench_range_iter)
    runner.bench_time_func("bench_property", bench_property)
    runner.bench_time_func("bench_class_method", bench_class_method)

Issue: Use freelist for range and iter objects #126703

Fidget-Spinner · 2025-01-01T22:17:08Z

I don't think we should share the freelists for iterators. We're not using that much memory and it's really bug-prone to share them.

eendebakpt · 2025-01-01T22:23:09Z

> I don't think we should share the freelists for iterators. We're not using that much memory and it's really bug-prone to share them.

I agree with you. I am experimenting a bit to see whether it is possible at all to do this with different types (maybe for PyType_GenericAlloc, or some size-based freelist), but for the iterators I will probably split them again.

ericsnowcurrently · 2025-01-03T00:28:52Z

benchmarking results: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250102-3.14.0a3%2B-05f479c

Fidget-Spinner · 2025-01-03T03:27:03Z

The results are excellent! 1% faster geomean. Great work and congrats Pieter!

corona10 · 2025-01-03T17:01:23Z

I am not sure that it's worth adding the free list every time if there is a small margin (<3-5%).
Maybe we should trade-off between complexity and maintainability. (Not a strong disagree FYI)

corona10 · 2025-01-03T17:04:41Z

pycfunctionobject / pycmethodobject / class_method / shared_iters are probably worth adding.
But I'm not sure about ranges / range_iters.

Fidget-Spinner · 2025-01-03T18:58:23Z

> I am not sure that it's worth adding the free list every time if there is a small margin (<3-5%).
>
> Maybe we should trade-off between complexity and maintainability. (Not a strong disagree FYI)

Benchmark results show consistent 1% geomean speedup on pyperformance. That's pretty worth it (for comparison, the entire types optimizer in the JIT is only 1% speedup and is way more code). Though you're probably right that not all of them are worth it. I'm thinking the method and list/tuple iters are most worth it.

eendebakpt added 5 commits December 29, 2024 22:48

Use freelist for range object

cb83490

Use freelist for rangeiter object

6adf31c

cleanup debug code

94cbc7c

set types

17100c0

add more freelists

ea98fa1

eendebakpt requested a review from ericsnowcurrently as a code owner December 30, 2024 22:33

bedevere-app bot mentioned this pull request Dec 30, 2024

Use freelist for range and iter objects #126703

Open

bedevere-app bot added the awaiting review label Dec 30, 2024

eendebakpt marked this pull request as draft December 30, 2024 22:34

bedevere-app bot removed the awaiting review label Dec 30, 2024

eendebakpt added 5 commits January 1, 2025 20:16

clear freelists

64be6c4

freelist for range iterators

2551f29

enable all freelists again

fce4651

add PyCFunctionObject

5d13675

Merge branch 'main' into iter_freelists

e296fbb

Merge branch 'main' into iter_freelists

05f479c
