No more auto-unrolling starting with LLVM 14 #94847

Urgau · 2022-03-11T13:58:50Z

Code

I tried this code:

pub fn find_minimum(values: &[f64]) -> f64 {
    let mut min = f64::INFINITY;

    for &num in values {
        if num < min {
            min = num
        }
    }

    min
}

I expected to see this happen: I expect that the generated assembly will unroll the loop to a certain level.

Instead, this happened: There is no more unrolling perform by the compiler.

Version it worked on

.LCPI0_0:
        .quad   0x7ff0000000000000
example::find_minimum:
        test    rsi, rsi
        je      .LBB0_1
        lea     rcx, [8*rsi - 8]
        mov     edx, ecx
        shr     edx, 3
        add     edx, 1
        and     rdx, 7
        je      .LBB0_3
        neg     rdx
        movsd   xmm1, qword ptr [rip + .LCPI0_0]
        mov     rax, rdi
.LBB0_5:
        movsd   xmm0, qword ptr [rax]
        add     rax, 8
        minsd   xmm0, xmm1
        movapd  xmm1, xmm0
        inc     rdx
        jne     .LBB0_5
        cmp     rcx, 56
        jae     .LBB0_7
        jmp     .LBB0_9
.LBB0_1:
        movsd   xmm0, qword ptr [rip + .LCPI0_0]
        ret
.LBB0_3:
        movsd   xmm0, qword ptr [rip + .LCPI0_0]
        mov     rax, rdi
        cmp     rcx, 56
        jb      .LBB0_9
.LBB0_7:
        lea     rcx, [rdi + 8*rsi]
.LBB0_8:
        movsd   xmm1, qword ptr [rax]
        minsd   xmm1, xmm0
        movsd   xmm0, qword ptr [rax + 8]
        minsd   xmm0, xmm1
        movsd   xmm1, qword ptr [rax + 16]
        minsd   xmm1, xmm0
        movsd   xmm0, qword ptr [rax + 24]
        minsd   xmm0, xmm1
        movsd   xmm1, qword ptr [rax + 32]
        minsd   xmm1, xmm0
        movsd   xmm0, qword ptr [rax + 40]
        minsd   xmm0, xmm1
        movsd   xmm1, qword ptr [rax + 48]
        minsd   xmm1, xmm0
        movsd   xmm0, qword ptr [rax + 56]
        minsd   xmm0, xmm1
        add     rax, 64
        cmp     rcx, rax
        jne     .LBB0_8
.LBB0_9:
        ret

It most recently worked on: 1.59

Version with regression

.LCPI0_0:
        .quad   0x7ff0000000000000
example::find_minimum:
        test    rsi, rsi
        je      .LBB0_1
        shl     rsi, 3
        movsd   xmm1, qword ptr [rip + .LCPI0_0]
        xor     eax, eax
.LBB0_4:
        movsd   xmm0, qword ptr [rdi + rax]
        minsd   xmm0, xmm1
        add     rax, 8
        movapd  xmm1, xmm0
        cmp     rsi, rax
        jne     .LBB0_4
        ret
.LBB0_1:
        movsd   xmm0, qword ptr [rip + .LCPI0_0]
        ret

rustc --version --verbose:

rustc 1.60.0-beta.3 (e5effbd0b 2022-03-07)
binary: rustc
commit-hash: e5effbd0b34c9ede216e21d3f60dcbad0b863676
commit-date: 2022-03-07
host: x86_64-unknown-linux-gnu
release: 1.60.0-beta.3
LLVM version: 14.0.0

@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged

The text was updated successfully, but these errors were encountered:

Urgau · 2022-03-11T13:59:16Z

searched nightlies: from nightly-2022-02-01 to nightly-2022-03-01
regressed nightly: nightly-2022-02-18
searched commit range: 75d9a0a...30b3f35
regressed commit: 30b3f35

bisected with cargo-bisect-rustc v0.6.1

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --test-dir=. --start=2022-02-01 --end=2022-03-01

SNCPlay42 · 2022-03-11T16:39:22Z

The assembly is actually not vectorized in either version (each version only operates on 8 bytes (1 f64) per minsd instruction); what's changed is the loop was unrolled before and isn't any more.

Urgau · 2022-03-11T16:40:35Z

I just checked and now rustc, clang and gcc are generating the same assembly code for the example.

The assembly is actually not vectorized in either version (each version only operates on 8 bytes (1 f64) per minsd instruction); what's changed is the loop was unrolled before and isn't any more.

Oh, you're right. My bad.

saethlin · 2022-03-11T17:36:09Z

-O for rustc is -Copt-level=2, and --release is -Copt-level=3. In clang and gcc I think it is equivalent to -O1 (at least that's what the gcc docs say) which I've never seen anyone use. You probably want to pass -Copt-level=3 in godbolt for Rust, and -O2 or -O3 for clang/gcc. -O3 is probably best if you're comparing to rustc codegen. But that's not relevant to whether we have a problem here, thank goodness.

I slapped together some extremely crude benchmarking on my 3970X and this looks like a 2x perf regression. If you replace f64 with u64 I see something closer to a 2.5x perf regression.

The perhaps good news is that if you pass -Ctarget-cpu=native on godbolt you get the old codegen back, and on my machine I see the perf regression disappear.

Also, it looks like this problem may be specific to computing the min of an array? That seems odd to me but maybe it means something to people who work on loop optimizations.

apiraino · 2022-03-16T19:05:13Z

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-high

pnkfelix · 2022-03-31T14:46:11Z

Also, it looks like this problem may be specific to computing the min of an array? That seems odd to me but maybe it means something to people who work on loop optimizations.

I'd be curious to know if we regressed when you compute the min via other rustic-idioms like iterators.

In any case, we've been discussing this in the T-compiler meeting and we are thinking it isn't quite P-high priority, and perhaps should be P-medium. Its not clear what is the scope of kinds of loops that LLVM has started to fail to unroll; it seems somewhat limited.

saethlin · 2022-03-31T23:19:05Z

Yup, same behavior with Iterator::min. No unrolling and a ~2x regression by default, but we get the old unrolling and no regression with -Ctarget-cpu=native: https://godbolt.org/z/4Pnz6rcrc.

Urgau added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Mar 11, 2022

rustbot added regression-from-stable-to-beta Performance or correctness regression from stable to beta. I-prioritize Issue: Indicates that prioritization has been requested for this issue. and removed regression-untriaged Untriaged performance or correctness regression. labels Mar 11, 2022

apiraino added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 11, 2022

Urgau changed the title ~~No more auto-vectorization starting with LLVM 14~~ No more auto-unrolling starting with LLVM 14 Mar 11, 2022

rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 16, 2022

apiraino added P-medium Medium priority and removed P-high High priority labels Apr 6, 2022

pietroalbini added this to the 1.60.0 milestone Apr 6, 2022

pietroalbini added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Jun 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No more auto-unrolling starting with LLVM 14 #94847

No more auto-unrolling starting with LLVM 14 #94847

Urgau commented Mar 11, 2022 •

edited

Loading

Urgau commented Mar 11, 2022

SNCPlay42 commented Mar 11, 2022

Urgau commented Mar 11, 2022

saethlin commented Mar 11, 2022 •

edited

Loading

apiraino commented Mar 16, 2022

pnkfelix commented Mar 31, 2022 •

edited

Loading

saethlin commented Mar 31, 2022 •

edited

Loading

No more auto-unrolling starting with LLVM 14 #94847

No more auto-unrolling starting with LLVM 14 #94847

Comments

Urgau commented Mar 11, 2022 • edited Loading

Code

Version it worked on

Version with regression

Urgau commented Mar 11, 2022

SNCPlay42 commented Mar 11, 2022

Urgau commented Mar 11, 2022

saethlin commented Mar 11, 2022 • edited Loading

apiraino commented Mar 16, 2022

pnkfelix commented Mar 31, 2022 • edited Loading

saethlin commented Mar 31, 2022 • edited Loading

Urgau commented Mar 11, 2022 •

edited

Loading

saethlin commented Mar 11, 2022 •

edited

Loading

pnkfelix commented Mar 31, 2022 •

edited

Loading

saethlin commented Mar 31, 2022 •

edited

Loading