
[RISCV] Move clang-riscv-rva23-evl-vec-2stage to annotated builder #374

Merged

Conversation

asb
Contributor

asb commented Feb 9, 2025

The intent is for the rest of the RISE RISC-V builders to follow, but this starts the process.

Previously the builders used the framework provided by ClangBuilder.py. As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder), we've needed more and more changes to increase the configurability of ClangBuilder, but this makes the logic harder to follow and I think we've reached the limit of what's reasonable.

The AnnotatedBuilder infrastructure allows the build logic to just be put in a shell script. The latest version of llvm-zorg is checked out and the script is executed. This has the additional advantage of decoupling script updates from needing to wait for buildmaster redeploys. With that in mind, I've avoided putting any configuration through options or environment variables defined by the builder instantiation in builders.py so there's no need to reason about cases where e.g. a script update was committed but the buildmaster may or may not have been updated with a new builders.py.
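
For reference, registering a builder with this infrastructure amounts to a small builders.py entry along roughly the following lines (a sketch: getAnnotatedBuildFactory and rise-riscv-build.sh come from this PR, while the tags, builddir, and worker name are illustrative placeholders):

```python
# Sketch of a builders.py entry for an annotated builder. Only
# getAnnotatedBuildFactory and rise-riscv-build.sh are taken from this
# PR; tags, builddir, and the worker name are made-up placeholders.
from zorg.buildbot.builders import AnnotatedBuilder

builder = {
    'name': "clang-riscv-rva23-evl-vec-2stage",
    'tags': ["clang"],
    'workernames': ["rise-riscv-worker"],  # hypothetical worker name
    'builddir': "clang-riscv-rva23-evl-vec-2stage",
    # All build logic lives in rise-riscv-build.sh, so no options or
    # environment variables are threaded through the factory.
    'factory': AnnotatedBuilder.getAnnotatedBuildFactory(
        script="rise-riscv-build.sh"),
}
```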

rise-riscv-build.sh is written with the philosophy that "A little copying is better than a little dependency." We could look to share more logic, but we'd end up with the same configurability issue as ClangBuilder. Instead, the script provides only the configurability needed for this set of builders.

A future improvement would be to make rise-riscv-build.sh more friendly to running outside of the buildbot environment (e.g. for local testing). This is left to future work, as it's not a regression vs the current approach.

This change has been tested locally along the lines of the instructions I wrote in
https://llvm.org/docs/HowToAddABuilder.html#testing-a-builder-config-locally. Something like the following is sufficient to temporarily use a local testing branch of llvm-zorg for the annotated builder scripts:

```
--- a/zorg/buildbot/builders/AnnotatedBuilder.py
+++ b/zorg/buildbot/builders/AnnotatedBuilder.py
@@ -77,6 +77,8 @@ def getAnnotatedBuildFactory(
     # Check out zorg so we can run the annotator scripts.
     f.addGetSourcecodeForProject(
         name='update-annotated-scripts',
+        repourl='file:///home/asb/llvm-zorg',
+        branch='synced-worktree',
         project='zorg',
         src_dir='llvm-zorg',
         alwaysUseLatest=True)
```

asb requested a review from lukel97, February 9, 2025 11:58
asb added a commit to asb/llvm-zorg that referenced this pull request Feb 9, 2025
Followup to llvm#374, but this time converting the RVA20 builder.
@lukel97

lukel97 commented Feb 10, 2025

This seems reasonable to me

> As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder)

Is this specifically with a two stage configuration? I thought that clang-aarch64-sve-vla was able to build llvm-test-suite with the stage1 compiler, and that it has a separate clang-aarch64-sve-vla-2stage builder to build it with the stage2?

Is the plan to have clang-riscv-rva23-evl-vec-2stage do both a stage1 llvm-test-suite build and build a stage2 compiler?

@asb
Contributor Author

asb commented Feb 10, 2025

> This seems reasonable to me
>
> > As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder)
>
> Is this specifically with a two stage configuration? I thought that clang-aarch64-sve-vla was able to build llvm-test-suite with the stage1 compiler, and that it has a separate clang-aarch64-sve-vla-2stage builder to build it with the stage2?

Splitting builders would indeed be another way of handling this. I just mean that if you build stage1+stage2, the current ClangBuilder.py logic doesn't allow building the llvm-test-suite with the stage1 compiler. This could be changed, but my overall view is that making such changes isn't worth it compared to using the annotated builder approach to provide simple logic for the build I want.

> Is the plan to have clang-riscv-rva23-evl-vec-2stage do both a stage1 llvm-test-suite build and build a stage2 compiler?

Yes - #376. Building and running the test suite is very fast, so as noted in that PR it would be interesting to run the test suite immediately after the stage1 build, so that if it's broken we get that answer quickly (though we would likely want to proceed to build and test stage2 even if there was an llvm-test-suite failure).

@lukel97

lukel97 commented Feb 10, 2025

> Building and running the test suite is very fast, so as noted in that PR it would be interesting to run the test suite immediately after the stage1 build, so that if it's broken we get that answer quickly (though we would likely want to proceed to build and test stage2 even if there was an llvm-test-suite failure).

If we wanted to test both stage1 and stage2, would it be easiest to split this out into two separate builders?

I'm not sure if this would result in an explosion of builders though.

AArch64 has clang-aarch64-{sve-vla,sve-vls,sve2-vla}{,-2stage}, so 6 in total.

For RISC-V we would end up with clang-riscv-{rva20,rva23,rva23-evl,rva23-mrvv}{,-2stage}, so 8 in total. I noticed we have dedicated workers for each builder, whereas AArch64 is able to pool them, so maybe this isn't practical.
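
(For concreteness, "pooling" here just means several builders listing the same workers, so buildbot can dispatch each pending request to whichever worker is free. A minimal sketch, with made-up names and all other builder keys omitted:)

```python
# Sketch of worker pooling across two builders (names are made up and
# other required builder keys are omitted for brevity). Each pending
# request runs on whichever worker in the shared pool is free.
pool = ["riscv-worker-0", "riscv-worker-1", "riscv-worker-2"]

builders = [
    {'name': "clang-riscv-rva23-evl-vec", 'workernames': pool},
    {'name': "clang-riscv-rva23-evl-vec-2stage", 'workernames': pool},
]
```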

In any case I agree it's not worth trying to fight the ClangBuilder.py logic, so if this patch makes life easier I'm all for it

@asb
Contributor Author

asb commented Feb 10, 2025

> If we wanted to test both stage1 and stage2, would it be easiest to split this out into two separate builders?

It's definitely an option to consider, and much more feasible now that workers no longer run "natively" (previously an rva20 and an rva23 worker weren't interchangeable, as they were different qemu-system environments). Though if we just did stage1 and llvm-test-suite, the failure emails might be somewhat confusing because they wouldn't include LLVM unit test failures (i.e. a problem in llvm-test-suite may act as a red herring vs a much more obvious unit test failure). If we did stage1 + LLVM unit tests + llvm-test-suite, that burns a fair bit more CPU on testing that's already covered by other builders.

After this series of patches the plan is to look at the best way of trading off the total CPU power we have vs testing coverage vs feedback time.

@asb asb merged commit 07cf32d into llvm:main Feb 10, 2025
2 checks passed
asb added a commit to asb/llvm-zorg that referenced this pull request Feb 10, 2025
Followup to llvm#374, but this time converting the RVA20 builder.
asb added a commit that referenced this pull request Feb 10, 2025
Followup to #374 (and stacks on top of it), but this time converting the
RVA20 builder.
@DavidSpickett
Contributor

We on Linaro's toolchain team manage those AArch64 bots; here's some context that might help you.

> AArch64 has clang-aarch64-{sve-vla,sve-vls,sve2-vla}{,-2stage}, so 6 in total.
>
> For RISC-V we would end up with clang-riscv-{rva20,rva23,rva23-evl,rva23-mrvv}{,-2stage}, so 8 in total. I noticed we have dedicated workers for each builder whereas AArch64 is able to pool them, so maybe this isn't practical.

We split them between:

  • SVE 1 only
  • SVE 1 and 2

I don't remember how many machines we have, plus we have to be careful because of Flang's RAM requirements. You are correct that there is more than one worker on which each builder can run, and more than one worker per machine.

Though this runs into problems because Buildbot's distribution of tasks isn't always good. If you have 3 requests queued up waiting to build on builder A, it can take all 3 free workers before builder B even gets a look in. If your build is pretty fast this isn't much of a problem, but we have had extreme cases where a builder was delayed for days.
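
(Buildbot does have a knob for this: master.cfg can set c['prioritizeBuilders'] to control which builders are offered free workers first. The hook is documented buildbot API, but the heuristic and names in this sketch are made up:)

```python
# Sketch of buildbot's builder-priority hook in master.cfg. The
# c['prioritizeBuilders'] key is documented buildbot API ('c' is the
# BuildmasterConfig dict); the serve-2-stage-last heuristic is made up.
def prioritize_builders(buildmaster, builders):
    # Offer free workers to single-stage builders first, so a queue of
    # requests on one 2-stage builder can't starve the quick builders.
    builders.sort(key=lambda b: b.name.endswith("-2stage"))
    return builders

c['prioritizeBuilders'] = prioritize_builders
```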

So I personally would start with a 1:1 builder-to-worker mapping and see how it goes. If your one worker is utilised the majority of the time anyway, it doesn't make any difference adding more. Also, once you have more than one worker on the same machine, you open yourself up to a whole new class of resource problems.

Keep in mind our motivations for splitting the configurations this way; they may or may not apply to you:

  • We were asked to specifically provide SVE2 testing that was distinct from SVE1. So if only that testing was red, it must be an SVE2 problem.
  • Similarly, we build single stage and 2 stage as different configurations, because if only one is red we can learn something from that. If only 2 stage is red, you're looking for code generation problems (some of our 2 stage builders also run stage 1, but I'm not sure that's intentional).
  • Single stage builds also report results more quickly.

We are spending more to get faster results and higher clarity for developers. One day (I hope) SVE will be so standard that we can reduce the number of configurations (until the next unique thing of course).
