
[RISCV] Move clang-riscv-rva23-evl-vec-2stage to annotated builder #374

Merged

Conversation

asb
Contributor

asb commented Feb 9, 2025

The intent is for the rest of the RISE RISC-V builders to follow, but this starts the process.

Previously the builders used the framework provided by ClangBuilder.py. As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder), we've needed more and more changes to increase the configurability of ClangBuilder, but this makes the logic harder to follow and I think we've reached the limit of what's reasonable.

The AnnotatedBuilder infrastructure allows the build logic to just be put in a shell script. The latest version of llvm-zorg is checked out and the script is executed. This has the additional advantage of decoupling script updates from needing to wait for buildmaster redeploys. With that in mind, I've avoided putting any configuration through options or environment variables defined by the builder instantiation in builders.py so there's no need to reason about cases where e.g. a script update was committed but the buildmaster may or may not have been updated with a new builders.py.
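
For reference, registering a builder with this infrastructure amounts to a small builders.py entry along roughly the following lines (a sketch: getAnnotatedBuildFactory and rise-riscv-build.sh come from this PR, while the tags, builddir, and worker name are illustrative placeholders):

```python
# Sketch of a builders.py entry for an annotated builder. Only
# getAnnotatedBuildFactory and rise-riscv-build.sh are taken from this
# PR; tags, builddir, and the worker name are made-up placeholders.
from zorg.buildbot.builders import AnnotatedBuilder

builder = {
    'name': "clang-riscv-rva23-evl-vec-2stage",
    'tags': ["clang"],
    'workernames': ["rise-riscv-worker"],  # hypothetical worker name
    'builddir': "clang-riscv-rva23-evl-vec-2stage",
    # All build logic lives in rise-riscv-build.sh, so no options or
    # environment variables are threaded through the factory.
    'factory': AnnotatedBuilder.getAnnotatedBuildFactory(
        script="rise-riscv-build.sh"),
}
```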

rise-riscv-build.sh is written with the philosophy that "A little copying is better than a little dependency." We could look to share more logic, but we'd end up with the same configurability issue as ClangBuilder. Instead, the script provides only the configurability needed for this set of builders.

A future improvement would be to make rise-riscv-build.sh more friendly to running outside of the buildbot environment (e.g. for local testing). This is left to future work, as it's not a regression vs the current approach.

This change has been tested locally along the lines of the instructions I wrote in
https://llvm.org/docs/HowToAddABuilder.html#testing-a-builder-config-locally. Something like the following is sufficient to temporarily use a local testing branch of llvm-zorg for the annotated builder scripts:

```
--- a/zorg/buildbot/builders/AnnotatedBuilder.py
+++ b/zorg/buildbot/builders/AnnotatedBuilder.py
@@ -77,6 +77,8 @@ def getAnnotatedBuildFactory(
     # Check out zorg so we can run the annotator scripts.
     f.addGetSourcecodeForProject(
         name='update-annotated-scripts',
+        repourl='file:///home/asb/llvm-zorg',
+        branch='synced-worktree',
         project='zorg',
         src_dir='llvm-zorg',
         alwaysUseLatest=True)
```

asb requested a review from lukel97, February 9, 2025 11:58
asb added a commit to asb/llvm-zorg that referenced this pull request Feb 9, 2025
Followup to llvm#374, but this time converting the RVA20 builder.
@lukel97

lukel97 commented Feb 10, 2025

This seems reasonable to me

> As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder)

Is this specifically with a two stage configuration? I thought that clang-aarch64-sve-vla was able to build llvm-test-suite with the stage1 compiler, and that it has a separate clang-aarch64-sve-vla-2stage builder to build it with the stage2?

Is the plan to have clang-riscv-rva23-evl-vec-2stage do both a stage1 llvm-test-suite build and build a stage2 compiler?

@asb
Contributor Author

asb commented Feb 10, 2025

> This seems reasonable to me
>
> > As the build has required more customisation (with more to come, such as building the llvm-test-suite with the stage1 compiler, which isn't supported by ClangBuilder)
>
> Is this specifically with a two stage configuration? I thought that clang-aarch64-sve-vla was able to build llvm-test-suite with the stage1 compiler, and that it has a separate clang-aarch64-sve-vla-2stage builder to build it with the stage2?

Splitting builders would indeed be another way of handling this. I just mean that if you build stage1+stage2, the current ClangBuilder.py logic doesn't allow building the llvm-test-suite with the stage1 compiler. This could be changed, but my overall view is that making such changes isn't worth it compared to using the annotated builder approach to provide simple logic for the build I want.

> Is the plan to have clang-riscv-rva23-evl-vec-2stage do both a stage1 llvm-test-suite build and build a stage2 compiler?

Yes - #376. Building and running the test suite is very fast, so as noted in that PR it would be interesting to run the test suite immediately after the stage1 build, so that if it's broken we get that answer quickly (though we would likely want to proceed to build and test stage2 even if there was an llvm-test-suite failure).

@lukel97

lukel97 commented Feb 10, 2025

> Building and running the test suite is very fast, so as noted in that PR it would be interesting to run the test suite immediately after the stage1 build, so that if it's broken we get that answer quickly (though we would likely want to proceed to build and test stage2 even if there was an llvm-test-suite failure).

If we wanted to test both stage1 and stage2, would it be easiest to split this out into two separate builders?

I'm not sure if this would result in an explosion of builders though.

AArch64 has clang-aarch64-{sve-vla,sve-vls,sve2-vla}{,-2stage}, so 6 in total.

For RISC-V we would end up with clang-riscv-{rva20,rva23,rva23-evl,rva23-mrvv}{,-2stage}, so 8 in total. I noticed we have dedicated workers for each builder, whereas AArch64 is able to pool them, so maybe this isn't practical.
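
(For concreteness, "pooling" here just means several builders listing the same workers, so buildbot can dispatch each pending request to whichever worker is free. A minimal sketch, with made-up names and all other builder keys omitted:)

```python
# Sketch of worker pooling across two builders (names are made up and
# other required builder keys are omitted for brevity). Each pending
# request runs on whichever worker in the shared pool is free.
pool = ["riscv-worker-0", "riscv-worker-1", "riscv-worker-2"]

builders = [
    {'name': "clang-riscv-rva23-evl-vec", 'workernames': pool},
    {'name': "clang-riscv-rva23-evl-vec-2stage", 'workernames': pool},
]
```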

In any case I agree it's not worth trying to fight the ClangBuilder.py logic, so if this patch makes life easier I'm all for it

@asb
Contributor Author

asb commented Feb 10, 2025

> If we wanted to test both stage1 and stage2, would it be easiest to split this out into two separate builders?

It's definitely an option to consider, and much more feasible now that workers no longer run "natively" (previously an rva20 and an rva23 worker weren't interchangeable, as they were different qemu-system environments). Though if we just did stage1 and llvm-test-suite, the failure emails might be somewhat confusing because they wouldn't include LLVM unit test failures (i.e. a problem in llvm-test-suite may act as a red herring vs a much more obvious unit test failure). If we did stage1 + LLVM unit tests + llvm-test-suite, that burns a fair bit more CPU on testing that's already covered by other builders.

After this series of patches the plan is to look at the best way of trading off the total CPU power we have vs testing coverage vs feedback time.

@asb asb merged commit 07cf32d into llvm:main Feb 10, 2025
2 checks passed
asb added a commit to asb/llvm-zorg that referenced this pull request Feb 10, 2025
Followup to llvm#374, but this time converting the RVA20 builder.
asb added a commit that referenced this pull request Feb 10, 2025
Followup to #374 (and stacks on top of it), but this time converting the
RVA20 builder.
@DavidSpickett
Contributor

We on Linaro's toolchain team manage those AArch64 bots; here's some context that might help you.

> AArch64 has clang-aarch64-{sve-vla,sve-vls,sve2-vla}{,-2stage}, so 6 in total.
>
> For RISC-V we would end up with clang-riscv-{rva20,rva23,rva23-evl,rva23-mrvv}{,-2stage}, so 8 in total. I noticed we have dedicated workers for each builder whereas AArch64 is able to pool them, so maybe this isn't practical.

We split them between:

  • SVE 1 only
  • SVE 1 and 2

I don't remember how many machines we have, plus we have to be careful because of Flang's RAM requirements. You are correct that there is more than one worker on which each builder can run, and more than one worker per machine.

Though this runs into problems because Buildbot's distribution of tasks isn't always good. If you have 3 requests queued up waiting to build on builder A, it can take all 3 free workers before builder B even gets a look in. If your build is pretty fast this isn't much of a problem, but we have had extreme cases where a builder was delayed for days.
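
(Buildbot does have a knob for this: master.cfg can set c['prioritizeBuilders'] to control which builders are offered free workers first. The hook is documented buildbot API, but the heuristic and names in this sketch are made up:)

```python
# Sketch of buildbot's builder-priority hook in master.cfg. The
# c['prioritizeBuilders'] key is documented buildbot API ('c' is the
# BuildmasterConfig dict); the serve-2-stage-last heuristic is made up.
def prioritize_builders(buildmaster, builders):
    # Offer free workers to single-stage builders first, so a queue of
    # requests on one 2-stage builder can't starve the quick builders.
    builders.sort(key=lambda b: b.name.endswith("-2stage"))
    return builders

c['prioritizeBuilders'] = prioritize_builders
```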

So I personally would start with a 1:1 builder-to-worker mapping and see how it goes. If your one worker is utilised the majority of the time anyway, it doesn't make any difference adding more. Also, once you have more than one worker on the same machine, you open yourself up to a whole new class of resource problems.

Keep in mind our motivations for splitting the configurations this way; they may or may not apply to you:

  • We were asked to specifically provide SVE2 testing that was distinct from SVE1. So if only that testing was red, it must be an SVE2 problem.
  • Similarly, we build single stage and 2 stage as different configurations, because if only one is red we can learn something from that. If only 2 stage is red, you're looking for code generation problems (some of our 2 stage builders also run stage 1, but I'm not sure that's intentional).
  • Single stage builds also report results more quickly.

We are spending more to get faster results and higher clarity for developers. One day (I hope) SVE will be so standard that we can reduce the number of configurations (until the next unique thing of course).
