rocm-6: bump packages to 6.3.1 and add missing packages #367695
base: master
76e05f1 to c2abb37
Not knowing much about the ROCm stack, I'm mainly here because I use btop with ROCm support enabled. I receive the following error when compiling:
@Shawn8901 Should be fixed now, was broken when I first opened the PR.
One thing that I've wanted to do for a long time is to completely remove the ROCm LLVM as a stdenv, which should solve a lot of these weird compilation errors. Solus's ROCm stack does this, and we compile all non-HIP code with GCC. This way, the entire ROCm LLVM can be compacted into a single derivation, and the complexity of packaging and updating ROCm LLVM is drastically reduced. That is, you should be able to use just the default stdenv with GCC to compile non-HIP code and tell CMake/HIPCC to use the ROCm LLVM only when compiling HIP code. You can achieve this entirely through environment variables. It doesn't make sense that, just because a portion of the codebase contains HIP code, all C/C++ in the codebase needs to be compiled with ROCm LLVM's C compiler.
Ok, I just noticed the "Contemplate trying to make a normal Nix style CC wrapper work again" section, so it seems like you've already experienced the pain of the ROCm LLVM 😅 I will give my idea a try in the next few days and get back.
It looks like upstream are moving away from a separate hipcc and using clang (now …).
Maintaining a separate HIP-only compiler might require maintaining significant cmakefile patches to get it to be used, but if you can work out a way to do this that isn't maintenance hell, that's great.
Please see https://github.com/GZGavinZhao/rocm-llvm-project/commits/solus-rocm-6.2.x for the patches and https://lists.debian.org/debian-ai/2024/12/msg00042.html for more details. I hope they apply cleanly on v6.3, but if not, I think the changes are easy enough to rewrite manually. If you need patches for other components, please see …
Solus does this and we didn't have to use any patches. Most of the work was figuring out the environment variables that tell CMake and/or HIPCC what our intended HIP compiler is. The only thing I'm worried about is locating sysroots due to the non-standard installation prefix, but other than that, Solus's experience shows that this is definitely doable.
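A minimal sketch of the environment-variable approach described above, not Solus's or this PR's actual configuration (the thread does not list the exact variables used). It keeps GCC for host code and points only the HIP toolchain at ROCm's LLVM. `ROCM_DIR` and `ROCM_LLVM_DIR` are hypothetical helper variables with placeholder default paths; `HIPCXX` is the variable CMake consults for its HIP language compiler, and `HIP_CLANG_PATH` and `ROCM_PATH` are read by hipcc.

```shell
# Hypothetical install locations (assumptions); adjust to wherever ROCm and its LLVM live.
ROCM_DIR="${ROCM_DIR:-/opt/rocm}"
ROCM_LLVM_DIR="${ROCM_LLVM_DIR:-$ROCM_DIR/llvm}"

# Host (non-HIP) C and C++ code keeps using the default GCC toolchain.
export CC=gcc
export CXX=g++

# CMake's HIP language support takes its preferred compiler from HIPCXX.
export HIPCXX="$ROCM_LLVM_DIR/bin/clang++"

# hipcc looks for clang in HIP_CLANG_PATH and for the ROCm install in ROCM_PATH.
export HIP_CLANG_PATH="$ROCM_LLVM_DIR/bin"
export ROCM_PATH="$ROCM_DIR"

echo "host C++ compiler: $CXX"
echo "HIP compiler:      $HIPCXX"
```

With a non-standard prefix, such as a Nix store path, the sysroot and device-library locations may still need separate handling, as the comment above notes.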
22f00e1 to c05b8cb
This should be testable (including torch) if you have an Instinct MI50 or newer, Radeon VII, W/RX 6800 or similar, W/RX 78/900 or similar. Cards which relied on the ISA compat patches, like iGPUs or 66/7xx, aren't going to work; I haven't applied @GZGavinZhao's updated patches yet. If you're going to try to test it, you want >200G free space so hipblaslt can spew an unfathomable amount of asm into the build temp directory, and a lot of cores, or you'll be here all month.
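A quick pre-flight check along these lines could be sketched as follows. The 200 figure comes from the comment above (treated here as GiB); using `TMPDIR` with a `/tmp` fallback as the build temp directory is an assumption, since the actual location depends on the Nix configuration, and `build_dir` and `required_gib` are illustrative names.

```shell
# Build temp directory: TMPDIR falling back to /tmp is an assumption, not taken from the PR.
build_dir="${TMPDIR:-/tmp}"
required_gib=200

# df -Pk reports available space in 1024-byte blocks in column 4 of the second line.
avail_kib=$(df -Pk "$build_dir" | awk 'NR==2 {print $4}')
avail_gib=$((avail_kib / 1024 / 1024))

if [ "$avail_gib" -gt "$required_gib" ]; then
  echo "ok: ${avail_gib} GiB free in $build_dir"
else
  echo "warning: only ${avail_gib} GiB free in $build_dir, want more than ${required_gib} GiB"
fi

# Online core count, to judge how long the build may take.
echo "cores: $(getconf _NPROCESSORS_ONLN)"
```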
I'm having trouble getting this to build. I get Failed Tests (2): during the triton-llvm-19.1.0-rc1 test phase. RX6800XT, x86_64-linux on NixOS. No overlays or config or anything, just trying to create a devshell with python312Packages.torch. I can post a (nearly) minimal reproducible flake:
@henryrgithub Your test flake is using non-ROCm torch, which is broken in master with the same error.
If you use
Recommend updating inputs before starting a build of torchWithRocm, just fixed some issues.
};

rocm-opencl-runtime = symlinkJoin {
  name = "rocm-opencl-runtime-meta";
# FIXME: we have compressed code objects now, may be able to skip two stages?
CCOB (compressed code object bundle) is on by default for composable_kernel as of ROCm v6.2+, so yes, there is no need for two-stage builds now.
I tried to use this PR via overlay
But got a collision when building ollama
But overall it seems to be building without errors. When not using ollama, I was able to rebuild my system without errors.
I got a separate HIP compiler working and successfully compiled …
ROCm's standard toolchain is clang + GNU libs including libstdc++. There are a few packages which don't compile with …
The clashing error occurs because I added a /llvm link to clr for use by other ROCm packages, and ollama already adds one to its ROCm env internally. It can be resolved by dropping the /llvm link from ollama.
Added exact GPU targets as pkgs.rocmPackages_6.gfx908, gfx1030, etc.
If needed, you can also mark …
Finished building, I think. Ollama now recognizes my GPU, but produces nonsensical output compared to CPU generation. Something is still broken.
Broken LLM weirdness
Used my quantization of Mythalion-13b, a character role-playing model, to test, because that produces the weirdest results. The system prompt instructs the model to role-play a character named Abigail.
Prompt:
CPU:
GPU:
@vikanezrimaya Probably won't have anything on that ollama issue until Sunday, I'm swamped this week. If anyone else chooses to debug, good luck!
Fixes #337159
WIP bump to 6.3.1 for rocmPackages_6 package set.
Upstream PRs/issues Raised
TODO List
Add hipblaslt to open-webui buildInputs: Was a torch dep, not direct.
Things done