lammps: Use cudaPackages.backendStdenv if needed #366528
Conversation
Does lammps even have CUDA support at the moment (I don't see any in the inputs)?
Not at the moment, but I add it in my personal overlays, via the
Yeah, I agree. Perhaps the @NixOS/cuda-maintainers will have an opinion on the matter. The only thing that is important to me is to help future users of the package avoid the issue I encountered here: lammps/lammps#4420. Perhaps making
I agree with that too.

Turns out there is already
Force-pushed from c1d8ab7 to 0ed23a6
Nice. I think this resolves our request from the @NixOS/cuda-maintainers. Based on a pattern I saw with
Force-pushed from 0ed23a6 to 161ccaa
Force-pushed from 161ccaa to c997276
pkgs/by-name/la/lammps/package.nix (outdated)
```nix
# Ease adding CUDA support. See:
# https://github.com/lammps/lammps/issues/4420
stdenv' =
  if overrideStdenv && (config.cudaSupport || ((extraCmakeFlags.GPU_API or "") == "cuda")) then
```
In case `cudaSupport` is true, do you want to also set the corresponding `cmakeFlags` and `buildInputs`, so that lammps gets built automatically with CUDA (i.e. in a similar way to how we do it in openmpi)?
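For illustration, a minimal sketch of what such wiring could look like, loosely following the openmpi pattern; the `cudaSupport` argument, the exact flags, and the derivation skeleton here are assumptions, not what this PR implements:

```nix
# Hypothetical sketch: derive CUDA defaults from config.cudaSupport.
{
  lib,
  stdenv,
  config,
  cudaPackages,
  cudaSupport ? config.cudaSupport,
}:

stdenv.mkDerivation {
  pname = "lammps";
  version = "...";

  # Enable the GPU package with the CUDA backend only when asked to.
  cmakeFlags = lib.optionals cudaSupport [
    (lib.cmakeBool "PKG_GPU" true)
    (lib.cmakeFeature "GPU_API" "cuda")
  ];

  buildInputs = lib.optionals cudaSupport [ cudaPackages.cudatoolkit ];
}
```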
> In case `cudaSupport` is true, do you want to also set the corresponding `cmakeFlags` and `buildInputs`, so that lammps gets built automatically with CUDA (i.e. in a similar way to how we do it in openmpi)?
I think it makes the logic pretty complicated, and I'm not sure I want to delve into implementing this. How many details do we want to be taken care of automatically? What do we want to happen in case someone simply sets `config.cudaSupport`? TBH, I don't like this flag at all, as it makes package maintainers' lives pretty complicated, especially for complex packages such as this one. Since the CUDA support for every package is different, users should read the docs of such packages anyway and not simply count on `config.cudaSupport` to do the job...
Here's a list of scenarios I could think of that we should handle:

```
# config.cudaSupport | packages.GPU | extraCmakeFlags.GPU_API || use cudaPackages.backendStdenv? || is valid?
# ------------------ | ------------ | ----------------------- || ------------------------------- || ---------
# false              | false        | "opencl"                ||                                 ||
# true               | false        | "opencl"                ||                                 ||
# false              | true         | "opencl"                ||                                 ||
# true               | true         | "opencl"                ||                                 ||
# false              | false        | "cuda"                  ||                                 ||
# true               | false        | "cuda"                  ||                                 ||
# false              | true         | "cuda"                  ||                                 ||
# true               | true         | "cuda"                  ||                                 ||
# false              | false        | "hip"                   ||                                 ||
# true               | false        | "hip"                   ||                                 ||
# false              | true         | "hip"                   ||                                 ||
# true               | true         | "hip"                   ||                                 ||
```
Not only that, there are the cases where `packages.GPU` is not specified, and/or `extraCmakeFlags.GPU_API` is not specified... And, when the `opencl` GPU_API is used, you should add the `ocl-icd` package to `buildInputs`, and `cudaPackages.cudatoolkit` when the `cuda` GPU_API is used.
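A sketch of that dispatch (hypothetical; `gpuApi` is a local name I made up, and this is not the PR's actual code):

```nix
# Hypothetical: choose extra buildInputs from the requested GPU_API.
let
  gpuApi = extraCmakeFlags.GPU_API or "";
in
{
  buildInputs =
    extraBuildInputs
    ++ lib.optionals (gpuApi == "opencl") [ ocl-icd ]
    ++ lib.optionals (gpuApi == "cuda") [ cudaPackages.cudatoolkit ];
}
```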
I tried for an hour or so to implement logic that would satisfy everyone, and I got lost in the sea of options above. Not only that, I decided to revert the last revision, which read the value of `config.cudaSupport`, as I want the expression to be kept clean of heuristic decisions that are hard to figure out just from reading the Nix code.

I guess this is the price we pay for the full configurability of `extraCmakeFlags` (and `packages`) and `extraBuildInputs`. I added a `NOTE` comment explaining this decision.
I think I understand the problem. However, if the config is too complex and if the intention is to configure it externally via `extraCmakeFlags`, I am not sure `overrideStdenv` is even needed here. In that case you could override it accordingly and have more transparency:
```nix
lammps.override {
  stdenv = cudaPackages.backendStdenv;
  extraCmakeFlags.GPU_API = "cuda";
  # <+ what else needs to be overridden>
}
```
> I think I understand the problem. However, if the config is too complex and if the intention is to configure it externally via `extraCmakeFlags`, I am not sure `overrideStdenv` is even needed here. In that case you could override it accordingly and have more transparency: `lammps.override { stdenv = cudaPackages.backendStdenv; extraCmakeFlags.GPU_API = "cuda"; <+ what else needs to be overridden> }`
When you suggested adding an `overrideStdenv`, I thought you wanted to help with the following case: a user wishes to build with `-DGPU_API=cuda` (and `-DPKG_GPU=true`), and to use neither the default `stdenv` nor `cudaPackages.backendStdenv`, but something like `clangStdenv` (perhaps for testing the build or whatever). Without an `overrideStdenv` flag, this user would have no way to tell the build to use their supplied `stdenv` argument.
Whether we have an `overrideStdenv` or not doesn't add to or subtract from the complexity described in the table above.
> `-DGPU_API=cuda`
Our `overrideXXXX` zoo is undeniably a source of infinite pain, but I think for now it's reasonable to say that in these cases "the user is on their own" and they should just use `overrideAttrs` to add `-D` flags manually.
The `if cudaSupport then backendStdenv else stdenv` is an unfortunate pattern, since the user now has to `.override` both `stdenv = ...;` and `cudaPackages = ...;` to be sure their argument isn't silently ignored. Your `overrideStdenv` solution helps with that, but not in a systematic way. Maybe we're better off without it and should build towards changing the `callPackage` mechanism instead.
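Spelled out, that pattern forces a careful user into something like the following (a sketch; `cudaPackages_12` stands in for whatever CUDA package set the user wants):

```nix
# With `if cudaSupport then backendStdenv else stdenv`, a user swapping in a
# different CUDA package set must keep both arguments in sync by hand,
# otherwise their stdenv choice is silently ignored:
lammps.override {
  cudaPackages = cudaPackages_12;
  stdenv = cudaPackages_12.backendStdenv;
}
```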
Here, specifically, it's the case where the user wants to override the stdenv but still pass the GPU flags. We can try to handle every conceivable combination, but AFAIU this is getting complicated, and we'll need to keep maintaining the complexity.
My goal here is only to help Nix users of this package avoid the build error described in lammps/lammps#4420.

This build error is barely understandable, and hence it'd be nice to spare users' time by making the usage of both the `-DGPU_API="cuda"` and `-DPKG_GPU=ON` CMake flags automatically trigger the usage of `cudaPackages.backendStdenv`. I want to make this logic as simple as possible, and hence introducing `config.cudaSupport` into this logic game is way too much IMO.
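In other words, something like this (a sketch of the intended trigger, not necessarily the final code):

```nix
# Use the nvcc-compatible stdenv only when the user asked for the GPU
# package with the CUDA backend.
stdenv' =
  if (packages.GPU or false) && ((extraCmakeFlags.GPU_API or "") == "cuda") then
    cudaPackages.backendStdenv
  else
    stdenv;
```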
Explaining the situation once more, and also after the "systematicity" discussion: I think perhaps we can use an assertion that checks whether `stdenv.cc.version` is lower than or equal to `cudaPackages.backendStdenv.cc.version`. With this logic, we'd spare the `overrideStdenv` argument while still helping users avoid the build error mentioned above. What do you think about that?
> Disclaimer: I may not have familiarized myself with the expression enough to jump to conclusions, I'm not sure if I'm being helpful
The expression is not that complicated at the moment, so that is not why I personally feel you are not being helpful. I feel you are not being helpful because it takes me too long to understand you.
Sorry for the late reply. I do understand your motivation to handle the `stdenv` logic internally. It seems like an acceptable compromise for now.

For now, the situation with CUDA is a global mess.
> For now, the situation with CUDA is a global mess.
I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to `cudaPackages.cudatoolkit` that would prevent it from being compiled with a `gcc --version` lower than `cudaPackages.backendStdenv.cc.version`!
> I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to `cudaPackages.cudatoolkit` that would prevent it from being compiled with a `gcc --version` lower than `cudaPackages.backendStdenv.cc.version`!

Yes, this conversation also made me think in this direction.
> > I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to `cudaPackages.cudatoolkit` that would prevent it from being compiled with a `gcc --version` lower than `cudaPackages.backendStdenv.cc.version`!
>
> Yes, this conversation also made me think in this direction.
OK, great. I think this should resolve the whole purpose of this PR. For reference:

Hence, I'm closing this.
Force-pushed from c997276 to 0801929
Force-pushed from 0801929 to cfd773a
```diff
@@ -44,9 +45,27 @@
   extraCmakeFlags ? { },
```
It's true the semantics of `config.cudaSupport` are somewhat vague. I think my expectation would be something along the lines of `extraCmakeFlags ? optionalAttrs config.cudaSupport { GPU_API = "cuda"; }`, which would parallel how other packages accept `cudaSupport ? config.cudaSupport`.
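In the function signature, that suggestion would read roughly like this (a sketch, not the PR's code):

```nix
{
  lib,
  config,
  # Default to CUDA when nixpkgs was evaluated with config.cudaSupport = true;
  # an explicitly passed extraCmakeFlags replaces this default entirely.
  extraCmakeFlags ? lib.optionalAttrs config.cudaSupport { GPU_API = "cuda"; },
  ...
}:
# ... the rest of the derivation would stay unchanged
```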
Alternatively, one could cut off some of the combinations of arguments using `assert`s...
> It's true the semantics of `config.cudaSupport` are somewhat vague. I think my expectation would be something along the lines of `extraCmakeFlags ? optionalAttrs config.cudaSupport { GPU_API = "cuda"; }`, which would parallel how other packages accept `cudaSupport ? config.cudaSupport`.
Yes, but that won't be enough to make the build change anything. You also have to set `packages.GPU` to `true` (or, CMake-wise, `-DPKG_GPU=ON`). See my table outside the review comments.
I suppose

> "the user is on their own" and they should just use `overrideAttrs` to add `-D` flags manually.

...is my stance on this, but it's obviously a compromise.
Please tell me how to fill this table and I'll consider implementing what you think makes sense.
Another observation along the same lines: `.override { extraBuildInputs = as; extraCmakeFlags = bs; }` is "just" `.overrideAttrs (oldAttrs: { buildInputs = oldAttrs.buildInputs or [ ] ++ as; cmakeFlags = oldAttrs.cmakeFlags or [ ] ++ bs; })`.

Arguably, this need is very common in nixpkgs, which I think is evidenced by experiments like DavHau's "drv-parts" and amjoseph's "infuse" (both being more "systematic" solutions in the sense described in the other comment). Few packages in Nixpkgs have this kind of ad hoc helper (referring to `extraCmakeFlags`).

So I wonder if we should keep maintaining these arguments and the associated complexity.
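Written out, the equivalence claimed above looks like this (a sketch; `as` and `bs` as in the comment, glossing over details just as the comment does):

```nix
let
  viaOverride = lammps.override {
    extraBuildInputs = as;
    extraCmakeFlags = bs;
  };

  viaOverrideAttrs = lammps.overrideAttrs (oldAttrs: {
    buildInputs = (oldAttrs.buildInputs or [ ]) ++ as;
    cmakeFlags = (oldAttrs.cmakeFlags or [ ]) ++ bs;
  });
in
# The claim is that these two evaluate to the same derivation.
{ inherit viaOverride viaOverrideAttrs; }
```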
> Arguably, this need is very common in nixpkgs, which I think is evidenced by experiments like DavHau's "drv-parts" and amjoseph's "infuse" (both being more "systematic" solutions in the sense described in the other comment). Few packages in Nixpkgs have this kind of ad hoc helper (referring to `extraCmakeFlags`). So I wonder if we should keep maintaining these arguments and the associated complexity.
I think people are waiting for structured attributes for the kind of features you are describing.
I promised to stop posting confusing messages, so feel free to dismiss this reply.

Ah, right, un-structuredAttrs do make overrides harder, but I rather meant that even if you assume that lists are always lists and attrsets are always attrsets (instead of being strings in 10 derivations out of 100,000), the `overrideAttrs` approach still results in very deep expressions and breaks composability; https://discourse.nixos.org/t/infuse-nix-deep-override-attrs-generalizes-pipe-and-recursiveupdate/ and https://discourse.nixos.org/t/drv-parts-configure-packages-like-nixos-systems/23629 attempt to address this specifically, by exposing a declarative interface instead of the point-ful functional style.
I see. I'll stop posting confusing messages. My opinion, on behalf of cuda-maintainers, is that the "officially" advertised way to use CUDA in nixpkgs is to `import nixpkgs { config.cudaSupport = true; }` (because of transitive dependencies, it's safer to just re-evaluate the whole nixpkgs). So I would ask that `lammps` takes `config.cudaSupport` into account when choosing default values. It's perfectly consistent with the current semantics to ignore `config.cudaSupport` when the user explicitly passes (in `callPackage` or `.override`) something like `GPU_API = "opencl"`, because then they explicitly opt out of the "defaults".
So in terms of the existing derivation, I think the behaviour I expect is something like this:
```nix
{
  config,
  packages ? {
    ...,
    GPU ? (extraCmakeFlags.GPU_API or null != null)
  },
  extraCmakeFlags ? {
    GPU_API ? if config.cudaSupport then "cuda" else if config.rocmSupport then "hip" else "opencl"
  },
  overrideStdenv ? if (extraCmakeFlags.GPU_API or null) == "cuda" then cudaPackages.backendStdenv else stdenv,
  ...
}:

assert (extraCmakeFlags.GPU_API or null != null) -> (packages.GPU or false);
assert (extraCmakeFlags.GPU_API or null == "cuda") -> (overrideStdenv is compatible with nvcc);
...
```
Regarding the last (optional) assert: `cudaPackages` currently doesn't expose a function to test stdenv compatibility, but maybe it should; the data is there in `nvccCompatibleToolchains`, if I remember the attribute name correctly.
Just to be clear, I think the work making