
lammps: Use cudaPackages.backendStdenv if needed #366528

Closed
wants to merge 1 commit

Conversation

doronbehar
Contributor

Description of changes

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


@vcunat vcunat changed the base branch from staging-next to master December 23, 2024 14:02
@ofborg ofborg bot requested a review from costrouc December 24, 2024 03:34
@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 labels Dec 24, 2024
@markuskowa
Member

Does lammps even have CUDA support at the moment? (I don't see any CUDA inputs.)
I wonder if it would not be more efficient to find a solution at the level of the CUDA packages (e.g. cudaPackages could expose a compatible stdenv?). Every package that uses CUDA should have a similar problem.
In any case, I would suggest making the pinning of the stdenv conditional on whether CUDA is enabled.

@doronbehar
Contributor Author

Does lammps even have CUDA support at the moment? (I don't see any CUDA inputs.)

Not at the moment, but I add it in my personal overlays via the extraBuildInputs argument.

I wonder if it would not be more efficient to find a solution at the level of the CUDA packages (e.g. cudaPackages could expose a compatible stdenv?). Every package that uses CUDA should have a similar problem.

Yea, I agree. Perhaps the @NixOS/cuda-maintainers will have an opinion on the matter. The only thing that is important to me is to help future users of the package avoid the issue I encountered here: lammps/lammps#4420 . Perhaps making autoAddDriverRunpath error out early if the compiler version it finds is too high would be best (we can worry about making its inclusion in lammps conditional afterwards).

In any case, I would suggest making the pinning of the stdenv conditional on whether CUDA is enabled.

I agree with that too.
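
Such conditional pinning could be sketched roughly like this (a hypothetical sketch, not the actual package.nix: the argument names are assumptions, and gcc12Stdenv stands in for whatever pinned stdenv is chosen):

```nix
{ stdenv, gcc12Stdenv, config, extraCmakeFlags ? { }, ... }:
let
  # Pin the compiler only when a CUDA build is actually requested.
  cudaRequested =
    (config.cudaSupport or false)
    || (extraCmakeFlags.GPU_API or "") == "cuda";
  stdenv' = if cudaRequested then gcc12Stdenv else stdenv;
in
stdenv'.mkDerivation {
  pname = "lammps";
  # ... rest of the derivation unchanged
}
```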

@markuskowa
Copy link
Member

Turns out there is already cudaPackages.backendStdenv, which could be used here for CUDA builds.

@doronbehar
Copy link
Contributor Author

Turns out there is already cudaPackages.backendStdenv, which could be used here for CUDA builds.

Nice. I think this resolves our request to the @NixOS/cuda-maintainers. Based on a pattern I found with git grep cudaPackages.backendStdenv, I pushed something which I think will satisfy you too.

@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux and removed 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 labels Dec 25, 2024
@doronbehar doronbehar changed the title lammps: Use gcc12Stdenv because CUDA doesn't support GCC14 lammps: Use cudaPackages.backendStdenv if needed Dec 25, 2024
pkgs/by-name/la/lammps/package.nix (outdated)
# Ease adding CUDA support. See:
# https://github.com/lammps/lammps/issues/4420
stdenv' =
if overrideStdenv && (config.cudaSupport || ((extraCmakeFlags.GPU_API or "") == "cuda")) then
Member

In case "cudaSupport" is true, do you want to also set the corresponding cmakeFlags and buildInputs so that lammps gets built automatically with CUDA (i.e. in a similar way to what we do in openmpi)?

Contributor Author

In case "cudaSupport" is true, do you want to also set the corresponding cmakeFlags and buildInputs so that lammps gets built automatically with CUDA (i.e. in a similar way to what we do in openmpi)?

I think it makes the logic pretty complicated, and I'm not sure I want to delve into implementing this. How many details do we want to be taken care of automatically? What do we want to happen in case someone simply sets config.cudaSupport? TBH, I don't like this flag at all: it makes package maintainers' lives pretty complicated, especially for complex packages such as this one. Since the CUDA support for every package is different, users should read the docs of such packages anyway, and not simply count on config.cudaSupport to do the job.

Here's a list of scenarios I was able to think about that we should handle:

  # config.cudaSupport | packages.GPU | extraCmakeFlags.GPU_API || use cudaPackages.backendStdenv? || is valid?
  # ------------------ | ------------ | ----------------------- || ------------------------------- || ---------
  #      false         |    false     |         "opencl"        ||                                 ||
  #      true          |    false     |         "opencl"        ||                                 ||
  #      false         |    true      |         "opencl"        ||                                 ||
  #      true          |    true      |         "opencl"        ||                                 ||
  #      false         |    false     |         "cuda"          ||                                 ||
  #      true          |    false     |         "cuda"          ||                                 ||
  #      false         |    true      |         "cuda"          ||                                 ||
  #      true          |    true      |         "cuda"          ||                                 ||
  #      false         |    false     |         "hip"           ||                                 ||
  #      true          |    false     |         "hip"           ||                                 ||
  #      false         |    true      |         "hip"           ||                                 ||
  #      true          |    true      |         "hip"           ||                                 ||

Not only that, there are the cases where packages.GPU is not specified, and/or extraCmakeFlags.GPU_API is not specified. And when the opencl GPU_API is used, you should add the ocl-icd package to buildInputs, and cudaPackages.cudatoolkit when the cuda GPU_API is used.

I tried for an hour or so to implement logic that would satisfy everyone, and I got lost in the sea of options above. In the end, I decided to revert the last revision, which read the value of config.cudaSupport, as I want the expression to be left clean of heuristic decisions that are hard to figure out just from reading the Nix code.

I guess this is the price we pay for the full configurability of extraCmakeFlags (and packages) and extraBuildInputs. I added a NOTE comment explaining this decision.
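
For illustration, the backend-dependent inputs mentioned above could be sketched like this (a hypothetical fragment, not the actual package.nix; it assumes extraCmakeFlags, extraBuildInputs, ocl-icd and cudaPackages are all in scope):

```nix
let
  gpuApi = extraCmakeFlags.GPU_API or "opencl";
in
{
  buildInputs =
    extraBuildInputs
    ++ lib.optionals (gpuApi == "opencl") [ ocl-icd ]
    ++ lib.optionals (gpuApi == "cuda") [ cudaPackages.cudatoolkit ];
}
```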

Member

I think I understand the problem. However, if the config is too complex, and if the intention is to configure it externally via extraCmakeFlags, I am not sure overrideStdenv is even needed here.
In that case you could override it accordingly and have more transparency:

lammps.override {
  stdenv = cudaPackages.backendStdenv;
  extraCmakeFlags = { GPU_API = "cuda"; };
  # <+ whatever else needs to be overridden>
}

Contributor Author

I think I understand the problem. However, if the config is too complex, and if the intention is to configure it externally via extraCmakeFlags, I am not sure overrideStdenv is even needed here. In that case you could override it accordingly and have more transparency:

lammps.override {
  stdenv = cudaPackages.backendStdenv;
  extraCmakeFlags = { GPU_API = "cuda"; };
  # <+ whatever else needs to be overridden>
}

When you suggested adding an overrideStdenv, I thought you wanted to help with the following case: a user wishes to build with -DGPU_API=cuda (and -DPKG_GPU=true), and to use neither the default stdenv nor cudaPackages.backendStdenv, but something like clangStdenv (perhaps for testing the build, or whatever). Without an overrideStdenv flag, this user would have no way to tell the build to use their supplied stdenv argument.

The discussion of whether to have an overrideStdenv doesn't add to or subtract from the complexity described in the above table.

Contributor

-DGPU_API=cuda

Our overrideXXXX zoo is undeniably a source of infinite pain, but I think for now it's reasonable to say that in these cases "the user is on their own" and they should just use overrideAttrs to add -D flags manually.

The if cudaSupport then backendStdenv else stdenv is an unfortunate pattern, since the user now has to .override both stdenv = ...; and cudaPackages = ...; to be sure their argument isn't silently ignored. Your overrideStdenv solution helps with that, but not in a systematic way. Maybe we're better off without it and should build towards changing the callPackage mechanism instead.

Contributor Author

Here, specifically, it's the case when the user wants to override the stdenv but still pass the GPU flags. We can try to handle every conceivable combination, but afaiu this is getting complicated, and we'll need to keep maintaining that complexity.

My goal here is only to help Nix users of this package avoid the build error described here:

This build error is barely understandable, and hence it'd be nice to spare users' time by making the usage of both the -DGPU_API="cuda" and -DPKG_GPU=ON CMake flags automatically trigger the usage of cudaPackages.backendStdenv. I want to make this logic as simple as possible, and hence introducing config.cudaSupport into this logic game is way too much IMO.

Explaining the situation once more, and also after the "systematicity" discussion, I think perhaps we can use an assertion that checks whether stdenv.cc.version is lower than or equal to cudaPackages.backendStdenv.cc.version. With this logic, we'd spare the overrideStdenv argument, while still helping users avoid the build error mentioned above. What do you think about that?
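
Such an assertion could be sketched like this (a hypothetical fragment; it relies on lib.versionAtLeast x y holding when x >= y):

```nix
# Fail evaluation early when a CUDA build is requested with a host
# compiler newer than the one backendStdenv ships.
assert ((extraCmakeFlags.GPU_API or "") == "cuda")
  -> lib.versionAtLeast cudaPackages.backendStdenv.cc.version stdenv.cc.version;
```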

Disclaimer: I may not have familiarized myself with the expression enough to jump to conclusions, I'm not sure if I'm being helpful

The expression is not that complicated at the moment, so I think this is not why I personally feel you are not being helpful. I feel you are not being helpful because it takes me too long to understand you.

Member

Sorry for the late reply. I do understand your motivation to handle the stdenv logic internally. It seems like an acceptable compromise for now.
For now the situation with CUDA is a global mess.

Contributor Author

For now the situation with CUDA is a global mess.

I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to cudaPackages.cudatoolkit that would prevent packages from being compiled with a gcc --version newer than cudaPackages.backendStdenv.cc.version!
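
A setup hook along these lines might look roughly like the following (a hypothetical sketch, not the real cudaPackages implementation: the function names are made up, and "12.4.0" merely stands in for cudaPackages.backendStdenv.cc.version):

```shell
# True when $1 <= $2 under version-sort ordering (GNU sort -V).
versionLeq() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Hypothetical hook body: abort if the host gcc is newer than the
# newest gcc version that nvcc is known to support.
checkCudaHostCompiler() {
  local maxSupported="12.4.0"   # stand-in for backendStdenv.cc.version
  local actual
  actual="$(gcc -dumpfullversion)"
  if ! versionLeq "$actual" "$maxSupported"; then
    echo "error: gcc $actual is newer than the newest nvcc-supported gcc ($maxSupported)" >&2
    return 1
  fi
}

# preConfigureHooks+=(checkCudaHostCompiler)  # how a real hook would register itself
```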

Contributor

I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to cudaPackages.cudatoolkit that would prevent packages from being compiled with a gcc --version newer than cudaPackages.backendStdenv.cc.version!

Yes, this conversation also made me think in this direction

Contributor Author

I just thought of another idea that @NixOS/cuda-maintainers might like: add a setup hook to cudaPackages.cudatoolkit that would prevent packages from being compiled with a gcc --version newer than cudaPackages.backendStdenv.cc.version!

Yes, this conversation also made me think in this direction

OK, great. I think this should resolve the whole purpose of this PR. For reference:

Hence, I'm closing this.

@@ -44,9 +45,27 @@
extraCmakeFlags ? { },
Contributor

It's true the semantics of config.cudaSupport are somewhat vague. I think my expectation would be something along the lines of extraCmakeFlags ? optionalAttrs config.cudaSupport { GPU_API = "cuda"; }, which would parallel how other packages accept cudaSupport ? config.cudaSupport

Contributor

Alternatively, one could cut off some of the combinations of arguments using asserts...

Contributor Author

It's true the semantics of config.cudaSupport are somewhat vague. I think my expectation would be something along the lines of extraCmakeFlags ? optionalAttrs config.cudaSupport { GPU_API = "cuda"; }, which would parallel how other packages accept cudaSupport ? config.cudaSupport

Yes, but that won't be enough to make the build change anything: you also have to set packages.GPU to true (or, CMake-wise, -DPKG_GPU=ON). See my table outside the review comments.

Contributor

I suppose

"the user is on their own" and they should just use overrideAttrs to add -D flags manually.

...is my stance on this, but it's obviously a compromise

@doronbehar
Contributor Author

Please tell me how to fill in this table and I'll consider implementing what you think makes sense:

  config.cudaSupport | packages.GPU    | extraCmakeFlags.GPU_API | does it make sense? | use cudaPackages.backendStdenv? | CMake flags to set
  ------------------ | --------------- | ----------------------- | ------------------- | ------------------------------- | ------------------
  false (default)    | false (default) | opencl (default)        |                     |                                 |
  true               | false (default) | opencl (default)        |                     |                                 |
  false (default)    | true            | opencl (default)        |                     |                                 |
  true               | true            | opencl (default)        |                     |                                 |
  false (default)    | false (default) | cuda                    |                     |                                 |
  true               | false (default) | cuda                    |                     |                                 |
  false (default)    | true            | cuda                    |                     |                                 |
  true               | true            | cuda                    |                     |                                 |
  false (default)    | false (default) | hip                     |                     |                                 |
  true               | false (default) | hip                     |                     |                                 |
  false (default)    | true            | hip                     |                     |                                 |
  true               | true            | hip                     |                     |                                 |

Contributor

Another observation along the same lines: .override { extraBuildInputs = as; extraCmakeFlags = bs; } is "just" .overrideAttrs (oldAttrs: { buildInputs = oldAttrs.buildInputs or [ ] ++ as; cmakeFlags = oldAttrs.cmakeFlags or [ ] ++ bs; })

Arguably, this need is very common in nixpkgs, which I think is evidenced by experiments like DavHau's "drv-parts" and amjoseph's "infuse" (both being more "systematic" solutions in the sense described in the other comment). Few packages in Nixpkgs have this kind of ad hoc helper (referring to extraCmakeFlags).

So I wonder if we should keep maintaining these arguments and the associated complexity.
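
Spelled out, the equivalence claimed above might look like this (a sketch: `as` and `bs` are placeholder lists, and it assumes the package expression simply appends them; in the actual lammps expression extraCmakeFlags is an attrset, so the correspondence is looser than written):

```nix
# Hypothetically equivalent ways to extend the package, per the claim above:
pkgViaOverride = lammps.override {
  extraBuildInputs = as;
  extraCmakeFlags = bs;
};

pkgViaOverrideAttrs = lammps.overrideAttrs (oldAttrs: {
  buildInputs = oldAttrs.buildInputs or [ ] ++ as;
  cmakeFlags = oldAttrs.cmakeFlags or [ ] ++ bs;
});
```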

Contributor Author

Arguably, this need is very common in nixpkgs, which I think is evidenced by experiments like DavHau's "drv-parts" and amjoseph's "infuse" (both being more "systematic" solutions in the sense described in the other comment). Few packages in Nixpkgs have this kind of ad hoc helpers (referring to extraCmakeFlags).

So I wonder if we should keep maintaining these arguments and the associated complexity.

I think people are waiting for structured attributes for the kind of features you are describing.

Contributor

I promised to stop posting confusing messages, feel free to dismiss this reply

Ah, right, un-structuredAttrs do make overrides harder, but I rather meant that even if you assume that lists are always lists and attrsets are always attrsets (instead of being strings in 10 derivations out of 100_000), the overrideAttrs approach still results in very deep expressions and breaks composability; https://discourse.nixos.org/t/infuse-nix-deep-override-attrs-generalizes-pipe-and-recursiveupdate/ and https://discourse.nixos.org/t/drv-parts-configure-packages-like-nixos-systems/23629 attempt to address this specifically, by exposing a declarative interface instead of the point-ful functional style

Contributor

I see. I'll stop posting confusing messages. My opinion, on behalf of cuda-maintainers, is that the "officially" advertised way to use CUDA in nixpkgs is to import nixpkgs { config.cudaSupport = true; } (because of transitive dependencies, it's safer to just re-evaluate the whole of nixpkgs). So I would ask that lammps take config.cudaSupport into account when choosing default values. It's perfectly consistent with the current semantics to ignore config.cudaSupport when the user explicitly passes (in callPackage or .override) something like GPU_API = "opencl", because then they explicitly opt out of the "defaults".

So in terms of the existing derivation, I think the behaviour I expect is something like this:

{
  config,
  packages ? {
    ...,
    GPU ? (extraCmakeFlags.GPU_API or null != null)
  },
  extraCmakeFlags ? {
    GPU_API ? if config.cudaSupport then "cuda" else if config.rocmSupport then "hip" else "opencl"
  },
  overrideStdenv ? if (extraCmakeFlags.GPU_API or null) == "cuda" then cudaPackages.backendStdenv else stdenv,
  ...
}:
assert (extraCmakeFlags.GPU_API or null != null) -> (packages.GPU or false);
assert (extraCmakeFlags.GPU_API or null == "cuda") -> (overrideStdenv is compatible with nvcc);
...

Regarding the last (optional) assert, cudaPackages currently doesn't expose a function to test stdenv compatibility but maybe it should; the data is there in nvccCompatibleToolchains if I remember the attribute name correctly

@SomeoneSerge
Contributor

Hence, I'm closing this.

Just to be clear, I think the work making lammps automatically use the right stdenv in simple known cases is still relevant, but you're the judge whether it's worth more of your time. Thank you for the PR and the discussion
