Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release: forbid use of lib.fileset in Nixpkgs #369694

Merged
merged 13 commits into from
Jan 1, 2025

Conversation

emilazy
Copy link
Member

@emilazy emilazy commented Dec 31, 2024

Due to Nix bug NixOS/nix#11503, builtins.filterSource and chroot stores interact in a confusing and broken way that breaks lib.fileset. This means that uses of the API inside Nixpkgs keep breaking the NixOS installer, blocking the channel. The resulting error messages are inscrutable (they look like “the installer test is trying to download curl…?” and eventually bottom out in a derivation that has the wrong outPath because of the chroot store causing an incorrect lib.fileset result).

Whenever this happens, someone (well, in practice K900 or I) has to bisect the change that introduced it and remove the use of lib.fileset. This has happened at least three times in the past four months (I believe I might actually be missing one here, but these are the ones I remember and could easily dig up):

The options I see here are:

  1. Forbid use of lib.fileset within Nixpkgs until the Nix bug is fixed. This is the approach taken here. External users of Nixpkgs can continue to use the API as normal, but using it from within something that affects any release jobset outPaths will cause an evaluation failure with a hopefully‐helpful error message.

  2. Forbid lib.fileset and also all of the other library APIs that use builtins.filterSource. I’m happy to do this, but so far none of those have broken the installer, so I decided to start small and worry about the others if they end up causing a problem in practice.

  3. Forbid builtins.filterSource directly. This is hard and would require more invasive builtins.scopedImport crimes to do at evaluation time. I think this would realistically have to be done in something like nixpkgs-vet instead and I didn’t have much luck shoehorning a check like this into that codebase when I tried.

  4. Fix the Nix bug. This would be great! But also it doesn’t seem to be happening any time soon, it seems difficult to fix in a way that doesn’t subtly break compatibility with the previous semantics, and arguably the fix would need backporting all the way back to 2.3 given our minimum version policy.

  5. Do nothing; have people continue to innocuously use lib.fileset throughout Nixpkgs, breaking the installer whenever one of them sneaks in to that closure, causing the channel to be blocked and requiring expensive bisections to narrow down the inscrutable test failure to the package using lib.fileset, which then needs moving back off it. This sucks for the people who keep having to track it down, holds back important channel bumps, and the criteria for when it’s okay to use lib.fileset are not realistically possible to teach to all contributors.

I'd be happy to work on (2) as an alternative; (3) would be difficult and seems like overkill, (4) is not really something I trust myself to do and wouldn’t address the immediate problem, and (5) isn’t sustainable. I think that the current approach here is the best trade‐off for now, as lib.fileset seems to be the only prominent user of the builtins.filterSource API that works with full store paths, exposing it to the Nix bug. It’s unfortunate to lose the nice API, but since we can’t rely on it to produce correct results and the channels keep getting blocked as a result, I don’t think we really have an alternative right now.


I have checked that nix-build ci -A eval.full can successfully evaluate the release jobset for all the Hydra platforms and that it catches all the existing uses that I’ve fixed in this PR. I have also checked that the packages I’ve touched here continue to build if they did already, though it’s of course possible I’ve done something wrong when migrating them away from lib.fileset, so reviews of those commits would be helpful.

Result of nixpkgs-review run on aarch64-linux 1

1 package blacklisted:
  • nixos-install-tools
12 packages built:
  • cudaPackages.saxpy
  • cudaPackages_11.saxpy
  • flattenReferencesGraph
  • flattenReferencesGraph.dist
  • nixpkgs-manual
  • purescm
  • python312Packages.waitress-django
  • python312Packages.waitress-django.dist
  • python313Packages.waitress-django
  • python313Packages.waitress-django.dist
  • shopify-cli
  • yarn2nix

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review


x86_64-linux

⏩ 1 package blacklisted:
  • nixos-install-tools
✅ 12 packages built:
  • cudaPackages.saxpy
  • cudaPackages_11.saxpy
  • flattenReferencesGraph
  • flattenReferencesGraph.dist
  • nixpkgs-manual
  • purescm
  • python312Packages.waitress-django
  • python312Packages.waitress-django.dist
  • python313Packages.waitress-django
  • python313Packages.waitress-django.dist
  • shopify-cli
  • yarn2nix

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review


aarch64-darwin

❌ 2 packages failed to build:
  • flattenReferencesGraph
  • flattenReferencesGraph.dist
✅ 8 packages built:
  • nixpkgs-manual
  • purescm
  • python312Packages.waitress-django
  • python312Packages.waitress-django.dist
  • python313Packages.waitress-django
  • python313Packages.waitress-django.dist
  • shopify-cli
  • yarn2nix

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

I am not sure `nix-gitignore.gitignoreSource` fares any better
with chroot stores, but so far it’s always been `lib.fileset`
that has caused issues with the installer ISO, and a blanket ban on
`lib.fileset` seems like the simplest remedy unless it turns out that
one of the other source filtering helpers causes issues.

This could probably be omitted entirely as it’s just an aid for
development convenience, anyway.

This reverts commit ba5a5fa.
This reverts commit 9ae0f77.
I don’t love this, and I’m not convinced it doesn’t have the
same pitfalls as filesets, but see the `flattenReferencesGraph`
commit for reasoning.

It may be better to ban the relevant builtins entirely and just move
things into subdirectories when needed, but I didn’t want to do
more surgery to this part of the tree than necessary to solve the
immediate problem.
Due to Nix bug <NixOS/nix#11503>,
`builtins.filterSource` and chroot stores interact in a confusing
and broken way that breaks `lib.fileset`. This means that uses of
the API inside Nixpkgs keep breaking the NixOS installer, blocking
the channel. The resulting error messages are inscrutable (they look
like “the installer test is trying to download `curl`…?” and
eventually bottom out in a derivation that has the wrong `outPath`
because of the chroot store causing an incorrect `lib.fileset` result).

Whenever this happens, someone (well, in practice K900 or I)
has to bisect the change that introduced it and remove the use of
`lib.fileset`. This has happened at least three times in the past
four months (I believe I might actually be missing one here, but
these are the ones I remember and could easily dig up):

* <NixOS#340046>
* <NixOS#352491>
* <NixOS#369459>

The options I see here are:

1. Forbid use of `lib.fileset` within Nixpkgs until the Nix bug is
   fixed. This is the approach taken here. External users of Nixpkgs
   can continue to use the API as normal, but using it from within
   something that affects any release jobset `outPath`s will cause an
   evaluation failure with a hopefully‐helpful error message.

2. Forbid `lib.fileset` and also all of the other library APIs that use
   `builtins.filterSource`. I’m happy to do this, but so far none of
   those have broken the installer, so I decided to start small and
   worry about the others if they end up causing a problem in practice.

3. Forbid `builtins.filterSource` directly. This is hard and would
   require more invasive `builtins.scopedImport` crimes to do at
   evaluation time. I think this would realistically have to be done in
   something like nixpkgs-vet instead and I didn’t have much luck
   shoehorning a check like this into that codebase when I tried.

4. Fix the Nix bug. This would be great! But also it doesn’t seem to be
   happening any time soon, it seems difficult to fix in a way that
   doesn’t subtly break compatibility with the previous semantics, and
   arguably the fix would need backporting all the way back to 2.3
   given our minimum version policy.

5. Do nothing; have people continue to innocuously use `lib.fileset`
   throughout Nixpkgs, breaking the installer whenever one of them
   sneaks in to that closure, causing the channel to be blocked and
   requiring expensive bisections to narrow down the inscrutable test
   failure to the package using `lib.fileset`, which then needs moving
   back off it. This sucks for the people who keep having to track it
   down, holds back important channel bumps, and the criteria for when
   it’s okay to use `lib.fileset` are not realistically possible to
   teach to all contributors.

I'd be happy to work on (2) as an alternative; (3) would be difficult
and seems like overkill, (4) is not really something I trust myself
to do and wouldn’t address the immediate problem, and (5) isn’t
sustainable. I think that the current approach here is the best
trade‐off for now, as `lib.fileset` seems to be the only prominent
user of the `builtins.filterSource` API that works with full store
paths, exposing it to the Nix bug. It’s unfortunate to lose the
nice API, but since we can’t rely on it to produce correct results
and the channels keep getting blocked as a result, I don’t think
we really have an alternative right now.
@emilazy emilazy added the backport release-24.11 Backport PR automatically label Dec 31, 2024
@emilazy emilazy requested review from K900 and infinisil December 31, 2024 15:06
@github-actions github-actions bot added 6.topic: python 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: haskell 8.has: documentation This PR adds or changes documentation 6.topic: nodejs 6.topic: testing Tooling for automated testing of packages and modules 6.topic: cuda Parallel computing platform and API labels Dec 31, 2024
@philiptaron
Copy link
Contributor

Hey, this bit Felix (@Stunkymonkey)! Many of the lines changed in this patch were introduced to fix #301014.

The bug explanation makes it now make a little more sense, but it's still crazy to me that it both has this symptom and is so action-at-a-distance.

@emilazy
Copy link
Member Author

emilazy commented Dec 31, 2024

Yeah, it’s really nasty to run into, and it keeps happening; someone uses a normal‐looking API in a random package and then days later we find out that the installer test is trying to download a bunch of stuff for no obvious reason. The problem is that Nix is really bad at keeping track of which chroot store paths should be mapped transparently to /nix/store/… and which should have the full on‐disk path, and that leaks into the builtins.filterSource interface. Non‐lib.fileset users of the API tend to get lucky by not looking at the full path, so the inconsistent prefix doesn’t affect them. It may be possible to work around this in the filesets API but @infinisil wasn’t optimistic about being able to do that without making correctness worse when I asked, so I think this is the best way to stop the bleeding for now.

(FWIW, the tell‐tale sign – other than just “someone is using lib.fileset” – is that with a chroot store (actually it’s a little more elaborate, I think you need to be doing -f '<nixpkgs>' where that points either inside or outside of your chroot store, one of which confuses Nix) the computed src ends up incorrectly being an empty directory.)

@philiptaron
Copy link
Contributor

I'd like to hear from you or @infinisil about the impact on lib.fileset for out-of-tree users. Based on the comment above, I think it'll be rare to run into this issue, since invoking Nix with -f '<nixpkgs>' isn't that common in workflows I've seen (most of which use flakes.)

This is the tracking issue where lib.fileset work happens, btw:

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that this is necessary, but I agree that this is the best workaround for now while we don't have a fix for the underlying bug. Thanks for pushing this!

I'm not concerned about out-of-tree uses, it indeed shouldn't happen often.

@emilazy
Copy link
Member Author

emilazy commented Dec 31, 2024

The way this PR is done, it’ll only affect GHA / ofborg / Hydra, not external importers of Nixpkgs, since we stub out lib.fileset in the files that handle the release jobsets. In other words, downstream importers of Nixpkgs will get lib.fileset as normal, but our various CI builders will complain if they’re used in ways that could break anything we build. That’s also why some in‐tree stuff that we don’t build, like passthru tests of various derivations, continues to use lib.fileset – if any of that ever gets pulled into a release jobset outPath it’ll start complaining. (Of course it’ll continue to be subject to the Nix bug wherever you use it, but probably most people aren’t using chroot stores.)

This PR is definitely one I’d like to wait for the ofborg eval results on, just in case :) (and to see the perf stats, although it should only be one lib.extend call per Nixpkgs evaluation.)

@emilazy
Copy link
Member Author

emilazy commented Dec 31, 2024

Oh, actually ofborg just died, so maybe we should just merge this.

@wegank wegank added the 12.approvals: 2 This PR was reviewed and approved by two reputable people label Jan 1, 2025
Copy link
Member

@winterqt winterqt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks like the best solution for now (unfortunately).

Let's hope the Nix bug maybe gets fixed soon?

@infinisil infinisil merged commit 82d084b into NixOS:master Jan 1, 2025
27 of 28 checks passed
@nix-backports
Copy link

nix-backports bot commented Jan 1, 2025

Backport failed for release-24.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-24.11
git worktree add -d .worktree/backport-369694-to-release-24.11 origin/release-24.11
cd .worktree/backport-369694-to-release-24.11
git switch --create backport-369694-to-release-24.11
git cherry-pick -x 3c65f3d84651fd1b120db0d26bc140f6275aafec 2eae7d63e29c74afb9bca17746528dc62ba5486d f6ce575a03505ef90dbb864009058ac8d1392c3f 6ff076b1b121970ed13a38ffee0f58b540d417e8 f6cc18e0e46a4d3c9cace01eb698ac9c26cbed78 77def225cb60ae7c35ae9c82034aababed016f50 44475aafe3c1f67f7a1f304632c9271c61589a02 ba42e09bf2dca362e84500f1169b921164bbb8de 4e30014c062cec5059552c0dfc4a4656d7d59dd4 34a6c331e79081db4d133fc6692105552bbeb52b bc419f5949c49e1779da9ffecb201e42fc819b71 a38901aa8539613589598f59f567b9001199badd 8725e466ef2bcc5be69106c16dbf69b9ef989273

@emilazy emilazy deleted the push-quvxmoouztpo branch January 1, 2025 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: cuda Parallel computing platform and API 6.topic: haskell 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: nodejs 6.topic: python 6.topic: testing Tooling for automated testing of packages and modules 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 12.approvals: 2 This PR was reviewed and approved by two reputable people backport release-24.11 Backport PR automatically
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants