[RFC 0152] `local-overlay` store #152
Co-authored-by: Ryan Mulligan <[email protected]>
Co-authored-by: Connor Brewster <[email protected]>
Co-authored-by: Ben Radford <[email protected]>
Co-authored-by: Divam <[email protected]>
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/super-colliding-nix-stores/28462/17
The `local-overlay` store can serve as a crucial tool to bridge these two modes of using Nix.
The lower store can be as before
--- however the artifacts were disseminated in the "hidden Nix" first phase of adoption
--- perhaps with only a small tweak to expose the DB / daemon socket if it wasn't before.
Is exposing the socket used anywhere in the proposal, or is it just mentioned as a separate possibility (with relevant metadata sharing done via reading the underlying SQLite DB)?
The local overlay store is potentially a consumer of the socket provided by another Nix daemon. A Nix daemon can also be spun up using the local overlay store instead of the local store.
Basically, no new socket code is needed for this. As far as I can tell, everything one would want with sockets already works without limitation.
A consumer — doing what with the socket?
Acting as a regular client, like anything else using the daemon. It will in fact only use it to read metadata; the lower store can be read-only.
I have reread the paragraph, and even with your explanations I am not sure what process the paragraph as written describes.
- Serve the Nix store dir; clients don't use Nix but do use Nix-built things
- Serve either the socket or the SQLite database; clients can use Nix with that store but with restrictions (e.g. perhaps only read-only)
- Use the local overlay store (the writable upper layer makes the read-only lower layer less of an issue)
Does that make sense?
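The third stage above, a writable upper layer over a read-only lower store, can be sketched as a toy model. This is only an illustration of the lookup order described in this thread, not the Nix implementation; the class names `ReadOnlyStore` and `OverlayStore` and the example store paths are hypothetical.

```python
from typing import Dict, Optional


class ReadOnlyStore:
    """Hypothetical stand-in for the lower store: metadata can be read, never written."""

    def __init__(self, paths: Dict[str, str]) -> None:
        self._paths = dict(paths)

    def query(self, path: str) -> Optional[str]:
        # Return the metadata for a store path, or None if it is unknown.
        return self._paths.get(path)


class OverlayStore:
    """Hypothetical overlay: local additions go to the upper layer only."""

    def __init__(self, lower: ReadOnlyStore) -> None:
        self.lower = lower
        self.upper: Dict[str, str] = {}

    def add(self, path: str, info: str) -> None:
        # Writes never touch the lower store.
        self.upper[path] = info

    def query(self, path: str) -> Optional[str]:
        # Prefer local metadata, then fall back to reading the lower store.
        if path in self.upper:
            return self.upper[path]
        return self.lower.query(path)


lower = ReadOnlyStore({"/nix/store/aaa-hello": "from lower"})
store = OverlayStore(lower)
store.add("/nix/store/bbb-local", "from upper")
```

In this model the lower store is only ever consulted for metadata, which matches the point made above that the overlay acts as an ordinary read-only client of it.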
So in step 1 each snapshot gets its own store path, and the rest of the store is visible? (I guess one could also share a fixed-path directory with links to all the currently-relevant snapshots?)
What if, instead of requiring that the lower store grow monotonically, the overlay store maintained GC roots in the lower for any lower paths it references? I haven't thought this through to the level of detail in the spec, but it strikes me as an alternative worth considering; not being able to run garbage collection on the lower store would certainly be a barrier for personal use (perhaps not a central goal for this RFC) and possibly for org use as well; storage is cheap but it doesn't round down to free in every context.
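To make the monotonic-growth requirement concrete, here is a small sketch of what goes wrong when a lower-store path disappears. It is a toy model over sets of store paths, not Nix code; the function names and the example paths are hypothetical.

```python
from typing import Set


def removed_from_lower(old_lower: Set[str], new_lower: Set[str]) -> Set[str]:
    """Paths that existed in the old lower store but are gone in the new one.

    An empty result means the lower store grew monotonically.
    """
    return old_lower - new_lower


def dangling_references(
    references: Set[str], upper_paths: Set[str], new_lower: Set[str]
) -> Set[str]:
    """References held by upper-layer objects that no layer can satisfy any more."""
    return {p for p in references if p not in upper_paths and p not in new_lower}


old_lower = {"/nix/store/aaa-glibc", "/nix/store/bbb-bash"}
new_lower = {"/nix/store/bbb-bash", "/nix/store/ccc-hello"}
upper_paths = {"/nix/store/ddd-myapp"}
# Suppose the upper-layer object depends on both original lower paths.
references = {"/nix/store/aaa-glibc", "/nix/store/bbb-bash"}
```

With these inputs, `removed_from_lower` reports the dropped path, and `dangling_references` shows that the upper-layer object now points at something neither layer contains. Keeping GC roots in the lower store, as suggested above, would be one way to prevent that removal in the first place.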
I work for Replit (who sponsored work on this RFC) and helped out with the creation of this RFC. I am happy to serve as a Shepherd on this RFC, but also happy to cheer from the sidelines if people (or the Steering Committee) see this as a conflict of interest.
I'm generally excited about this. I have a slightly different target than Replit has. This would solve the initial import of the DB which is still not that fast (although probably way faster than importing 16TB worth of derivations!).
@rhendric That is useful for some things, but probably not the use-case of large numbers of consumers all sharing the same underlying store --- it is pretty important the underlying store be truly read only in that case, including any GC roots.
@baloo We have separately thought about those sorts of issues, including a persistent upper store that then "pivots" onto a new lower store when one does an upgrade of NixOS (and I suppose GC of the old generations). I think the pivoting feature is a nice future work item.
This RFC is now open for shepherd nominations!
I nominate @roberth
Now this makes me wonder if
I suppose would deprecate all "local" and "remote" as not being misleading names. I just picked
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-51/30870/1
This RFC has not acquired enough shepherds. This typically shows lack of interest from the community. In order to progress a full shepherd team is required. Consider trying to raise interest by posting in Discourse, talking in Matrix or reaching out to people that you know. If not enough shepherds can be found in the next month we will close this RFC until we can find enough interested participants. The PR can be reopened at any time if more shepherd nominations are made.
@baloo, @rickynils, @zhaofengli, @arianvp, @edolstra would any of you be open to shepherding this along? I don't see much controversy or drama here and, in my opinion, the document is in good shape too, so overall I expect the shepherd work to be low-commitment. If you've never done it before, you can look at https://github.com/NixOS/rfcs/blob/master/rfcs/0036-rfc-process-team-amendment.md#shepherd-team for more information about being a Shepherd.
I can shepherd
I also poked the #nixos-systemd channel. As some people are looking at Appliance images / immutable
We could have a single FUSE mount that could manually implement the "bind on demand" semantics described above without cluttering the mount namespace with an entry per each shared store object.
FUSE however is quite primitive, in that every read needs to be shuffled via the FUSE server.
There is nothing like a "control plane vs data plane" separation where Nix could tell the OS "this directory is that other directory", and the OS can do the rest without involving the FUSE server.
This is false: FUSE passthrough can bypass the FUSE server.
There are multiple folks in the Linux filesystem ecosystem working on bringing FUSE filesystems to roughly on-par performance with in-kernel filesystems, or on reusing the existing filesystems.
https://lwn.net/Articles/932060/
https://github.com/extfuse/extfuse
https://lpc.events/event/16/contributions/1339/
I am not really convinced that this alternative should not be pursued, as I feel it brings the maximum flexibility and compatibility for all the use cases instead of making it a very limited thing based on OverlayFS.
If you are interested in chatting more about how to make this alternative possible, feel free to ping me. I know quite a bit about FUSE filesystems and I am planning to write a nixstorefs at some point, which will rely on FUSE semantics first.
I also challenge the "worse performance" than in-kernel mounting solutions; it would be good to bring data on that. If you only pay the `open` cost, this is quite cheap.
Anyway, I have mixed feelings about the RFC because I feel like the FUSE approach is a much better route than this one. I can say that I am not satisfied and feel like it should be more ambitious if it is going to take an RFC route.
Oh that is fantastic! That is getting at the heart of the problems with FUSE. I wish I had known about this earlier.
However, even if I did, I would have advocated starting with the OverlayFS approach. That is because all these things are not yet mainline, and kernel development / trying out bleeding edge features is vastly more expensive.
IMO the right thing to do is
- Accept what we have on an experimental basis, using it to drum up interest and move us towards being able to pool resources on this.
- Get in communication with these Kernel devs; indeed I was already emailing back and forth with Amir Goldstein about some restrictions in OverlayFS.
- Try to be an early adopter of this stuff as it matures; nudge its development so it better meets our needs.
Also CC @flokli, because Tvix may be better positioned to be at the vanguard of trying this stuff out, as they are already exploring FUSE.
Disclosure: I am also a tvix-store developer ;)
Alright, your argument is sound and convinced me :).
Oh my bad I didn't realize that. @RaitoBezarius you should shepherd this :).
Aaaaaaaaaaa, OK for the nerdsnipe.
There may be another approach with FUSE: using MergerFS with `symlinkify=true` and `cache.symlinks=true`. This way, FUSE is only used to resolve the symlink:
- for a lot of applications, the opened directory fd will be reused, so most operations will run directly on the underlying store
- later uses of the symlink would be cached by the kernel, so even without fuse-ebpf, we'd need the jump to userspace only once per symlink
This also relies on lower stores only growing monotonically when used, so that links would not go stale.
There is some more context provided by a developer in this thread in the linux-fsdevel mailing list:
It doesn't say anything about adding directories/files in lower that already exist in upper layers, though. Also, someone else then adds this comment about allowing changes to the lower fs:
Maybe we can reach out to linux-fsdevel and describe our use case. If we only require "extending" the lower filesystem online, it's possible that it wouldn't require a lot of changes.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nix-in-the-wild-project-idx-flox-blog/35025/2
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/sharing-nix-portable-store-between-multiple-users/36571/5
@tomberek any chance the shepherd team could have a meeting sometime soon? Some of @ballit6782's concerns don't seem to have been responded to and it would be good to have these addressed in some way so the RFC can move forwards :)
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
Another ping, please give an update on the state of this RFC, if any. If no progress is happening, it would be best if the RFC is marked as a draft.
I think that for this RFC to progress, one of these things needs to happen:
I think that options 2 and 3 are not viable (frozen upper stores would be largely useless). On the subject of FUSE alternatives, I think that this paragraph in the RFC does not take into account the caching mechanisms of Linux's VFS:
For example, if using MergerFS with the options
MergerFS also opens up other possibilities that are very interesting for my use case of the local-overlay store, namely providing "late persistence", by adding a tmpfs branch that can be dynamically shrunk to 0. And being able to set
I have been planning for some time to test this thoroughly in practice, but have not found the time or motivation recently. However, if the shepherds think that this is a viable path for this RFC, I can document some experiments, run some benchmarks (especially measuring the additional context switches and how they scale) and try to provide an updated version of the RFC that would rely on MergerFS.
I'd also like to add: I think it would be great if the RFC could state more clearly whether this feature could be used for multiple layered stores. I know this would be a very niche feature, but there are cases where it would be very useful (for integration with the Shufflecake layered plausible-deniability storage system). It would be helpful to know whether that would be a supported use case.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-03-05/40851/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-03-19/41829/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
Since there was no activity for a while we decided to mark this as a draft. Feel free to undraft it any time once activity picks up again.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-04-02/42643/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/super-colliding-nix-stores/28462/19
Read the Discourse comments, not realizing they were 10 months apart, one asking for shepherds, one stating implementation progress xD. Oops! Congrats on getting this closer to being landed. 🎉
Looks like the notes never made it here (my mistake). From Feb 01:
Quick sync: local-overlay store
Present: raitobezarius, tomberek, John Ericson, ryantm.
Question: Update on where we are? From an RFC perspective, it seems like things are stalling, but there has been progress and we should write that down.
Update: NixOS/nix#8397 is now merged into the Nix master branch.
There's mentions both in this RFC and in discourse that the Can I get a clarification on this: does this mean if user runs And the garbage collection here only applies to cases when the store objects is somehow deleted from the lower store prior to mounting? And in that case, would running a
Add a new `local-overlay` store implementation to Nix. This will be a local store that is layered upon another local filesystem store (local store or daemon). This allows locally extending a shared store that is periodically updated with additional store objects.

This work is sponsored by Replit ✨
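As a rough illustration of how such a layered store might be addressed, the sketch below assembles a store URL. The setting names `root`, `lower-store`, and `upper-layer` are assumptions (they do not appear in this thread and should be checked against the Nix manual), and the paths and the helper name `local_overlay_url` are hypothetical.

```python
from urllib.parse import quote, urlencode


def local_overlay_url(root: str, lower_store: str, upper_layer: str) -> str:
    """Build a local-overlay store URL from three assumed settings.

    root:        where the merged (OverlayFS-mounted) store is visible
    lower_store: the shared store being extended
    upper_layer: the writable directory holding local additions
    """
    params = {
        "root": root,
        "lower-store": lower_store,
        "upper-layer": upper_layer,
    }
    # Keep "/" unescaped so the paths stay readable in the URL.
    return "local-overlay://?" + urlencode(params, safe="/", quote_via=quote)


url = local_overlay_url("/mnt/merged", "/mnt/lower", "/mnt/upper")
print(url)
```

The helper only formats a string; mounting the OverlayFS filesystem that backs the merged directory is a separate step that it does not perform.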