Documentation is lacking service setup & config steps, config example is ignored by service #294

Open
FalconFour opened this issue Nov 6, 2024 · 4 comments

@FalconFour

I'm just getting started using bees as I work on setting up a btrfs array for my data at home. Mostly experimenting - cautiously. All new stuff to me, as I'm just learning about the capabilities of btrfs, compression, deduplication, etc.

Setting up bees was a very frustrating experience with a huge hole in the documentation: it says how to build it, and it talks about config options, but there's a huge gap in "how to install it", basically. It leaves me hanging at "good luck, you've installed it, good luck figuring out how to invoke it!"

By reading the code of the scripts provided (I shouldn't have had to do that...), I traced out that it had installed itself as a systemd service under /usr/lib/systemd/system/beesd@.service, an instanced service - and reading the code further, it looks like it wants a volume UUID (not a mount point or what was configured in the sample conf file).

So I started the service that way, and waited for the initial crawl.
And waited.
And waited.
I gave it a full day, full CPU, full disk I/O, sloooooooooooowwwwly churning through every byte of readable data... spamming journald with dedupe blocks... sloooooooowwwlly....
And at the end of the night, flummoxed that it was still churning, I said "it'll probably be done in an hour", set it to work overnight, went to sleep...
Woke up, IT WAS STILL CHURNING.

Early yesterday, I had realized it was placing its block map on the slow, spinning-disk bulk data array - and I decided to move it onto the main system SSD. That should rocket-speed its indexing. So I did, I moved /mnt/btrfs_pool/.beeshome over into /etc/bees and edited /etc/bees/beesd.conf to match, then restarted beesd@(uuid).service.

It turns out, no. It created an all-new block index again, completely detached from the original - despite my config file. It turns out that beesd.conf is completely ignored. Now I don't even know what size it's using.

All of this confusion arises from a complete lack of documentation about how to use "the provided beesd scripts" or the attached service. The docs completely gloss over that part, and Googling different things comes up blank.

So I figured it'd be apt to point this out and, from the unreal amount of pain I've experienced crashing my way through this, hopefully push the docs to fill in a gap around how to use the beesd scripts and service. I now know, by reading this line, that beesd expects to find an instanced config file somewhere that ChatGPT will hopefully be able to help me decode:

FILE_CONFIG="$(grep -E -l '^[^#]*UUID\s*=\s*"?'"$UUID" "$CONFIG_DIR"/*.conf | head -1)"
[ ! -f "$FILE_CONFIG" ] && ERRO "No config for $UUID"
INFO "Find $UUID in $FILE_CONFIG, use as conf"
source "$FILE_CONFIG"

So far that's the best documentation I can find. 😵‍💫

@kakra
Contributor

kakra commented Nov 6, 2024

Probably true to some extent: the "config documentation" is mostly usage instructions written from a developer's point of view, and it clearly lacks examples. Also, most of the other documents whose titles suggest they explain usage mostly discuss internal technical details.

That said, historically there wasn't a beesd script to start the service, so to understand how to set up the service, you can look here:
https://github.com/Zygo/bees/blob/master/docs/running.md

These steps should help you better understand what beesd does. Additionally, the systemd service just uses the beesd helper script. The best way to learn how it should be used is probably to look at the commits that introduced it.

The script expects an arbitrarily named config file ending in .conf in /etc/bees. There should be an example file which you can just copy to a custom name.

Next, the script will grep for your btrfs UUID to find a matching config file. It will then use that config file to set up the beeshome directory and a temporary mount point to mount subvolid=0 of your btrfs.
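
The lookup described above can be tried safely against a throwaway directory. This is a minimal sketch, not the beesd script itself: `find_bees_config` is a hypothetical helper name, the UUID is made up, and the pattern uses `[[:space:]]` as a portable spelling of the `\s` in the line quoted from beesd earlier in this thread.

```shell
#!/bin/sh
# Sketch of the config lookup, run against a temporary directory.
# find_bees_config is a hypothetical helper name, not part of bees.
find_bees_config() {
    config_dir=$1
    uuid=$2
    # List files containing an uncommented UUID= line for this filesystem
    # and keep the first match. [[:space:]] stands in for \s.
    grep -E -l '^[^#]*UUID[[:space:]]*=[[:space:]]*"?'"$uuid" \
        "$config_dir"/*.conf 2>/dev/null | head -n 1
}

demo_dir=$(mktemp -d)
demo_uuid=01234567-89ab-cdef-0123-456789abcdef   # made-up UUID for the demo

# The file name is arbitrary: only the UUID line inside the file matters.
printf 'UUID=%s\n' "$demo_uuid" > "$demo_dir/my-data-array.conf"
# A commented-out UUID line does not match, so this file is skipped.
printf '# UUID=%s\n' "$demo_uuid" > "$demo_dir/beesd.conf"

# Prints the path of my-data-array.conf inside the temporary directory.
find_bees_config "$demo_dir" "$demo_uuid"
```

If a config file seems to be ignored, the first thing to check is whether it has an uncommented `UUID=` line matching the filesystem.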

The index file doesn't have a lot of IO activity, it is mapped and locked into RAM anyways (well, sort of). The only activity is the initial load of the file, and then there's a low-priority thread running which does a slow write-back. You really don't need to care about this a lot.

Also, you may have a misconception of bees being done after some time. That's not how it works. It will constantly scan for new data, and your file system probably has some write activity going on throughout the day.

If it doesn't seem to make progress, it's more likely that you're starting with a lot of existing snapshots already. Maybe post your beescrawl.dat contents so we can find a way of skipping most of the snapshots (because those are most likely to be deduplicated already).

About beeshome: It must exist as a standalone subvolume if you're using it on btrfs. But you can place it on any filesystem if you want to. Please keep in mind that the systemd service will restrict write access to certain locations of your system.

Before trying the systemd service, try running beesd from the command line to see if it works and shows no errors or warnings during initialisation. I added the systemd service only later, after beesd had been added. Meanwhile, I'm using my own custom, single-instance systemd service which bypasses beesd; maybe it'll help you get started:

# /etc/systemd/system/bees.service
[Unit]
Description=Bees
Documentation=https://github.com/Zygo/bees
After=local-fs.target
RequiresMountsFor=/mnt/btrfs-pool

[Service]
Type=simple
Environment=BEESSTATUS=%t/bees/bees.status
ExecStart=/usr/libexec/bees --no-timestamps --strip-paths --thread-count=6 --scan-mode=3 --verbose=5 --loadavg-target=5 /mnt/btrfs-pool
#CPUAccounting=true
#CPUWeight=12
CPUSchedulingPolicy=idle
IOSchedulingClass=idle
IOSchedulingPriority=7
KillMode=control-group
KillSignal=SIGTERM
#MemoryAccounting=true
Nice=19
Restart=on-abnormal
ReadWritePaths=/mnt/btrfs-pool
RuntimeDirectory=bees
StartupCPUWeight=25
WorkingDirectory=/run/bees

# Hide other users' processes in /proc/
ProtectProc=invisible

# Mount / as read-only
ProtectSystem=strict

# Forbid access to /home, /root and /run/user
ProtectHome=true

# Mount tmpfs on /tmp/ and /var/tmp/.
# Cannot mount at /run/ or /var/run/ for they are used by systemd.
PrivateTmp=true

# Disable network access
PrivateNetwork=true

# Use private IPC namespace and UTS namespace
PrivateIPC=true
ProtectHostname=true

# Disable write access to kernel variables through /proc
ProtectKernelTunables=true

# Disable access to control groups
ProtectControlGroups=true

# Set capabilities of the new program
# The first three are required for accessing any file on the mounted filesystem.
# The last one is required for mounting the filesystem.
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_SYS_ADMIN

# With NoNewPrivileges, running sudo cannot gain any new privilege
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/bees.service.d/override.conf
[Unit]
RequiresMountsFor=/mnt/xfs-storage

[Service]
Slice=maintenance.slice
CPUWeight=idle
CPUSchedulingPolicy=batch
IOWeight=10
StartupIOWeight=25
MemoryLow=2G
Environment=BEESHOME=/mnt/xfs-storage/beeshome
ReadWritePaths=/mnt/xfs-storage/beeshome

@FalconFour
Author

Yep, that was my issue - I didn't know the config file needed to be named with the UUID inside. I just named it "beesd.conf" (since it was named beesd.conf.sample) and ran it that way. Turns out, it was totally ignoring my config file.

The "taking a long time" was watching "journalctl -f" and seeing a never-ending flood of finding and replacing duplicate segments of files. Frankly, I'd prefer if it only matched on larger chunks - say, 1MB or more - instead of fragmenting files all to heck just because it found a tiny 128K duplicate segment among the data. I'm sure that can't be doing any favors to performance - let alone file system reliability (the more complex the file fragments, the more work they take to access, modify, copy, etc).

That initial "crawl" had taken nearly 24 full hours over about 1.7TB of data. No snapshots, and it never seems to have touched the same file twice. It was just... very slowly... crawling all over my VM disk images, talking about how it found segments here and there, spattered with a constant stream of encountering exceptions (toxic hashes, over and over), deciding not to split something, etc. Full CPU roar and disk I/O constantly.

It did finally conclude, after I stopped it, moved the home folder to the local SSD, tweaked the beesd script (based on the proposal here: #272) to avoid a complaint about not being on a subvolume, aaaand... it flew through the rest of the data and completed in just another hour.

By "completed" I mean it finally finished its initial crawl of existing blocks. I honestly have no idea how integral (or performant) my data still is at this point, but I'll be checking that again soon, and will leave bees running while I continue writing new data to it. Hopefully it all works out well :)

Overall, I just wish there was a better "getting started" guide at the initial jumping-in point. The only major challenge I had was understanding where the config is located and how to start the service. Even just knowing the config file needed to be named /etc/bees/beesd.(UUID).conf and the invocation is systemctl start beesd@(UUID).service - that would have launched me on the right path right away :) Addressing issue #272 would be icing on the cake to make the initial indexing much faster as well! (Even if it's a low-priority write, even occasional - on spinning-rust disks, it's an I/O nightmare!)

@kakra
Contributor

kakra commented Nov 7, 2024

I don't think the config file name matters at all; it just needs to have the UUID properly set as a variable inside the config file. But if you installed via a distribution package, this may behave slightly differently. In that case, it would be up to your distribution to provide the proper documentation.

But yes, I think we could use a better "getting started" guide with well-explained examples.
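
As a starting point for such a guide, here is a hedged sketch of the first steps discussed in this thread. `write_bees_config` is a hypothetical helper, not part of bees, and the `beesd-<UUID>.conf` file name is only an illustration (any name ending in `.conf` should do). It writes a minimal config containing just the `UUID` line and prints the follow-up commands rather than running them, since those need root and a real btrfs filesystem. Copying the shipped sample file and editing its UUID line should achieve the same result.

```shell
#!/bin/sh
# Getting-started sketch. write_bees_config is a hypothetical helper,
# not part of bees. It only writes a file and prints the next commands.
write_bees_config() {
    uuid=$1
    config_dir=${2:-/etc/bees}                    # directory beesd searches
    config_file="$config_dir/beesd-$uuid.conf"    # illustrative name only

    mkdir -p "$config_dir" || return 1
    {
        echo "# bees config for filesystem $uuid"
        echo "UUID=$uuid"
    } > "$config_file" || return 1

    echo "Wrote $config_file. Next steps (as root):"
    echo "  beesd $uuid    # foreground test run, watch for errors"
    echo "  systemctl start beesd@$uuid.service"
}

# Look up the real UUID first, for example with:
#   findmnt -n -o UUID /mnt/btrfs_pool
# Demo against a temporary directory with a made-up UUID:
quickstart_dir=$(mktemp -d)
write_bees_config 01234567-89ab-cdef-0123-456789abcdef "$quickstart_dir"
```

Running beesd in the foreground first makes configuration errors visible immediately instead of burying them in the journal.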

If you'd like to make a doc PR, @Zygo will probably happily take it.

For performance: VM images on btrfs are bad anyways. If you use raw images with cow disabled, bees will probably ignore them anyways. And if you do use cow, chances are high that they are already split into millions of tiny extents even before bees touches them. Btrfs will have a hard time accessing such files with good performance. If you're using a smaller hash table size for bees, it will prefer bigger chunks for deduplication once the hash table is filled.
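
To make the relationship between hash table size and dedupe chunk size concrete, here is a rough sizing sketch. It assumes the rule of thumb from the bees sizing notes as I understand them: 16 bytes per hash table entry, one entry per block, and a table size that is a multiple of 128 KiB. Please verify those numbers against the bees docs before relying on them. `hash_table_bytes` is a hypothetical helper name, and the block size is an average target, not a hard minimum.

```shell
#!/bin/sh
# hash_table_bytes is a hypothetical helper: estimate a hash table size from
# the amount of unique data and the average dedupe block size aimed for.
# Assumptions (check the bees docs): 16 bytes per entry, one entry per
# block, result rounded up to a multiple of 128 KiB.
hash_table_bytes() {
    data_bytes=$1
    block_bytes=$2
    entries=$(( data_bytes / block_bytes ))
    raw=$(( entries * 16 ))
    step=$(( 128 * 1024 ))
    echo $(( (raw + step - 1) / step * step ))
}

tib=$(( 1024 * 1024 * 1024 * 1024 ))
hash_table_bytes "$tib" 4096                 # prints 4294967296 (4 GiB)
hash_table_bytes "$tib" $(( 1024 * 1024 ))   # prints 16777216 (16 MiB)
```

Under these assumptions, a smaller table spreads its entries over larger blocks, which is why shrinking it pushes bees toward bigger dedupe chunks.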

Personally, I'm running btrfs raid-1 on 4 bcache volumes for data, with 4x spinning rust as backend storage and 2x NVMe partitions (as mdraid1) for caching, plus 2 NVMe partitions for metadata. This performs very well; moving btrfs metadata to fast storage in particular is a huge performance gain.

To split data and metadata across different types of disks, you need a kernel patch: kakra/linux#31

This patch is completely backwards-compatible with kernels that lack it (but writing to the disk pool with such a kernel will ignore the metadata hint and just allocate metadata evenly across all disks). So it's safe to use.

@kakra
Contributor

kakra commented Dec 4, 2024

Is it correct behavior that the table never updates the transid until it reached 100% in that category

Sorry for the noise, posted to the wrong thread...
