
Call for testers - release candidate v0.11-rc3 #300

Open
Zygo opened this issue Jan 19, 2025 · 33 comments

@Zygo
Owner

Zygo commented Jan 19, 2025

Continued from #296. Thanks to all who tested -rc1 and -rc2!

New since -rc2:

  • Support for openat2 on kernels that have it. This is the first use of weak symbols in bees, and it may cause build or runtime issues for anyone building or running with a non-mainstream libc or toolchain (see the sketch after this list).
  • Scheduler improvements to update progress more consistently and shorten queue times.
  • Lots of doc updates. The minimum supported kernel has been moved to v5.4, which means the docs don't need to discuss ancient kernel bugs.
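
For anyone curious about the weak-symbol technique, here is a minimal sketch (illustrative only, not bees' actual code; glibc currently ships no openat2 wrapper, so the weak wrapper declaration below is an assumption about a possible future libc):

#include <linux/openat2.h>   // struct open_how
#include <sys/syscall.h>
#include <unistd.h>

// Weak reference: resolves to the libc symbol if one exists, else to nullptr.
extern "C" int openat2(int dirfd, const char *pathname,
                       struct open_how *how, size_t size) __attribute__((weak));

long do_openat2(int dirfd, const char *pathname, struct open_how *how) {
    if (openat2)   // libc provided a wrapper: use it
        return openat2(dirfd, pathname, how, sizeof(*how));
    // otherwise fall back to the raw syscall
    return syscall(SYS_openat2, dirfd, pathname, how, sizeof(*how));
}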

These changes might cause regressions relative to -rc2. If there are any, I'll back out the problematic changes, and v0.11 will be based on -rc2 or -rc3 with the offending commits reverted.

Todo:

@manxorist

Building v0.11-rc3 fails for me with

[...]
g++ -Wall -Wextra -Werror -O3 -I../include -D_FILE_OFFSET_BITS=64 -std=c++11 -Wold-style-cast -Wno-missing-field-initializers  -M -MF openat2.dep -MT openat2.o openat2.cc
In file included from openat2.cc:1:
../include/crucible/openat2.h:4:10: fatal error: linux/openat2.h: No such file or directory
    4 | #include <linux/openat2.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:36: openat2.dep] Error 1
[...]

on Ubuntu 20.04.6 LTS.

As far as I understand it, Ubuntu 20.04 only provides Linux kernel userspace headers for kernel 5.4 (package linux-libc-dev version 5.4.0-204.224), even when running HWE kernel 5.15 (package linux-image-5.15.0-129-lowlatency version 5.15.0-129.139~20.04.1).

bees v0.11-rc2 builds and works fine.

I do not have strong feelings about Ubuntu 20.04 support, so if you decide to not support it any more, that would be fine by me.

@XZVB12

XZVB12 commented Jan 19, 2025

Hello, thank you for your work.
I checked it on kernel 6.12.10 and it works fine for me, but I only tested on external drives, not on my main system, because I don't want to get stuck with this problem: #249
I think for convenience it would be a good idea to print remaining-time statistics to the console, for example every 5 minutes / %, and to exit the process when everything is done, or at least say so; right now the application just stays open.
There was a suggestion to create a convenient tool for status viewing (#193), but it seems the idea was dropped.

I've read a lot of comments about simultaneous compression and dedupe and I'm confused. Can you please clarify the following:
#70 (comment) says that any defragmentation undoes bees' work. Does that also apply to the autodefrag mount flag, or only to a manual start of full defragmentation: btrfs filesystem defragment -r -v -czstd?
The same question applies to compression. Does compress=zstd:10 undo deduplication, or does deduplication undo compression?
As I understand from this message, #296 (comment), it is better to completely abandon compression, but this message, #185 (comment), says that using compression and deduplication at the same time is a good idea.

Sorry to ask here; some threads have not been updated in a long time and I didn't want to create a new one duplicating an old one.

@Zygo
Owner Author

Zygo commented Jan 20, 2025

There was a suggestion to create a convenient tool for status viewing (#193), but it seems the idea was dropped.

There's something like that on the roadmap, but it doesn't make bees run any faster, so it's a low priority (or an opportunity for external contribution--patches welcome).

any defragmentation undoes bees' work

btrfs fi defrag and autodefrag both work by copying the data to a new physical location. The coalescing of small extents into larger ones is a side-effect of the copying process. All dedupe is the opposite of that process, replacing separate copies with references to a single physical location. Dedupe requires fragment boundaries to align exactly to boundaries between duplicate and unique data, while defrag ignores such boundaries, requiring correction from bees when the duplicate copies are deduped again.

Any form of defragmentation looks like new data to bees, which will attempt to dedupe the new data, possibly reinstating some of its earlier fragmentation. In v0.11, bees will now only introduce fragments when most of the space used by the fragmented extent can be freed as a result of the fragmentation. This produces a healthier balance between defrag and dedupe.
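
To make the alignment requirement concrete, here is a minimal sketch of the kernel dedupe primitive involved (the function name is hypothetical, and bees wraps this differently). The kernel compares both ranges byte for byte and only installs the shared reference if they match exactly, which is why the offsets and length must line up with the duplicate data:

#include <linux/fs.h>
#include <sys/ioctl.h>
#include <cstdlib>

// Ask the kernel to replace [dst_off, dst_off+len) in dst_fd with a shared
// reference to the identical bytes at [src_off, src_off+len) in src_fd.
bool dedupe_range(int src_fd, __u64 src_off, int dst_fd, __u64 dst_off, __u64 len) {
    auto *arg = static_cast<file_dedupe_range *>(
        calloc(1, sizeof(file_dedupe_range) + sizeof(file_dedupe_range_info)));
    arg->src_offset = src_off;
    arg->src_length = len;
    arg->dest_count = 1;
    arg->info[0].dest_fd = dst_fd;
    arg->info[0].dest_offset = dst_off;
    int rv = ioctl(src_fd, FIDEDUPERANGE, arg);
    // FILE_DEDUPE_RANGE_DIFFERS means the bytes didn't match; nothing changed
    bool ok = (rv == 0 && arg->info[0].status == FILE_DEDUPE_RANGE_SAME);
    free(arg);
    return ok;
}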

The same question applies to compression. #296 (comment)

Compression and dedupe are independent features, and they can be combined to get the good effects of both.

Note that in the graph, the compressed and uncompressed data sets are both 45% smaller after dedupe, but they were different sizes at the beginning. The compressed data set went from 100 GiB to 55 GiB, while the uncompressed data set went from 160 GiB to 90 GiB. The effect of dedupe was the same, but the compressed data was smaller due to compression, so fewer bytes were saved by dedupe because they were already saved by compression.

Combining compression and dedupe, the data went from 160 GiB (uncompressed without dedupe) to 55 GiB (compressed with dedupe), a total savings of 65%.

bees won't recompress data for you. You'll have to do that yourself, with btrfs fi defrag -vrczstd or similar.

@Zygo
Owner Author

Zygo commented Jan 20, 2025

[...Ubuntu 20, kernel 5.15, kernel headers 5.4...]

It turned out to be not too painful to add compatibility code for older headers, similar to what bees already does for btrfs, so I pushed a fix to master.

If anyone wants to test it on their DEC Alpha...let me know if it works. 😉

@KeinNiemand

In #296 you said
"Only one thread reads from disk at any time".
Would it make sense to use 2 threads to read from disk on raid1 and raid10 filesystems (3 for raid1c3, 4 for raid1c4)? I don't know exactly how btrfs handles reading from raid1, but I recall reading somewhere that it splits different reads between the disks, so maybe using 2 threads (1 per copy of the data in the raid) would be faster.

@manxorist

[...Ubuntu 20, kernel 5.15, kernel headers 5.4...]

It turned out to be not too painful to add compatibility code for older headers, similar to what bees already does for btrfs, so I pushed a fix to master.

not quite yet ...

[...]
g++ -Wall -Wextra -Werror -O3 -I../include -D_FILE_OFFSET_BITS=64 -std=c++11 -Wold-style-cast -Wno-missing-field-initializers  -M -MF openat2.dep -MT openat2.o openat2.cc
In file included from openat2.cc:1:
../include/crucible/openat2.h:22:2: error: ‘__u64’ does not name a type
   22 |  __u64 flags;
      |  ^~~~~
../include/crucible/openat2.h:23:2: error: ‘__u64’ does not name a type
   23 |  __u64 mode;
      |  ^~~~~
../include/crucible/openat2.h:24:2: error: ‘__u64’ does not name a type
   24 |  __u64 resolve;
      |  ^~~~~
make[1]: *** [Makefile:41: openat2.o] Error 1
[...]

@kakra
Contributor

kakra commented Jan 20, 2025

bees won't recompress data for you

Doesn't bees create compressed temporary files to place dedupe extents?

@kakra
Contributor

kakra commented Jan 20, 2025

Would it make sense to use 2 threads to read from disk

Not necessarily... btrfs stripes reads by PID modulo. So if you have two disk candidates to read from, btrfs does PID modulo 2, and the result selects stripe 0 or 1 for read.

So if bees happens to read only from "even" PIDs, two threads would read the same stripe - always - because bees threads are relatively long-lived.
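
A toy sketch of that default policy (this is not the kernel source, just the shape of it):

#include <unistd.h>

// Toy model of the default btrfs raid1 read policy described above: the
// reader's PID picks the mirror, so a long-lived thread is pinned to one copy.
int pick_mirror(int num_copies) {
    return static_cast<int>(getpid()) % num_copies;  // kernel: current->pid % num_copies
}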

I wonder if we could get some stripe selector into the kernel at some point which decides where to read based on IO latency. But this is difficult to do, because switching the stripe too often will reduce throughput. And the gains seem not to be very high: there has been a patchset in the past trying to do something like that, and it didn't improve things a lot.

So if bees could read from two stripes in parallel, it would probably not gain a lot.

@KeinNiemand

KeinNiemand commented Jan 20, 2025

Would it make sense to use 2 threads to read from disk

Not necessarily... btrfs stripes reads by PID modulo. So if you have two disk candidates to read from, btrfs does PID modulo 2, and the result selects stripe 0 or 1 for read.

So if bees happens to read only from "even" PIDs, two threads would read the same stripe - always - because bees threads are relatively long-lived.

Could bees account for the PID modulo 2 thing? Like just use 2 threads for reading, but make sure that one has an odd PID and one an even. If PIDs can't be controlled when creating threads, getting an even and an odd one still shouldn't be too hard: just create threads until you have at least one even and one odd PID, then get rid of the extra threads you made.
I assume the same logic applies with modulo 3 for raid1c3 and modulo 4 for raid1c4.
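
A hedged sketch of that idea (illustrative only; on Linux each thread has its own TID, which is what the PID-modulo policy actually sees):

#include <sys/syscall.h>
#include <unistd.h>
#include <thread>

int main() {
    bool seen[2] = {false, false};
    // Spawn threads until both TID parities have appeared; a real reader
    // pool would keep one matching thread per parity alive instead of
    // joining them all immediately as this sketch does.
    while (!(seen[0] && seen[1])) {
        long parity = 0;
        std::thread t([&parity] { parity = syscall(SYS_gettid) % 2; });
        t.join();
        seen[parity] = true;
    }
    return 0;
}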

I wonder if we could get some stripe selector into the kernel at some time which decides reading based on IO latency. But this is difficult to do, because switching the stripe too often will reduce throughput. And the gains seem not to be very high: There has been a patchset in the past trying to do something like that, and it didn't improve a lot.

If switching is bad, how about just assigning each thread a stripe to read from, but basing the assignment on IO latency, throughput, or utilization of each stripe (or whatever) instead of PID, trying to balance reads between stripes? Switching could be done very infrequently, maybe once per minute or less.

So if bees could read from two stripes in parallel, it would probably not gain a lot.

If you account for the PID thing, you'd have 2 independent threads doing reads without switching, so why would it not gain a lot? In theory, reading from 2 stripes should be up to a 100% improvement in read speed.

@kakra
Contributor

kakra commented Jan 20, 2025

If you account for the PID thing, you'd have 2 independent threads doing reads without switching, so why would it not gain a lot? In theory, reading from 2 stripes should be up to a 100% improvement in read speed.

Because there is still metadata, which may not follow the same logic, and because btrfs is not really good at parallelizing reads and writes from the same consumer.

Most of the time in bees seems to be spent on very short extents, which involves a lot of metadata IO. So while reads of large extents could benefit from such a logic, it would be only a very small gain in the overall extent processing. I'd rather leave that extra "free" IO to other processes.

But I'm pretty sure bees did multiple read threads in the past... @Zygo? So this feature must have been dropped for a reason.

@KeinNiemand

KeinNiemand commented Jan 20, 2025

If you account for the PID thing, you'd have 2 independent threads doing reads without switching, so why would it not gain a lot? In theory, reading from 2 stripes should be up to a 100% improvement in read speed.

Because there is still metadata, which may not follow the same logic, and because btrfs is not really good at parallelizing reads and writes from the same consumer.

How are two bees threads doing reads any different from two completely independent consumers? As long as the two threads work on scanning different data (don't have 2 threads try to read the same file/extent/whatever bees works with), they should not really interfere with each other. How is bees any different from two completely separate applications reading 2 different things from the 2 copies?

Most of the time in bees seems to be spent on very short extents, which involves a lot of metadata IO. So while reads of large extents could benefit from such a logic, it would be only a very small gain in the overall extent processing. I'd rather leave that extra "free" IO to other processes.

Is there a reason metadata reads wouldn't follow the same logic, assuming metadata is also raid1 or higher? (I guess if data and metadata are at different raid levels, they would use different modulo values.) For me, data is raid10 and metadata is raid1c3.

But I'm pretty sure bees did multiple read threads in the past... @Zygo? So this feature must have been dropped for a reason.

My guess (I have no idea what bees actually did in previous versions; I'm just going off one part of the last issue saying it now only uses one thread for reading) is that previously bees just didn't have a limit on how many threads could read at the same time, which probably degraded performance or gave no gains, since you had multiple threads reading from the same disks.
Trying to use one thread per copy on a raid (accounting for the PID thing) might improve performance.

@Zygo
Owner Author

Zygo commented Jan 20, 2025

But I'm pretty sure bees did multiple read threads in the past... @Zygo? So this feature must have been dropped for a reason.

It wasn't a feature. Old versions simply hurled IO requests at the kernel, and the kernel generally scheduled them in the least efficient way possible. There were readahead requests to try to streamline that by prefetching data, but it turns out that's not what readahead is designed to handle. Kernel IO schedulers promised a lot, before and after the mq transition, but failed to deliver anything useful to bees.

As a temporary fix for v0.11, all readahead/prefetching was moved to userspace, and a mutex was introduced to ensure that only one thread at a time does prefetch (the downstream processing is still parallel, but it's all working from page cache after prefetch is done). Extent scan ensures that IO patterns generated by bees reuse cached pages more often, which gains back most of the performance that subvol scans were losing by design.
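
A minimal sketch of that single-reader prefetch pattern (hypothetical names, not bees' actual code):

#include <unistd.h>
#include <cstdint>
#include <mutex>
#include <vector>

std::mutex prefetch_mutex;  // only one thread performs physical reads at a time

std::vector<uint8_t> fetch_extent(int fd, off_t offset, size_t length) {
    std::vector<uint8_t> buf(length);
    {
        std::lock_guard<std::mutex> lock(prefetch_mutex);
        // the read populates the page cache; overlapping reads issued later
        // by other workers are served from cache without touching the disk
        pread(fd, buf.data(), length, offset);
    }
    // ...hashing and duplicate matching proceed outside the lock, in parallel...
    return buf;
}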

It's a temporary fix because the more permanent fix will be to remove as much block reading as possible. There's no need to make that code run faster or in parallel, if we can avoid running it in the first place. Getting rid of reads is next on the todo list, but we want to get the non-trivial gains we have so far into a stable release so distros will pick it up, and people can stop running v0.10 and earlier.

It doesn't make a lot of sense to micro-optimize reads when:

  1. Reads are mostly going away. The goal is to eventually read less than 1% of the unique uncompressed data by reading the csum tree instead of data blocks to find duplicates. Once that's done, even if read overhead were magically reduced all the way to zero by micro-optimization, it would only result in a 1% performance improvement in the scanning phase. (A rough sketch of a csum-tree read appears after this list.)
  2. Reads don't dominate the IO time any more. The temporary fix was good enough that now the IO overhead is a roughly even mix of reads, LOGICAL_INO, and dedupe operations. If we get rid of most reads, we don't need most of the LOGICAL_INO calls either--in most cases, we're doing LOGICAL_INO only so we can read data blocks from files.
  3. If the kernel ever gets around to merging any of the existing read policy patches (some of which are more than 5 years old now), the btrfs raid1 mirror optimization won't work any more, and will make reads slower when attempted. In that event, any prior effort spent on optimizing bees for that case would be wasted--users could upgrade their kernels and delete a few lines of bees code, and get the optimization for free.
  4. The reads have to go somewhere. On SSDs, where reads could be issued in parallel, the improvement is limited by all the CPU bees uses after reading the data. Improvement on SSD would require optimizing other parts of bees to be able to consume the data at speeds over a few hundred MiB/s. Most of those parts will be replaced as part of the transition to csum scan, because they're slow and limiting performance even more than single-threaded reads do.
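
For the curious, a rough sketch of what a csum-tree read looks like from userspace (illustrative only, and not the planned bees implementation; TREE_SEARCH_V2 needs CAP_SYS_ADMIN, and the fd can be any file or directory on the filesystem):

#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // TREE_SEARCH_V2 over the csum tree: every EXTENT_CSUM item describes a
    // run of per-block checksums, keyed by the logical address of the run.
    std::vector<char> mem(sizeof(btrfs_ioctl_search_args_v2) + 65536);
    auto *args = reinterpret_cast<btrfs_ioctl_search_args_v2 *>(mem.data());
    args->key.tree_id = BTRFS_CSUM_TREE_OBJECTID;
    args->key.min_objectid = args->key.max_objectid = BTRFS_EXTENT_CSUM_OBJECTID;
    args->key.min_type = args->key.max_type = BTRFS_EXTENT_CSUM_KEY;
    args->key.max_offset = ~0ULL;
    args->key.max_transid = ~0ULL;
    args->key.nr_items = 1024;   // in: max items; out: items returned
    args->buf_size = 65536;

    if (ioctl(fd, BTRFS_IOC_TREE_SEARCH_V2, args) < 0) {
        perror("BTRFS_IOC_TREE_SEARCH_V2");
        return 1;
    }

    // Walk the returned items: duplicate csums can be matched here without
    // ever reading the data blocks themselves.
    const char *p = reinterpret_cast<const char *>(args->buf);
    for (unsigned i = 0; i < args->key.nr_items; ++i) {
        auto *hdr = reinterpret_cast<const btrfs_ioctl_search_header *>(p);
        printf("csum run at logical %llu: %u bytes of checksums\n",
               (unsigned long long)hdr->offset, hdr->len);
        p += sizeof(*hdr) + hdr->len;
    }
    return 0;
}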

I made some prototypes where bees figures out where we can send reads in parallel based on the chunk tree, and schedules tasks to avoid reading from the same devices at the same times, using the same framework bees currently uses to avoid inode and extent lock contention. They suffer from two problems:

  1. Only a subset of btrfs configurations can use the optimization. Any single-device filesystem, or any filesystem using raid0, raid10, raid5, or raid6, can't usefully do parallel reads from userspace as they are already parallelized in the kernel. On raid1, raid1c3, or raid1c4, the raid1 optimization relies on not having a non-default read policy enabled. That leaves only multi-disk single profile filesystems, and raid1 filesystems with 4 or more devices, raid1c3 with 6 or more devices, or raid1c4 with 8 or more devices, where the micro-optimization can be enabled. Those configurations seem unusual[citation needed].

  2. When it's enabled, it almost never works without restructuring other parts of bees or making work queues insanely large. e.g. to get two dedupe reads to run in parallel, we need to find 2 duplicate data extents that are stored on 4 separate drives. The current extent scanner pulls all of one side of the dedupe from the same block group, which means we almost never get more than 3 drives involved concurrently in 2 parallel dedupes. Even if that is fixed, duplicates are found where duplicates are found, so the optimizer is at the mercy of the user's data to find opportunities to work.

Some of these problems could be solved, but the csum tree scanner simply doesn't have the problems to begin with, and it's hundreds of times faster on all storage configurations. After that's done, we can look at where bees spends its time and focus effort there.

Another possible micro-optimization is to use the CLONE_IO flag for bees threads, so they appear to be separate processes to the kernel. I say "possible" because it would involve reimplementing pthread_clone just to attempt it, and it's not clear there's any benefit compared to having the kernel treat bees as multiple processes for IO vs. having bees organize the read workload itself as a single process. On the todo list, CLONE_IO experiments are last, in a bucket with "using io_uring" and other tweaks that are also likely to be irrelevant in the long term.

@Zygo
Owner Author

Zygo commented Jan 20, 2025

bees won't recompress data for you

Doesn't bees create compressed temporary files to place dedupe extents?

Only for partially duplicate extents where it can't find a duplicate copy of some of the data, and only when those fragments are compressible.

In the 160 GiB uncompressed test, only 293 MiB was compressed by bees (and about 7 GiB was zeros, so those are now holes that don't count as space usage).

Processed 151 files, 55179 regular extents (310127 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       99%       90G          91G         153G
none       100%       90G          90G         151G
zstd        30%      293M         948M         2.2G

@KeinNiemand

KeinNiemand commented Jan 21, 2025

  1. Only a subset of btrfs configurations can use the optimization. Any single-device filesystem, or any filesystem using raid0, raid10, raid5, or raid6, can't usefully do parallel reads from userspace as they are already parallelized in the kernel. On raid1, raid1c3, or raid1c4, the raid1 optimization relies on not having a non-default read policy enabled. That leaves only multi-disk single profile filesystems, and raid1 filesystems with 4 or more devices, raid1c3 with 6 or more devices, or raid1c4 with 8 or more devices, where the micro-optimization can be enabled. Those configurations seem unusual[citation needed].

Why wouldn't raid10 benefit? Yes, it already does parallel reads from 2 devices because of the 2 stripes, but there's also another copy, like in raid1. Wouldn't the raid1 optimization also work on raid10 (going from reading from 2 devices to all 4 in a 4-drive raid10 setup)?
I might be confusing raid10 with raid0+1 here; I don't know in what order btrfs does raid10.

@kakra
Contributor

kakra commented Jan 21, 2025

Yes, it already does parallel reads from 2 devices because of the 2 stripes

That's not how striping works. If your IO request is large enough to cover multiple stripes, yes, they can be read in parallel (increased throughput). If your IO request is smaller than the stripe width, then nothing is read in parallel, and you instead gain performance from multithreaded workloads (reduced latency). That's why you want smaller stripes for databases and larger stripes for file servers.

In raid1, the system has multiple (usually 2) independent IO queues to work with, and some latency-dependent IO algorithm could distribute the load to those two queues. But care must be taken to keep logically sequential application access in the same queue, otherwise you'll reduce overall performance by disturbing other independent workloads.

For read workloads, raid0 and raid1 are mostly identical. For write workloads, they are obviously not. For raid5 workloads, you get a less predictable mixture of both, where writes are limited by the slowest member (like in raid1), and reads are limited by (n-1) IO queues. raid5 may do diagonal striping to better distribute competing sequential workloads, so that the parity stripe is not confined to a single disk.

bees has no such predictable workloads. And raid does not parallelize anything at all. By design.

@kakra
Contributor

kakra commented Jan 21, 2025

With beeshash.dat on XFS I get:

Jan 21 14:06:58 vch01 bees[483274]: bees[483274]: *** EXCEPTION ***
Jan 21 14:06:58 vch01 bees[483274]: bees[483274]:         exception type std::system_error: BTRFS_IOC_INO_LOOKUP: /var/lib/bees/8ff943b5-fbd2-4ea8-ad47-404a5961112c/beeshash.dat at fs.cc:488: Inappropriate ioctl for device
Jan 21 14:06:58 vch01 bees[483274]: bees[483274]: ***

It has probably been that way for a long time; because of the new silenced logger, I've increased verbosity from 5 to 6.

And maybe we should ship bees with verbosity 7 by default instead of 8?

@Zygo
Owner Author

Zygo commented Jan 21, 2025

It turned out to be not too painful to add compatibility code for older headers, similar to what bees already does for btrfs, so I pushed a fix to master.

not quite yet ...

...sigh. Try now?

@manxorist

It turned out to be not too painful to add compatibility code for older headers, similar to what bees already does for btrfs, so I pushed a fix to master.

not quite yet ...

...sigh. Try now?

Yes, thanks. Running master at commit 85aba7b for ~9 hours already.

@Zygo
Owner Author

Zygo commented Jan 21, 2025

  1. Only a subset of btrfs configurations can use the optimization. Any single-device filesystem, or any filesystem using raid0, raid10, raid5, or raid6, can't usefully do parallel reads from userspace as they are already parallelized in the kernel. On raid1, raid1c3, or raid1c4, the raid1 optimization relies on not having a non-default read policy enabled. That leaves only multi-disk single profile filesystems, and raid1 filesystems with 4 or more devices, raid1c3 with 6 or more devices, or raid1c4 with 8 or more devices, where the micro-optimization can be enabled. Those configurations seem unusual[citation needed].

Why wouldn't raid10 benefit? Yes, it already does parallel reads from 2 devices because of the 2 stripes, but there's also another copy, like in raid1. Wouldn't the raid1 optimization also work on raid10 (going from reading from 2 devices to all 4 in a 4-drive raid10 setup)? I might be confusing raid10 with raid0+1 here; I don't know in what order btrfs does raid10.

The quoted paragraph assumes the kernel is already parallelizing raid1 reads (or at least preventing userspace from predicting device usage) with one of the non-default read_policy settings.

All the raid1* profiles in btrfs use the same mirror preference choice, so with default read_policy it would be possible to get two non-interfering readers on raid10; however, only two would be possible in the entire filesystem, regardless of the number of drives, due to the striping. (edit: it also relies on btrfs always assigning a device to the same position in the raid10 stripes, which is not guaranteed to happen when there's an even number of drives, and guaranteed not to happen when there's an odd number.) It won't scale up with the number of drives as the other raid1* and single profiles do.

Either way, the gain is far too small to be worth pursuing yet.

@Zygo
Owner Author

Zygo commented Jan 21, 2025

exception type std::system_error: BTRFS_IOC_INO_LOOKUP: /var/lib/bees/8ff943b5-fbd2-4ea8-ad47-404a5961112c/beeshash.dat at fs.cc:488: Inappropriate ioctl for device

Yeah, that error should be captured and ignored. It's trying to find beeshash.dat's subvol and inode numbers so bees can recognize its own hash table and avoid deduping it. If it's not even on the same filesystem, then nothing needs to be done to exclude it.
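
A minimal sketch of capturing and ignoring it (hypothetical names, not bees' actual code):

#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>   // BTRFS_FIRST_FREE_OBJECTID
#include <sys/ioctl.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>

// Returns true if fd is on btrfs (and fills in the subvol info so the hash
// table can be blacklisted); returns false if the filesystem isn't btrfs.
bool hash_table_on_btrfs(int fd, btrfs_ioctl_ino_lookup_args &args) {
    memset(&args, 0, sizeof(args));
    args.treeid = 0;                            // 0 = the subvol containing fd
    args.objectid = BTRFS_FIRST_FREE_OBJECTID;
    if (ioctl(fd, BTRFS_IOC_INO_LOOKUP, &args) == 0)
        return true;
    if (errno == ENOTTY)                        // e.g. XFS: nothing to exclude
        return false;
    throw std::runtime_error(strerror(errno));  // real IO errors still propagate
}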

While we're supporting XFS for $BEESHOME, I should put the fsyncs back in. Kernel 5.16 finally fixed the bug that the fsyncs were removed to work around, and XFS is very broken without fsync.

And maybe we should ship bees with verbosity 7 by default instead of 8?

...and maybe rearrange some of the log messages. Exception trace logs should maybe be at debug level or one level above, as most of those are expected events where we just have to stop a big, complex operation because the data changed under us while we were working on it. Only actual IO errors should be visible at lower verbosity levels. We don't use log levels 1, 2, or 4 at all.

@KeinNiemand

I'm just wondering why bees is taking this long to finish its initial run for me on this new version.
I'm pretty sure the old version managed to complete the initial run in a few weeks. After that I added new drives (and ran a full balance / deleted the hash table), ran the old version for a bit (not long enough for it to finish), then switched to the new version, which has been running for over a month now with no signs of being anywhere near done.
The ETA just keeps going up. Right now the size class closest to finishing will take until April, and every time I look at it, the ETA moves to later and later dates. I know the ETA is very inaccurate, and it could just go up and up and then drop rapidly when it hits some point, but I'm still wondering why it's seemingly so slow for me.

@kakra
Contributor

kakra commented Jan 22, 2025

then switched to the new version, which has been running for over a month now with no signs of being anywhere near done.

Has been the same for me, and then suddenly it finished. Don't look at the ETA numbers, they are vastly inaccurate. The new bees is actually a lot faster. I've done the same and purged the hash and state, and let the new bees start over.

And coming back to the raid argument: no, that won't help here. Your problem is not throughput, it's how the smaller extents are laid out across your filesystem. bees probably spends most of its time in kernel calls where the kernel is slow due to how btrfs works.

Selectively defragmenting those files which are made of thousands or millions of extents may speed this process up considerably - given that those files are not part of any snapshots. For me, such files are usually log files, the $HOME/.cache directory, and VM images; probably also Steam shader caches (if you use that) and SQLite files without write-ahead logging (WAL: https://www.sqlite.org/wal.html, paragraph 3 on how to convert them), which a lot of apps use (especially browsers). I converted all those to WAL mode and defragged them.

Actually, I've used bees in the past, and the current version, to spot the files and directories which take a long time to process.

When bees reaches the end of the smaller size classes, it's probably way behind the current generation number and will spawn another round of extra-long ETAs - and then just finish after 1-2 weeks or faster.

@Zygo
Owner Author

Zygo commented Jan 22, 2025

Can confirm the ETA function is terrible. I had a 35-week ETA for 128K extents abruptly finish a few days ago.

When I examined the logs in detail, I found that the age of the extents is part of the problem: after a scan has been running for a while, more and more of the filesystem is above the top of the transid range (i.e. the data is newer than the scan). When the ETA is calculated, it's based on proportions of the total space, but it doesn't take time into account. The ETA assumes all the data will be processed in the current scan, when in reality the new data will be skipped in the current scan and become part of the next scan.

I don't see how to use the above information to improve the ETA (any stats wizards out there?). If we try to exclude data that's outside of the transid range, then all the size estimations will be based on the most recently written data in later scans. When bees is up to date and processing new writes as they happen, these samples are very small with huge variation, so the ETAs computed from that data are junk.
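
For reference, the naive proportional estimate looks roughly like this (hypothetical names; the real calculation is per size class):

// Extrapolate remaining time from the fraction of data already processed.
// Data written after the scan started inflates bytes_total even though the
// current scan will skip it, which is the bias described above.
double eta_seconds(double bytes_done, double bytes_total, double elapsed_seconds) {
    return elapsed_seconds * (bytes_total - bytes_done) / bytes_done;
}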

@Zygo
Owner Author

Zygo commented Jan 22, 2025

I'm just wondering why bees is taking this long to finish its initial run for me on this new version.

Can you post your progress table and your kernel version? The "RATES" section of the status file might also be informative.

I don't want to delay v0.11 chasing down every possible performance improvement--that's what later releases are for--but I also want to make sure we haven't got an unanticipated problem.

@KeinNiemand

KeinNiemand commented Jan 22, 2025

I've actually saved multiple copies of my status files, so here they are; I'm posting the full status files in case anything in them could be useful. My kernel version is 6.8.0-51-generic.
This is my status file from today:

TOTAL:
        addr_block=388221338 addr_compressed_offset=5226979 addr_eof_e=58811 addr_from_fd=89843991 addr_from_root_fd=2854317 addr_hole=20704 addr_magic=20704 addr_ms=81353 addr_uncompressed=382994359
        adjust_compressed_offset_correct=56060 adjust_eof_haystack=183 adjust_eof_hit=183 adjust_eof_needle=183 adjust_exact=2509912 adjust_exact_correct=2319949 adjust_hit=2376009 adjust_miss=189966 adjust_offset_high=252599 adjust_offset_hit=56063 adjust_offset_low=35559 adjust_try=2854316
        block_bytes=1514566796340 block_hash=323935005 block_ms=53181185 block_read=369799410 block_zero=125264
        chase_corrected=12043 chase_hit=2376192 chase_no_data=478124 chase_try=2854317 chase_uncorrected=2364149 chase_wrong_addr=1
        crawl_create=456428 crawl_discard=2049775 crawl_extent=265428 crawl_gen_high=25846 crawl_throttled=59 crawl_unthrottled=58
        dedup_bytes=50697972158 dedup_copy=13614742 dedup_hit=259415 dedup_ms=7052720 dedup_try=259415
        extent_deferred_inode=19153863 extent_forward=12646268 extent_mapped=265161 extent_ms=113232833 extent_ok=265428 extent_overflow=267 extent_ref_ok=13168189
        hash_already=61 hash_bump=47 hash_erase=49067 hash_evict=1026690 hash_extent_in=65536 hash_extent_out=191857 hash_front=5589 hash_front_already=249786 hash_insert=1034335 hash_lookup=298912203
        open_clear=1809 open_file=4014409 open_hit=4014409 open_ino_ms=29020812 open_lookup_ok=4014409 open_root_ms=94126
        pairbackward_bof_first=1492 pairbackward_hit=3438722 pairbackward_miss=1976509 pairbackward_ms=1392113 pairbackward_overlap=282 pairbackward_same=331 pairbackward_stop=2376192 pairbackward_try=5425265 pairbackward_zero=7929
        pairforward_eof_first=5758 pairforward_eof_malign=14094 pairforward_hit=24727481 pairforward_miss=2082175 pairforward_ms=8627572 pairforward_overlap=919 pairforward_same=20123 pairforward_stop=2376192 pairforward_toxic_hash=378 pairforward_try=26859693 pairforward_zero=8765
        progress_ok=10854
        readahead_bytes=10205399284 readahead_clear=1236 readahead_count=13022 readahead_ms=261001 readahead_skip=17092512 readahead_unread_ms=249
        replacedst_dedup_hit=250821 replacedst_grown=795401 replacedst_try=2401696
        replacesrc_dedup_hit=9368 replacesrc_grown=3983 replacesrc_try=780
        resolve_empty=14178 resolve_fail=6252 resolve_large=24061 resolve_ms=163447587 resolve_ok=315077 resolve_overflow=4 resolve_toxic=25
        root_clear=1809 root_found=183340 root_ok=183340 root_parent_open_ok=183340 root_parent_open_try=183340
        scan_already=11979868 scan_blacklisted=35 scan_block=295540408 scan_compressed_no_dedup=1097 scan_dup_block=12337787 scan_dup_hit=2366824 scan_extent=12657866 scan_forward=12646268 scan_found=360119875 scan_hash_hit=2366824 scan_hash_insert=1031218 scan_hash_miss=34872 scan_hash_preinsert=295437738 scan_hole=1816 scan_lookup=270745622 scan_malign=34 scan_push_front=255350 scan_reinsert=3178 scan_resolve_hit=2401696 scan_resolve_zero=14195 scan_rewrite=780 scan_seen_clear=2 scan_seen_erase=240525 scan_seen_hit=406195 scan_seen_insert=27902 scan_seen_miss=12249855 scan_skip_bytes=288 scan_skip_ops=3 scan_toxic_hash=1534 scan_toxic_match=25 scan_zero=102670
        scanf_extent=12657865 scanf_extent_ms=240486923 scanf_total=12646232 scanf_total_ms=281068539
        tmp_aligned=780 tmp_block=3178 tmp_block_zero=148 tmp_bytes=13008534 tmp_copy=780 tmp_copy_ms=80 tmp_create=1 tmp_resize=1561
RATES:
        addr_block=1982.29 addr_compressed_offset=26.69 addr_eof_e=0.301 addr_from_fd=458.751 addr_from_root_fd=14.575 addr_hole=0.106 addr_magic=0.106 addr_ms=0.416 addr_uncompressed=1955.6
        adjust_compressed_offset_correct=0.287 adjust_eof_haystack=0.001 adjust_eof_hit=0.001 adjust_eof_needle=0.001 adjust_exact=12.816 adjust_exact_correct=11.846 adjust_hit=12.133 adjust_miss=0.97 adjust_offset_high=1.29 adjust_offset_hit=0.287 adjust_offset_low=0.182 adjust_try=14.575
        block_bytes=7.7335e+06 block_hash=1654.04 block_ms=271.548 block_read=1888.23 block_zero=0.64
        chase_corrected=0.062 chase_hit=12.134 chase_no_data=2.442 chase_try=14.575 chase_uncorrected=12.072 chase_wrong_addr=0.001
        crawl_create=2.331 crawl_discard=10.467 crawl_extent=1.356 crawl_gen_high=0.132 crawl_throttled=0.001 crawl_unthrottled=0.001
        dedup_bytes=258868 dedup_copy=69.518 dedup_hit=1.325 dedup_ms=36.012 dedup_try=1.325
        extent_deferred_inode=97.802 extent_forward=64.573 extent_mapped=1.354 extent_ms=578.176 extent_ok=1.356 extent_overflow=0.002 extent_ref_ok=67.238
        hash_already=0.001 hash_bump=0.001 hash_erase=0.251 hash_evict=5.243 hash_extent_in=0.335 hash_extent_out=0.98 hash_front=0.029 hash_front_already=1.276 hash_insert=5.282 hash_lookup=1526.27
        open_clear=0.01 open_file=20.498 open_hit=20.498 open_ino_ms=148.183 open_lookup_ok=20.498 open_root_ms=0.481
        pairbackward_bof_first=0.008 pairbackward_hit=17.559 pairbackward_miss=10.093 pairbackward_ms=7.109 pairbackward_overlap=0.002 pairbackward_same=0.002 pairbackward_stop=12.134 pairbackward_try=27.702 pairbackward_zero=0.041
        pairforward_eof_first=0.03 pairforward_eof_malign=0.072 pairforward_hit=126.261 pairforward_miss=10.632 pairforward_ms=44.054 pairforward_overlap=0.005 pairforward_same=0.103 pairforward_stop=12.134 pairforward_toxic_hash=0.002 pairforward_try=137.148 pairforward_zero=0.045
        progress_ok=0.056
        readahead_bytes=52109.6 readahead_clear=0.007 readahead_count=0.067 readahead_ms=1.333 readahead_skip=87.276 readahead_unread_ms=0.002
        replacedst_dedup_hit=1.281 replacedst_grown=4.062 replacedst_try=12.264
        replacesrc_dedup_hit=0.048 replacesrc_grown=0.021 replacesrc_try=0.004
        resolve_empty=0.073 resolve_fail=0.032 resolve_large=0.123 resolve_ms=834.577 resolve_ok=1.609 resolve_overflow=0.001 resolve_toxic=0.001
        root_clear=0.01 root_found=0.937 root_ok=0.937 root_parent_open_ok=0.937 root_parent_open_try=0.937
        scan_already=61.171 scan_blacklisted=0.001 scan_block=1509.05 scan_compressed_no_dedup=0.006 scan_dup_block=62.998 scan_dup_hit=12.086 scan_extent=64.633 scan_forward=64.573 scan_found=1838.8 scan_hash_hit=12.086 scan_hash_insert=5.266 scan_hash_miss=0.179 scan_hash_preinsert=1508.53 scan_hole=0.01 scan_lookup=1382.45 scan_malign=0.001 scan_push_front=1.304 scan_reinsert=0.017 scan_resolve_hit=12.264 scan_resolve_zero=0.073 scan_rewrite=0.004 scan_seen_clear=0.001 scan_seen_erase=1.229 scan_seen_hit=2.075 scan_seen_insert=0.143 scan_seen_miss=62.549 scan_skip_bytes=0.002 scan_skip_ops=0.001 scan_toxic_hash=0.008 scan_toxic_match=0.001 scan_zero=0.525
        scanf_extent=64.633 scanf_extent_ms=1227.95 scanf_total=64.573 scanf_total_ms=1435.16
        tmp_aligned=0.004 tmp_block=0.017 tmp_block_zero=0.001 tmp_bytes=66.423 tmp_copy=0.004 tmp_copy_ms=0.001 tmp_create=0.001 tmp_resize=0.008
THREADS (work queue 0 of 9299 tasks, 4 workers, load: current 0 target 0 average 0):
        tid 76659: bees: [195845s] waiting for signals
        tid 76708: ref_1c96ca8f5000_6.445M_161: resolving addr 0x23b622df9000 with LOGICAL_INO
        tid 76710: progress_report: [1444.88s] idle 3600
        tid 76711: status_report: writing status to file '/run/bees//1b745638-8212-45fc-b4ff-4d58dec53bc1.status'
        tid 76712: hash_writeback: flush rate limited after extent #55355 of 65536 extents
        tid 76713: hash_prefetch: [1979.29s] idle 3600s
        tid 76714: crawl_transid: [24.569s] waiting 70.0081s for next transid RateEstimator { count = 678614, raw = 111.388 / 7798.06, ratio = 111.388 / 7822.63, rate = 0.0142392, duration(1) = 70.2287, seconds_for(1) = 45.6602 }
        tid 76715: crawl_writeback: [543.963s] idle, dirty
PROGRESS:
extsz   datasz  point gen_min gen_max this cycle start tm_left   next cycle ETA
----- -------- ------ ------- ------- ---------------- ------- ----------------
  max   7.034T 459568       0  618420 2024-12-11 17:48 13w 21h 2025-04-24 21:32
  32M   1.835T 296911       0  618420 2024-12-11 17:48  20w 2d 2025-06-14 05:39
   8M   1.356T 531017       0  618420 2024-12-11 17:48  11w 2d 2025-04-12 12:49
   2M 354.076G 441725       0  618420 2024-12-11 17:48  13w 4d 2025-04-28 14:37
 512K 232.085G 245421       0  618420 2024-12-11 17:48  24w 4d 2025-07-14 01:47
 128K  19.697G 380418       0  618420 2024-12-11 17:48  15w 6d 2025-05-14 00:22
total  10.816T        gen_now  678614                  updated 2025-01-22 23:16

This is my status file from yesterday:

TOTAL:
        addr_block=237197360 addr_compressed_offset=2173251 addr_eof_e=45830 addr_from_fd=33354334 addr_from_root_fd=687677 addr_hole=10264 addr_magic=10264 addr_ms=45057 addr_uncompressed=235024109
        adjust_compressed_offset_correct=41915 adjust_eof_haystack=61 adjust_eof_hit=61 adjust_eof_needle=61 adjust_exact=457137 adjust_exact_correct=457021 adjust_hit=498936 adjust_miss=116 adjust_offset_high=166648 adjust_offset_hit=41915 adjust_offset_low=21916 adjust_try=687677
        block_bytes=933239897141 block_hash=213860230 block_ms=29385278 block_read=227867243 block_zero=12792
        chase_corrected=7546 chase_hit=498997 chase_no_data=188680 chase_try=687677 chase_uncorrected=491451
        crawl_create=221226 crawl_discard=721208 crawl_extent=118391 crawl_gen_high=10816 crawl_throttled=29 crawl_unthrottled=28
        dedup_bytes=26569767519 dedup_copy=7852747 dedup_hit=126929 dedup_ms=3329785 dedup_try=126929
        extent_deferred_inode=9607809 extent_forward=6235664 extent_mapped=118231 extent_ms=61503003 extent_ok=118391 extent_overflow=160 extent_ref_ok=6934979
        hash_already=35 hash_bump=32 hash_erase=2013 hash_evict=426888 hash_extent_in=65536 hash_extent_out=92946 hash_front=3052 hash_front_already=123629 hash_insert=427718 hash_lookup=204430449
        open_clear=881 open_file=2080947 open_hit=2080947 open_ino_ms=14745767 open_lookup_ok=2080947 open_root_ms=63488
        pairbackward_bof_first=927 pairbackward_hit=1259828 pairbackward_miss=288536 pairbackward_ms=663515 pairbackward_overlap=82 pairbackward_same=123 pairbackward_stop=498997 pairbackward_try=1549571 pairbackward_zero=75
        pairforward_eof_first=1566 pairforward_eof_malign=10287 pairforward_hit=9434316 pairforward_miss=355907 pairforward_ms=3752684 pairforward_overlap=635 pairforward_same=11780 pairforward_stop=498997 pairforward_toxic_hash=316 pairforward_try=9815256 pairforward_zero=449
        progress_ok=5286
        readahead_bytes=5099219426 readahead_clear=611 readahead_count=6418 readahead_ms=143812 readahead_skip=7106147 readahead_unread_ms=249
        replacedst_dedup_hit=124055 replacedst_grown=349596 replacedst_try=495726
        replacesrc_dedup_hit=3345 replacesrc_grown=1210 replacesrc_try=414
        resolve_empty=1934 resolve_fail=60 resolve_large=15218 resolve_ms=77228160 resolve_ok=135899 resolve_overflow=2 resolve_toxic=11
        root_clear=881 root_found=89591 root_ok=89591 root_parent_open_ok=89591 root_parent_open_try=89591
        scan_already=5934258 scan_blacklisted=27 scan_block=203163695 scan_compressed_no_dedup=412 scan_dup_block=6470339 scan_dup_hit=495652 scan_extent=6238621 scan_forward=6235664 scan_found=34425167 scan_hash_hit=495652 scan_hash_insert=425868 scan_hash_miss=74 scan_hash_preinsert=203154287 scan_hole=14 scan_lookup=193735989 scan_push_front=126670 scan_reinsert=1885 scan_resolve_hit=495726 scan_resolve_zero=1939 scan_rewrite=414 scan_seen_clear=1 scan_seen_erase=116374 scan_seen_hit=173868 scan_seen_insert=13232 scan_seen_miss=6064739 scan_skip_bytes=131 scan_skip_ops=2 scan_toxic_hash=863 scan_toxic_match=11 scan_zero=9408
        scanf_extent=6238620 scanf_extent_ms=118778745 scanf_total=6235636 scanf_total_ms=139417173
        tmp_aligned=414 tmp_block=1885 tmp_block_zero=33 tmp_bytes=7717579 tmp_copy=414 tmp_copy_ms=24 tmp_create=1 tmp_resize=829
RATES:
        addr_block=2447 addr_compressed_offset=22.42 addr_eof_e=0.473 addr_from_fd=344.094 addr_from_root_fd=7.095 addr_hole=0.106 addr_magic=0.106 addr_ms=0.465 addr_uncompressed=2424.58
        adjust_compressed_offset_correct=0.433 adjust_eof_haystack=0.001 adjust_eof_hit=0.001 adjust_eof_needle=0.001 adjust_exact=4.716 adjust_exact_correct=4.715 adjust_hit=5.148 adjust_miss=0.002 adjust_offset_high=1.72 adjust_offset_hit=0.433 adjust_offset_low=0.227 adjust_try=7.095
        block_bytes=9.62758e+06 block_hash=2206.25 block_ms=303.148 block_read=2350.75 block_zero=0.132
        chase_corrected=0.078 chase_hit=5.148 chase_no_data=1.947 chase_try=7.095 chase_uncorrected=5.07
        crawl_create=2.283 crawl_discard=7.441 crawl_extent=1.222 crawl_gen_high=0.112 crawl_throttled=0.001 crawl_unthrottled=0.001
        dedup_bytes=274102 dedup_copy=81.012 dedup_hit=1.31 dedup_ms=34.352 dedup_try=1.31
        extent_deferred_inode=99.118 extent_forward=64.329 extent_mapped=1.22 extent_ms=634.484 extent_ok=1.222 extent_overflow=0.002 extent_ref_ok=71.544
        hash_already=0.001 hash_bump=0.001 hash_erase=0.021 hash_evict=4.404 hash_extent_in=0.677 hash_extent_out=0.959 hash_front=0.032 hash_front_already=1.276 hash_insert=4.413 hash_lookup=2108.97
        open_clear=0.01 open_file=21.468 open_hit=21.468 open_ino_ms=152.122 open_lookup_ok=21.468 open_root_ms=0.655
        pairbackward_bof_first=0.01 pairbackward_hit=12.997 pairbackward_miss=2.977 pairbackward_ms=6.846 pairbackward_overlap=0.001 pairbackward_same=0.002 pairbackward_stop=5.148 pairbackward_try=15.986 pairbackward_zero=0.001
        pairforward_eof_first=0.017 pairforward_eof_malign=0.107 pairforward_hit=97.328 pairforward_miss=3.672 pairforward_ms=38.714 pairforward_overlap=0.007 pairforward_same=0.122 pairforward_stop=5.148 pairforward_toxic_hash=0.004 pairforward_try=101.258 pairforward_zero=0.005
        progress_ok=0.055
        readahead_bytes=52605.1 readahead_clear=0.007 readahead_count=0.067 readahead_ms=1.484 readahead_skip=73.31 readahead_unread_ms=0.003
        replacedst_dedup_hit=1.28 replacedst_grown=3.607 replacedst_try=5.115
        replacesrc_dedup_hit=0.035 replacesrc_grown=0.013 replacesrc_try=0.005
        resolve_empty=0.02 resolve_fail=0.001 resolve_large=0.157 resolve_ms=796.709 resolve_ok=1.402 resolve_overflow=0.001 resolve_toxic=0.001
        root_clear=0.01 root_found=0.925 root_ok=0.925 root_parent_open_ok=0.925 root_parent_open_try=0.925
        scan_already=61.22 scan_blacklisted=0.001 scan_block=2095.9 scan_compressed_no_dedup=0.005 scan_dup_block=66.75 scan_dup_hit=5.114 scan_extent=64.36 scan_forward=64.329 scan_found=355.141 scan_hash_hit=5.114 scan_hash_insert=4.394 scan_hash_miss=0.001 scan_hash_preinsert=2095.8 scan_hole=0.001 scan_lookup=1998.64 scan_push_front=1.307 scan_reinsert=0.02 scan_resolve_hit=5.115 scan_resolve_zero=0.021 scan_rewrite=0.005 scan_seen_clear=0.001 scan_seen_erase=1.201 scan_seen_hit=1.794 scan_seen_insert=0.137 scan_seen_miss=62.566 scan_skip_bytes=0.002 scan_skip_ops=0.001 scan_toxic_hash=0.009 scan_toxic_match=0.001 scan_zero=0.098
        scanf_extent=64.36 scanf_extent_ms=1225.36 scanf_total=64.329 scanf_total_ms=1438.27
        tmp_aligned=0.005 tmp_block=0.02 tmp_block_zero=0.001 tmp_bytes=79.617 tmp_copy=0.005 tmp_copy_ms=0.001 tmp_create=0.001 tmp_resize=0.009
THREADS (work queue 0 of 9811 tasks, 4 workers, load: current 0 target 0 average 0):
        tid 76659: bees: [96934s] waiting for signals
        tid 76706: ref_15288d60c000_864K_19: resolving addr 0x244257b92000 with LOGICAL_INO
        tid 76710: progress_report: [3333.94s] idle 3600
        tid 76711: status_report: writing status to file '/run/bees//1b745638-8212-45fc-b4ff-4d58dec53bc1.status'
        tid 76712: hash_writeback: flush rate limited after extent #21684 of 65536 extents
        tid 76713: hash_prefetch: [1736.91s] idle 3600s
        tid 76714: crawl_transid: [7.743s] waiting 77.9872s for next transid RateEstimator { count = 677401, raw = 103.357 / 8061.08, ratio = 103.357 / 8068.83, rate = 0.0128094, duration(1) = 78.0675, seconds_for(1) = 70.3198 }
        tid 76715: crawl_writeback: [633.474s] idle, dirty
PROGRESS:
extsz   datasz  point gen_min gen_max this cycle start tm_left   next cycle ETA
----- -------- ------ ------- ------- ---------------- ------- ----------------
  max   6.409T 285346       0  618420 2024-12-11 17:48  20w 3d 2025-06-14 20:20
  32M   2.778T 206568       0  618420 2024-12-11 17:48  28w 2d 2025-08-08 18:09
   8M   1.117T 446856       0  618420 2024-12-11 17:48 13w 22h 2025-04-23 19:23
   2M 305.419G 431980       0  618420 2024-12-11 17:48  13w 4d 2025-04-26 23:22
 512K 208.345G 241527       0  618420 2024-12-11 17:48  24w 2d 2025-07-10 23:15
 128K  10.046G 380375       0  618420 2024-12-11 17:48  15w 3d 2025-05-09 21:03
total  10.816T        gen_now  677401                  updated 2025-01-21 19:49

This was my status file on the 19th of January:

TOTAL:
        addr_block=2145117750 addr_compressed_offset=57867908 addr_eof_e=881324 addr_from_fd=1059072586 addr_from_root_fd=8760169 addr_hole=49851943 addr_magic=49851943 addr_ms=1206492 addr_uncompressed=2087249842
        adjust_compressed_offset_correct=433749 adjust_eof_fail=346 adjust_eof_haystack=41360 adjust_eof_hit=41014 adjust_eof_needle=41360 adjust_exact=6635211 adjust_exact_correct=6040969 adjust_hit=6474718 adjust_miss=595488 adjust_needle_too_long=239 adjust_offset_high=1086010 adjust_offset_hit=435234 adjust_offset_low=560671 adjust_try=8758486
        block_bytes=7671250295996 block_hash=1474316275 block_ms=326454325 block_read=1873367754 block_zero=24559051
        chase_corrected=154088 chase_hit=6515732 chase_no_data=2242754 chase_try=8760169 chase_uncorrected=6361644 chase_wrong_addr=1682 chase_wrong_magic=1
        crawl_create=2570507 crawl_discard=239406188 crawl_done=8472 crawl_extent=1123940 crawl_gen_high=211875 crawl_gen_low=13436571 crawl_restart=7696 crawl_throttled=66746 crawl_unthrottled=66743
        dedup_bytes=634799669238 dedup_copy=10629254357 dedup_hit=1774123 dedup_ms=143982821 dedup_try=1774123
        exception_caught_silent=12
        extent_deferred_inode=140817107 extent_forward=39565605 extent_mapped=1123006 extent_ms=367803539 extent_ok=1123940 extent_overflow=930 extent_ref_ok=41669780 extent_zero=4
        hash_already=680 hash_bump=712 hash_erase=4259000 hash_erase_miss=1 hash_evict=130297778 hash_extent_in=65536 hash_extent_out=971095 hash_front=100625 hash_front_already=2432264 hash_insert=134573724 hash_lookup=1298518556
        open_clear=11690 open_file=15212882 open_hit=15212882 open_ino_ms=132457545 open_lookup_enoent=1 open_lookup_ok=15212882 open_root_ms=2626424
        pairbackward_bof_first=37174 pairbackward_hit=174413940 pairbackward_miss=4193207 pairbackward_ms=87279935 pairbackward_overlap=4390 pairbackward_same=2675 pairbackward_stop=6515718 pairbackward_try=178677744 pairbackward_zero=26358
        pairforward_eof_first=62981 pairforward_eof_malign=274839 pairforward_hit=173052924 pairforward_miss=4530186 pairforward_ms=84148989 pairforward_overlap=13752 pairforward_same=79115 pairforward_stop=6515718 pairforward_toxic_hash=8717 pairforward_try=178081791 pairforward_zero=59277
        progress_ok=69084
        readahead_bytes=402215997935 readahead_clear=3804 readahead_count=421837 readahead_ms=10288559 readahead_skip=50778753 readahead_unread_ms=244
        replacedst_dedup_hit=2115787 replacedst_grown=2833666 replacedst_overlaps=2 replacedst_same=12 replacedst_try=6568349
        replacesrc_dedup_hit=200306 replacesrc_grown=106400 replacesrc_try=46264
        resolve_empty=3896973 resolve_fail=2635982 resolve_large=302651 resolve_ms=2063963777 resolve_ok=3852512 resolve_overflow=76 resolve_toxic=409
        root_clear=11690 root_found=986384 root_ok=986384 root_parent_open_ok=986384 root_parent_open_try=986384
        scan_already=35160357 scan_blacklisted=902 scan_block=1124541649 scan_compressed_no_dedup=3902 scan_dup_block=156477144 scan_dup_hit=6315412 scan_extent=39747763 scan_forward=39565605 scan_found=822084610 scan_hash_hit=6315412 scan_hash_insert=133120728 scan_hash_miss=252925 scan_hash_preinsert=1117412066 scan_hole=48112 scan_lookup=951042975 scan_malign=1235 scan_push_front=2532480 scan_reinsert=1453676 scan_resolve_hit=6568349 scan_resolve_zero=4006076 scan_rewrite=46264 scan_seen_clear=19 scan_seen_erase=1350731 scan_seen_hit=2955955 scan_seen_insert=191024 scan_seen_miss=36743696 scan_skip_bytes=10444 scan_skip_ops=1123 scan_toxic_hash=41161 scan_toxic_match=409 scan_zero=7129583
        scanf_extent=39747749 scanf_extent_ms=2707637263 scanf_no_fd=1 scanf_total=39564700 scanf_total_ms=2866642516
        tmp_aligned=46264 tmp_block=1453676 tmp_block_zero=1141613 tmp_bytes=5953220345 tmp_copy=46264 tmp_copy_ms=29557 tmp_create=2 tmp_resize=92530 tmp_resize_ms=430516
RATES:
        addr_block=2066.66 addr_compressed_offset=55.752 addr_eof_e=0.85 addr_from_fd=1020.34 addr_from_root_fd=8.44 addr_hole=48.029 addr_magic=48.029 addr_ms=1.163 addr_uncompressed=2010.91
        adjust_compressed_offset_correct=0.418 adjust_eof_fail=0.001 adjust_eof_haystack=0.04 adjust_eof_hit=0.04 adjust_eof_needle=0.04 adjust_exact=6.393 adjust_exact_correct=5.821 adjust_hit=6.238 adjust_miss=0.574 adjust_needle_too_long=0.001 adjust_offset_high=1.047 adjust_offset_hit=0.42 adjust_offset_low=0.541 adjust_try=8.439
        block_bytes=7.39068e+06 block_hash=1420.39 block_ms=314.515 block_read=1804.85 block_zero=23.661
        chase_corrected=0.149 chase_hit=6.278 chase_no_data=2.161 chase_try=8.44 chase_uncorrected=6.129 chase_wrong_addr=0.002 chase_wrong_magic=0.001
        crawl_create=2.477 crawl_discard=230.651 crawl_done=0.009 crawl_extent=1.083 crawl_gen_high=0.205 crawl_gen_low=12.946 crawl_restart=0.008 crawl_throttled=0.065 crawl_unthrottled=0.065
        dedup_bytes=611582 dedup_copy=10240.5 dedup_hit=1.71 dedup_ms=138.717 dedup_try=1.71
        exception_caught_silent=0.001
        extent_deferred_inode=135.667 extent_forward=38.119 extent_mapped=1.082 extent_ms=354.352 extent_ok=1.083 extent_overflow=0.001 extent_ref_ok=40.146 extent_zero=0.001
        hash_already=0.001 hash_bump=0.001 hash_erase=4.104 hash_erase_miss=0.001 hash_evict=125.533 hash_extent_in=0.064 hash_extent_out=0.936 hash_front=0.097 hash_front_already=2.344 hash_insert=129.652 hash_lookup=1251.03
        open_clear=0.012 open_file=14.657 open_hit=14.657 open_ino_ms=127.614 open_lookup_enoent=0.001 open_lookup_ok=14.657 open_root_ms=2.531
        pairbackward_bof_first=0.036 pairbackward_hit=168.035 pairbackward_miss=4.04 pairbackward_ms=84.088 pairbackward_overlap=0.005 pairbackward_same=0.003 pairbackward_stop=6.278 pairbackward_try=172.143 pairbackward_zero=0.026
        pairforward_eof_first=0.061 pairforward_eof_malign=0.265 pairforward_hit=166.724 pairforward_miss=4.365 pairforward_ms=81.072 pairforward_overlap=0.014 pairforward_same=0.077 pairforward_stop=6.278 pairforward_toxic_hash=0.009 pairforward_try=171.569 pairforward_zero=0.058
        progress_ok=0.067
        readahead_bytes=387505 readahead_clear=0.004 readahead_count=0.407 readahead_ms=9.913 readahead_skip=48.922 readahead_unread_ms=0.001
        replacedst_dedup_hit=2.039 replacedst_grown=2.731 replacedst_overlaps=0.001 replacedst_same=0.001 replacedst_try=6.329
        replacesrc_dedup_hit=0.193 replacesrc_grown=0.103 replacesrc_try=0.045
        resolve_empty=3.755 resolve_fail=2.54 resolve_large=0.292 resolve_ms=1988.48 resolve_ok=3.712 resolve_overflow=0.001 resolve_toxic=0.001
        root_clear=0.012 root_found=0.951 root_ok=0.951 root_parent_open_ok=0.951 root_parent_open_try=0.951
        scan_already=33.875 scan_blacklisted=0.001 scan_block=1083.41 scan_compressed_no_dedup=0.004 scan_dup_block=150.755 scan_dup_hit=6.085 scan_extent=38.295 scan_forward=38.119 scan_found=792.018 scan_hash_hit=6.085 scan_hash_insert=128.252 scan_hash_miss=0.244 scan_hash_preinsert=1076.54 scan_hole=0.047 scan_lookup=916.26 scan_malign=0.002 scan_push_front=2.44 scan_reinsert=1.401 scan_resolve_hit=6.329 scan_resolve_zero=3.86 scan_rewrite=0.045 scan_seen_clear=0.001 scan_seen_erase=1.302 scan_seen_hit=2.848 scan_seen_insert=0.185 scan_seen_miss=35.4 scan_skip_bytes=0.011 scan_skip_ops=0.002 scan_toxic_hash=0.04 scan_toxic_match=0.001 scan_zero=6.869
        scanf_extent=38.295 scanf_extent_ms=2608.61 scanf_no_fd=0.001 scanf_total=38.118 scanf_total_ms=2761.8
        tmp_aligned=0.045 tmp_block=1.401 tmp_block_zero=1.1 tmp_bytes=5735.49 tmp_copy=0.045 tmp_copy_ms=0.029 tmp_create=0.001 tmp_resize=0.09 tmp_resize_ms=0.415
THREADS (work queue 0 of 9999 tasks, 4 workers, load: current 0 target 0 average 0):
        tid 2362505: bees: [1.03796e+06s] waiting for signals
        tid 2362554: ref_244616346000_128M_1: resolving addr 0x23b614786000 with LOGICAL_INO
        tid 2362555: ref_1c93b9bfe000_4.223M_3: [5.987s] resolving addr 0x23adedf50000 with LOGICAL_INO
        tid 2362557: progress_report: [1161.59s] idle 3600
        tid 2362558: status_report: writing status to file '/run/bees//1b745638-8212-45fc-b4ff-4d58dec53bc1.status'
        tid 2362559: hash_writeback: flush rate limited after extent #21093 of 65536 extents
        tid 2362560: hash_prefetch: [3388.08s] idle 3600s
        tid 2362561: crawl_transid: [9.817s] waiting 83.8508s for next transid RateEstimator { count = 675072, raw = 89.3349 / 7490.88, ratio = 89.3349 / 7500.7, rate = 0.0119102, duration(1) = 83.9616, seconds_for(1) = 74.1437 }
        tid 2362562: crawl_writeback: [258.269s] idle, dirty
PROGRESS:
extsz   datasz  point gen_min gen_max this cycle start  ctime   next cycle ETA
----- -------- ------ ------- ------- ---------------- ------ ----------------
  max   7.877T   idle       0  618420 2024-12-11 17:48      -                -
  32M   1.698T   idle       0  618420 2024-12-11 17:48      -                -
   8M  672.31G 429053       0  618420 2024-12-11 17:48 12w 6d 2025-04-20 17:00
   2M 278.881G 390899       0  618420 2024-12-11 17:48 14w 1d 2025-04-29 13:58
 512K 234.943G 024267       0  618420 2024-12-11 17:48 4y 21w 2029-06-15 04:53
 128K  31.033G 382199       0  618420 2024-12-11 17:48 14w 4d 2025-05-01 20:29
total  10.764T        gen_now  675072

January 11th:

TOTAL:
        addr_block=956359745 addr_compressed_offset=7915111 addr_eof_e=413254 addr_from_fd=320524787 addr_from_root_fd=3119611 addr_hole=7015155 addr_magic=7015155 addr_ms=208427 addr_uncompressed=948444634
        adjust_compressed_offset_correct=109709 adjust_eof_fail=15 adjust_eof_haystack=17 adjust_eof_hit=2 adjust_eof_needle=17 adjust_exact=2838143 adjust_exact_correct=2662354 adjust_hit=2772063 adjust_miss=175789 adjust_offset_high=91237 adjust_offset_hit=109709 adjust_offset_low=80504 adjust_try=3119610
        block_bytes=3561652279055 block_hash=744589465 block_ms=155667726 block_read=869782958 block_zero=3940709
        chase_corrected=19474 chase_hit=2772065 chase_no_data=347545 chase_try=3119611 chase_uncorrected=2752591 chase_wrong_addr=1
        crawl_create=798594 crawl_discard=26588809 crawl_extent=705111 crawl_gen_high=51685 crawl_gen_low=89501 crawl_restart=1 crawl_throttled=50681 crawl_unthrottled=50679
        dedup_bytes=35286932982 dedup_copy=743368512 dedup_hit=342008 dedup_ms=31399703 dedup_try=342008
        extent_deferred_inode=95032501 extent_forward=20511642 extent_mapped=704539 extent_ms=189370271 extent_ok=705111 extent_overflow=572 extent_ref_ok=22713613
        hash_already=29 hash_bump=17 hash_erase=87919 hash_evict=3266110 hash_extent_in=65536 hash_extent_out=268340 hash_front=4743 hash_front_already=363837 hash_insert=3331762 hash_lookup=717637929
        open_clear=3953 open_file=7199005 open_hit=7199005 open_ino_ms=56715934 open_lookup_ok=7199005 open_root_ms=863105
        pairbackward_bof_first=3205 pairbackward_hit=78544013 pairbackward_miss=2085891 pairbackward_ms=28392194 pairbackward_overlap=3990 pairbackward_same=623 pairbackward_stop=2772064 pairbackward_try=80644658 pairbackward_zero=6935
        pairforward_eof_first=18438 pairforward_eof_malign=120347 pairforward_hit=26189011 pairforward_miss=2281140 pairforward_ms=15605254 pairforward_overlap=5827 pairforward_same=30710 pairforward_stop=2772064 pairforward_toxic_hash=4049 pairforward_try=28661054 pairforward_zero=11532
        progress_ok=22866
        readahead_bytes=53244668134 readahead_clear=2078 readahead_count=58894 readahead_ms=1183755 readahead_skip=26263085 readahead_unread_ms=244
        replacedst_dedup_hit=360640 replacedst_grown=978590 replacedst_try=2840370
        replacesrc_dedup_hit=17405 replacesrc_grown=10289 replacesrc_try=2163
        resolve_empty=2185 resolve_fail=2163 resolve_large=127888 resolve_ms=610336684 resolve_ok=570655 resolve_overflow=23 resolve_toxic=169
        root_clear=3953 root_found=337153 root_ok=337153 root_parent_open_ok=337153 root_parent_open_try=337153
        scan_already=20022795 scan_blacklisted=510 scan_block=639549015 scan_compressed_no_dedup=645 scan_dup_block=8338948 scan_dup_hit=2754659 scan_extent=20525332 scan_forward=20511642 scan_found=461678478 scan_hash_hit=2754659 scan_hash_insert=3289907 scan_hash_miss=85710 scan_hash_preinsert=638506641 scan_hole=4002 scan_lookup=612900856 scan_malign=432 scan_push_front=368411 scan_reinsert=41884 scan_resolve_hit=2840370 scan_resolve_zero=2209 scan_rewrite=2163 scan_seen_clear=2 scan_seen_erase=292374 scan_seen_hit=163109 scan_seen_insert=28842 scan_seen_miss=20358221 scan_skip_bytes=545 scan_skip_ops=49 scan_toxic_hash=14038 scan_toxic_match=169 scan_zero=1042373
        scanf_extent=20525329 scanf_extent_ms=849770983 scanf_total=20511128 scanf_total_ms=923932408
        tmp_aligned=2163 tmp_block=41884 tmp_block_zero=139603 tmp_bytes=171554624 tmp_copy=2163 tmp_copy_ms=1686 tmp_create=2 tmp_resize=4328 tmp_resize_ms=35471
RATES:
        addr_block=2853.04 addr_compressed_offset=23.613 addr_eof_e=1.233 addr_from_fd=956.199 addr_from_root_fd=9.307 addr_hole=20.928 addr_magic=20.928 addr_ms=0.622 addr_uncompressed=2829.43
        adjust_compressed_offset_correct=0.328 adjust_eof_fail=0.001 adjust_eof_haystack=0.001 adjust_eof_hit=0.001 adjust_eof_needle=0.001 adjust_exact=8.467 adjust_exact_correct=7.943 adjust_hit=8.27 adjust_miss=0.525 adjust_offset_high=0.273 adjust_offset_hit=0.328 adjust_offset_low=0.241 adjust_try=9.307
        block_bytes=1.06252e+07 block_hash=2221.28 block_ms=464.393 block_read=2594.76 block_zero=11.757
        chase_corrected=0.059 chase_hit=8.27 chase_no_data=1.037 chase_try=9.307 chase_uncorrected=8.212 chase_wrong_addr=0.001
        crawl_create=2.383 crawl_discard=79.321 crawl_extent=2.104 crawl_gen_high=0.155 crawl_gen_low=0.268 crawl_restart=0.001 crawl_throttled=0.152 crawl_unthrottled=0.152
        dedup_bytes=105269 dedup_copy=2217.64 dedup_hit=1.021 dedup_ms=93.673 dedup_try=1.021
        extent_deferred_inode=283.504 extent_forward=61.191 extent_mapped=2.102 extent_ms=564.935 extent_ok=2.104 extent_overflow=0.002 extent_ref_ok=67.76
        hash_already=0.001 hash_bump=0.001 hash_erase=0.263 hash_evict=9.744 hash_extent_in=0.196 hash_extent_out=0.801 hash_front=0.015 hash_front_already=1.086 hash_insert=9.94 hash_lookup=2140.88
        open_clear=0.012 open_file=21.477 open_hit=21.477 open_ino_ms=169.197 open_lookup_ok=21.477 open_root_ms=2.575
        pairbackward_bof_first=0.01 pairbackward_hit=234.315 pairbackward_miss=6.223 pairbackward_ms=84.701 pairbackward_overlap=0.012 pairbackward_same=0.002 pairbackward_stop=8.27 pairbackward_try=240.582 pairbackward_zero=0.021
        pairforward_eof_first=0.056 pairforward_eof_malign=0.36 pairforward_hit=78.128 pairforward_miss=6.806 pairforward_ms=46.555 pairforward_overlap=0.018 pairforward_same=0.092 pairforward_stop=8.27 pairforward_toxic_hash=0.013 pairforward_try=85.503 pairforward_zero=0.035
        progress_ok=0.069
        readahead_bytes=158841 readahead_clear=0.007 readahead_count=0.176 readahead_ms=3.532 readahead_skip=78.349 readahead_unread_ms=0.001
        replacedst_dedup_hit=1.076 replacedst_grown=2.92 replacedst_try=8.474
        replacesrc_dedup_hit=0.052 replacesrc_grown=0.031 replacesrc_try=0.007
        resolve_empty=0.007 resolve_fail=0.007 resolve_large=0.382 resolve_ms=1820.77 resolve_ok=1.703 resolve_overflow=0.001 resolve_toxic=0.001
        root_clear=0.012 root_found=1.006 root_ok=1.006 root_parent_open_ok=1.006 root_parent_open_try=1.006
        scan_already=59.733 scan_blacklisted=0.002 scan_block=1907.92 scan_compressed_no_dedup=0.002 scan_dup_block=24.877 scan_dup_hit=8.218 scan_extent=61.232 scan_forward=61.191 scan_found=1377.29 scan_hash_hit=8.218 scan_hash_insert=9.815 scan_hash_miss=0.256 scan_hash_preinsert=1904.81 scan_hole=0.012 scan_lookup=1828.42 scan_malign=0.002 scan_push_front=1.1 scan_reinsert=0.125 scan_resolve_hit=8.474 scan_resolve_zero=0.007 scan_rewrite=0.007 scan_seen_clear=0.001 scan_seen_erase=0.873 scan_seen_hit=0.487 scan_seen_insert=0.087 scan_seen_miss=60.734 scan_skip_bytes=0.002 scan_skip_ops=0.001 scan_toxic_hash=0.042 scan_toxic_match=0.001 scan_zero=3.11
        scanf_extent=61.232 scanf_extent_ms=2535.06 scanf_total=61.19 scanf_total_ms=2756.3
        tmp_aligned=0.007 tmp_block=0.125 tmp_block_zero=0.417 tmp_bytes=511.787 tmp_copy=0.007 tmp_copy_ms=0.006 tmp_create=0.001 tmp_resize=0.013 tmp_resize_ms=0.106
THREADS (work queue 63 of 7598 tasks, 4 workers, load: current 0 target 0 average 0):
        tid 2362505: bees: [335207s] waiting for signals
        tid 2362553: ref_151ccc9bc000_4M_350: scan open 16K (..0x4000] fid = 22476:1386070
        tid 2362554: ref_11f858767000_34.543M_503: Reading BeesBlockData BeesBlockData { 4K 0x4569bbd000 fd = 153 '/run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/LukasD-Surface8/241212-0400_Image_C/Image_C_241212-0400.raw', address = 0x11f85a4b3000 }
        tid 2362555: ref_23b5c63a7000_90.828M_104: Extending matching range: BeesRangePair: 11.07M src[0x2b0014000..0x2b0b26000] dst[0x2b0014000..0x2b0b26000]
src = 19 /run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/Lukas-PC2/241230-1900/A_0/SteamLibrary/steamapps/common/Palworld/Pal/Content/Paks/Pal-Windows.pak
dst = 13 /run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/Lukas-PC2/250101-1356/A_0/SteamLibrary/steamapps/common/Palworld/Pal/Content/Paks/Pal-Windows.pak
        tid 2362556: ref_23b64ee2c000_128M_1: resolving addr 0x151f1319e000 with LOGICAL_INO
        tid 2362557: progress_report: [407.038s] idle 3600
        tid 2362558: status_report: writing status to file '/run/bees//1b745638-8212-45fc-b4ff-4d58dec53bc1.status'
        tid 2362559: hash_writeback: flush rate limited after extent #39234 of 65536 extents
        tid 2362560: hash_prefetch: [2346.46s] idle 3600s
        tid 2362561: crawl_transid: waiting 65.1224s for next transid RateEstimator { count = 665214, raw = 105.092 / 6843.94, ratio = 105.092 / 6844.71, rate = 0.0153538, duration(1) = 65.1306, seconds_for(1) = 64.3554 }
        tid 2362562: crawl_writeback: [406.118s] idle, dirty
PROGRESS:
extsz   datasz  point gen_min gen_max this cycle start  ctime   next cycle ETA
----- -------- ------ ------- ------- ---------------- ------ ----------------
  max    6.23T 136439       0  618420 2024-12-11 17:48 32w 2d 2025-08-25 22:34
  32M   1.937T 136261       0  618420 2024-12-11 17:48 32w 2d 2025-08-26 05:39
   8M   1.219T 416930       0  618420 2024-12-11 17:48 10w 4d 2025-03-26 15:59
   2M 482.668G 265081       0  618420 2024-12-11 17:48 16w 4d 2025-05-08 03:02
 512K 401.945G 023813       0  618420 2024-12-11 17:48 3y 28w 2028-07-31 03:58
 128K  210.14G 008985       0  618420 2024-12-11 17:48      -                -
total  10.455T        gen_now  665212

January 2nd

TOTAL:
        addr_block=6331371 addr_compressed_offset=31436 addr_eof_e=3 addr_from_fd=343528 addr_from_root_fd=3354 addr_hole=63 addr_magic=63 addr_ms=3250 addr_uncompressed=6299935
        adjust_compressed_offset_correct=16 adjust_exact=3319 adjust_exact_correct=3319 adjust_hit=3335 adjust_offset_high=5 adjust_offset_hit=16 adjust_offset_low=14 adjust_try=3354
        block_bytes=25511589842 block_hash=6095993 block_ms=2224878 block_read=6228417 block_zero=1155
        chase_corrected=6 chase_hit=3335 chase_no_data=19 chase_try=3354 chase_uncorrected=3329
        crawl_create=4013 crawl_discard=3449 crawl_extent=1700 crawl_gen_high=197
        dedup_bytes=81965056 dedup_copy=28672 dedup_hit=1030 dedup_ms=123125 dedup_try=1030
        extent_deferred_inode=4754 extent_forward=77310 extent_mapped=1696 extent_ms=3152357 extent_ok=1700 extent_overflow=4 extent_ref_ok=110473
        hash_already=1 hash_erase=5 hash_evict=323865 hash_extent_in=65536 hash_extent_out=1947 hash_front=45 hash_front_already=1118 hash_insert=325388 hash_lookup=5996512
        open_clear=21 open_file=20438 open_hit=20438 open_ino_ms=555249 open_lookup_ok=20438 open_root_ms=65043
        pairbackward_bof_first=2 pairbackward_hit=13075 pairbackward_miss=1540 pairbackward_ms=44965 pairbackward_overlap=6 pairbackward_same=68 pairbackward_stop=3335 pairbackward_try=14693 pairbackward_zero=2
        pairforward_eof_first=3 pairforward_eof_malign=221 pairforward_hit=98367 pairforward_miss=2038 pairforward_ms=113635 pairforward_overlap=4 pairforward_same=148 pairforward_stop=3334 pairforward_try=100799 pairforward_zero=17
        progress_ok=126
        readahead_bytes=171391612 readahead_clear=7 readahead_count=176 readahead_ms=8020 readahead_skip=77205 readahead_unread_ms=12
        replacedst_dedup_hit=1550 replacedst_grown=2575 replacedst_try=3323
        replacesrc_dedup_hit=12 replacesrc_try=7
        resolve_empty=5 resolve_large=340 resolve_ms=1760176 resolve_ok=1070
        root_clear=21 root_found=1462 root_ok=1462 root_parent_open_ok=1462 root_parent_open_try=1462
        scan_already=67457 scan_block=5984545 scan_compressed_no_dedup=33 scan_dup_block=32224 scan_dup_hit=3322 scan_extent=77311 scan_forward=77310 scan_found=445643 scan_hash_hit=3322 scan_hash_insert=325382 scan_hash_preinsert=5983435 scan_lookup=5885070 scan_push_front=1163 scan_reinsert=7 scan_resolve_hit=3323 scan_resolve_zero=5 scan_rewrite=7 scan_seen_erase=894 scan_seen_hit=8300 scan_seen_insert=657 scan_seen_miss=69011 scan_skip_bytes=17 scan_skip_ops=1 scan_zero=1109
        scanf_extent=77308 scanf_extent_ms=4367276 scanf_total=77306 scanf_total_ms=5006473
        tmp_aligned=7 tmp_block=7 tmp_bytes=28672 tmp_copy=7 tmp_copy_ms=21 tmp_create=1 tmp_resize=15
RATES:
        addr_block=3078.19 addr_compressed_offset=15.284 addr_eof_e=0.002 addr_from_fd=167.017 addr_from_root_fd=1.631 addr_hole=0.031 addr_magic=0.031 addr_ms=1.581 addr_uncompressed=3062.91
        adjust_compressed_offset_correct=0.008 adjust_exact=1.614 adjust_exact_correct=1.614 adjust_hit=1.622 adjust_offset_high=0.003 adjust_offset_hit=0.008 adjust_offset_low=0.007 adjust_try=1.631
        block_bytes=1.24032e+07 block_hash=2963.76 block_ms=1081.69 block_read=3028.14 block_zero=0.562
        chase_corrected=0.003 chase_hit=1.622 chase_no_data=0.01 chase_try=1.631 chase_uncorrected=1.619
        crawl_create=1.952 crawl_discard=1.677 crawl_extent=0.827 crawl_gen_high=0.096
        dedup_bytes=39849.8 dedup_copy=13.94 dedup_hit=0.501 dedup_ms=59.862 dedup_try=0.501
        extent_deferred_inode=2.312 extent_forward=37.587 extent_mapped=0.825 extent_ms=1532.62 extent_ok=0.827 extent_overflow=0.002 extent_ref_ok=53.71
        hash_already=0.001 hash_erase=0.003 hash_evict=157.457 hash_extent_in=31.863 hash_extent_out=0.947 hash_front=0.022 hash_front_already=0.544 hash_insert=158.198 hash_lookup=2915.39
        open_clear=0.011 open_file=9.937 open_hit=9.937 open_ino_ms=269.952 open_lookup_ok=9.937 open_root_ms=31.623
        pairbackward_bof_first=0.001 pairbackward_hit=6.357 pairbackward_miss=0.749 pairbackward_ms=21.862 pairbackward_overlap=0.003 pairbackward_same=0.034 pairbackward_stop=1.622 pairbackward_try=7.144 pairbackward_zero=0.001
        pairforward_eof_first=0.002 pairforward_eof_malign=0.108 pairforward_hit=47.825 pairforward_miss=0.991 pairforward_ms=55.248 pairforward_overlap=0.002 pairforward_same=0.072 pairforward_stop=1.621 pairforward_try=49.007 pairforward_zero=0.009
        progress_ok=0.062
        readahead_bytes=83327.3 readahead_clear=0.004 readahead_count=0.086 readahead_ms=3.9 readahead_skip=37.536 readahead_unread_ms=0.006
        replacedst_dedup_hit=0.754 replacedst_grown=1.252 replacedst_try=1.616
        replacesrc_dedup_hit=0.006 replacesrc_try=0.004
        resolve_empty=0.003 resolve_large=0.166 resolve_ms=855.764 resolve_ok=0.521
        root_clear=0.011 root_found=0.711 root_ok=0.711 root_parent_open_ok=0.711 root_parent_open_try=0.711
        scan_already=32.797 scan_block=2909.57 scan_compressed_no_dedup=0.017 scan_dup_block=15.667 scan_dup_hit=1.616 scan_extent=37.588 scan_forward=37.587 scan_found=216.664 scan_hash_hit=1.616 scan_hash_insert=158.195 scan_hash_preinsert=2909.03 scan_lookup=2861.21 scan_push_front=0.566 scan_reinsert=0.004 scan_resolve_hit=1.616 scan_resolve_zero=0.003 scan_rewrite=0.004 scan_seen_erase=0.435 scan_seen_hit=4.036 scan_seen_insert=0.32 scan_seen_miss=33.552 scan_skip_bytes=0.009 scan_skip_ops=0.001 scan_zero=0.54
        scanf_extent=37.586 scanf_extent_ms=2123.29 scanf_total=37.585 scanf_total_ms=2434.05
        tmp_aligned=0.004 tmp_block=0.004 tmp_bytes=13.94 tmp_copy=0.004 tmp_copy_ms=0.011 tmp_create=0.001 tmp_resize=0.008
THREADS (work queue 0 of 312 tasks, 4 workers, load: current 0 target 0 average 0):
        tid 973: bees: [2056.84s] waiting for signals
        tid 1149: ref_11e027fa7000_12.508M_380: Reading BeesBlockData BeesBlockData { 4K 0x20d78c7000 fd = 89 '/run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/Lukas-PC2/240923-1801_Image_C/Image_C_240923-1801.raw' }
        tid 1150: ref_11e61b15d000_128M_1000: resolving addr 0x11e6888e3000 with LOGICAL_INO
        tid 1151: ref_11e088311000_15.113M_2883: Reading BeesBlockData BeesBlockData { 4K 0x166000 fd = 361 '/run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/Lukas-PC2/241101-1801/A_0/SteamLibrary/steamapps/common/Bigscreen/Bigscreen_Data/sharedassets43.assets.resS', address = 0x11e088478000 }
        tid 1152: ref_11e5f5d00000_128M_1279: scan extent 4K [0x442ee000..0x442ef000] fid = 23044:2026780 fd = 439 '/run/bees/mnt/1b745638-8212-45fc-b4ff-4d58dec53bc1/root/backups/UrBackup/Lukas-PC2/241217-1851/F_0/Videos/Aufnahmen/Videos/20170920 1612 - Pro7 MAXX HD - Dragon Ball Super.ts'
        tid 1153: progress_report: [2056.85s] idle 3600
        tid 1154: status_report: writing status to file '/run/bees//1b745638-8212-45fc-b4ff-4d58dec53bc1.status'
        tid 1155: hash_writeback: flush rate limited after extent #38293 of 65536 extents
        tid 1156: hash_prefetch: [1873.21s] idle 3600s
        tid 1157: crawl_transid: [10.479s] waiting 67.4584s for next transid RateEstimator { count = 652738, raw = 25.8912 / 1746.73, ratio = 25.8912 / 1757.22, rate = 0.0147342, duration(1) = 67.8693, seconds_for(1) = 57.385 }
        tid 1158: crawl_writeback: [256.831s] idle, dirty
PROGRESS:
    done  10.154T    %done size transid 652738   todo   inaccurate ETA
-------- -------- -------- ---- ------- ------ ------ ----------------
955.813G   6.697T 13.9374%  max       0 618420 22w 3d 2025-06-08 16:43
379.729G   2.698T 13.7427%  32M       0 618420 22w 5d 2025-06-10 22:07
197.611G 478.361G 41.3100%   8M       0 618420  7w 3d 2025-02-24 14:32
 41.396G 157.961G 26.2065%   2M       0 618420 11w 6d 2025-03-27 03:20
  3.407G 139.579G  2.4410% 512K       0 618420 2y 23w 2027-06-18 06:23
  9.848M    1.04G  0.9249% 128K       0 618420 6y 25w 2031-06-26 21:45

@XZVB12

XZVB12 commented Jan 23, 2025

There seem to be some problems with kernel 6.13. Tomorrow I will check on 6.12 to see whether the kernel is the cause.

<5> bees: Starting bees main loop...
<5> ref_2a78199d000_4K_18: --- BEGIN TRACE ---
<3> ref_2a78199d000_4K_18: dedup BeesRangePair: 16K src[0x31d000..0x321000] dst[0x5a1000..0x5a5000]
<3> ref_2a78199d000_4K_18: src = 203 Game/Dragon Age Origins/packages/core/env/brc502d/brc502d.gpu.rim
<3> ref_2a78199d000_4K_18: dst = 159 Game/Dragon Age Origins/packages/core/env/lak300d/lak300d.gpu.rim at bees-context.cc:216
<3> ref_2a78199d000_4K_18: bar: =============================== at bees-context.cc:759
<3> ref_2a78199d000_4K_18: Extent { begin = 0x5a1000, end = 0x5c0000, physical = 0x2a78199d000, flags = Extent::OBSCURED|FIEMAP_EXTENT_ENCODED, physical_len = 0x20000, logical_len = 0x1f000 } block_count 31 at bees-context.cc:439
<3> ref_2a78199d000_4K_18: scan bfr 124K [0x5a1000..0x5c0000] fid = 5:2384876 fd = 159 'Game/Dragon Age Origins/packages/core/env/lak300d/lak300d.gpu.rim' at bees-context.cc:358
<3> ref_2a78199d000_4K_18: scan extent Extent { begin = 0x5a1000, end = 0x5c0000, physical = 0x2a78199d000, flags = Extent::OBSCURED|FIEMAP_EXTENT_ENCODED, physical_len = 0x20000, logical_len = 0x1f000 } at bees-context.cc:357
<3> ref_2a78199d000_4K_18: scan_forward 124K [0x5a1000..0x5c0000] fid = 5:2384876 at bees-context.cc:893
<3> ref_2a78199d000_4K_18: scanning root 5 ino 2384876 offset 0x5a1000 length 124K at bees-roots.cc:570
<3> ref_2a78199d000_4K_18: crawl_one_extent 5:2384876 0x5a1000+124K age 0.295 at bees-roots.cc:560
<5> ref_2a78199d000_4K_18: ---  END  TRACE ---

UPD: I don’t get a similar message on 6.12 with the same data; tomorrow I’ll try again on 6.13.

@petaramesh

petaramesh commented Jan 24, 2025

Hi,

I saw that there have been changes in bees recently, and AFAIR there hadn't been any for a while, so I just “git pulled” from master, recompiled, and installed it on my two NASes: one classical x86_64 Intel Debian machine and a less classical aarch64 Debian Raspberry Pi 5.

The first thing I notice, running bees with the same configuration as before, is that it seems to be working (it uses CPU and the disks are scratching), but I have completely lost all output to syslog (journald).

I still get the usual startup output, from “beesd[189647]: bees version v0.11-rc3-9-g85aba7b” up to “bees[189647]: Starting bees main loop...”, and then the Great Silence.

Did I miss some change in the logging configuration?

(Both machines running Debian 12
Kernels:
6.7.12+bpo-amd64
6.6.62+rpt-rpi-2712

Logging lost on both.)

Thanks in advance.

@Zygo
Owner Author

Zygo commented Jan 24, 2025

There seem to be some problems with kernel 6.13. Tomorrow I will check on 6.12 to see whether the kernel is the cause.

UPD: I don’t get a similar message on 6.12 with the same data; tomorrow I’ll try again on 6.13.

That looks like a debug stack trace from equal physical addresses in dedup, except the stack trace is logged at a higher level than the debug message that triggers it. It's noisy but harmless.

@Zygo
Owner Author

Zygo commented Jan 24, 2025

I still get the usual startup output, from “beesd[189647]: bees version v0.11-rc3-9-g85aba7b” up to “bees[189647]: Starting bees main loop...”, and then the Great Silence.

Check the status files in /run/bees/ to see what it's doing in close to real time.
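
If you want to watch that from a terminal, here's a minimal sketch of a watcher (not part of bees; the UUID in the path and the 10-second interval are placeholders you'd adjust):

#!/usr/bin/env python3
# Reprint the PROGRESS section of a bees status file every few seconds.
# Assumes the /run/bees/<uuid>.status layout shown in the output above.
import time

STATUS_FILE = "/run/bees/1b745638-8212-45fc-b4ff-4d58dec53bc1.status"  # adjust

def progress_section(text):
    # The PROGRESS block runs from the "PROGRESS:" line to the next blank line.
    lines = text.splitlines()
    if "PROGRESS:" not in lines:
        return ""
    out = []
    for line in lines[lines.index("PROGRESS:"):]:
        if not line.strip() and out:
            break
        out.append(line)
    return "\n".join(out)

while True:
    with open(STATUS_FILE) as f:
        print(progress_section(f.read()))
    time.sleep(10)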

The main dedupe search function was rewritten, and a lot of log messages went away. The default log level (4) is lower than the list of all deduplicated filenames and offsets (6) or debug messages (7). Also, when switching an existing configuration to -m4 there might be a transitional period where some old data will be rescanned, but no new dupes found.

@Zygo
Owner Author

Zygo commented Jan 24, 2025

I've actually saved multiple copies of my status files, so here they are; I'm posting the full status file in case anything in it could be useful. My kernel version is 6.8.0-51-generic. This is my status file from today:

RATES:
        block_bytes=7.7335e+06
        dedup_ms=36.012
        open_ino_ms=148.183
        readahead_bytes=52109.6 readahead_ms=1.484
        resolve_large=0.123 resolve_ms=834.577
        scanf_extent_ms=1227.95

7.7 MB/s of random and cached reads (block_bytes), 52 KB/s of prefetching (readahead_bytes). I've noticed that very little time is typically spent prefetching compared to random reading, and there might be some benefit to trying to do more. The prefetches are sequential reads, so they are one of the faster things you can do on btrfs. Less than 0.2% of the time is spent on prefetching (readahead_ms).

LOGICAL_INO (resolve_ms) takes 83% of a thread's time, pushing the average extent scan time (scanf_extent_ms) to 1.2 seconds. This is a known problem with LOGICAL_INO--most of the speed gains between v0.10 and v0.11 come from eliminating 99% of the LOGICAL_INO calls. Extent scan adds a bunch more LOGICAL_INO back in to reduce the number of reads. Now every additional LOGICAL_INO we add has to be paid for by getting rid of thousands of reads to make up the lost time.
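
To make those percentages concrete: RATES values are per wall-clock second, so I read a *_ms counter as milliseconds of thread time consumed per second (treat that interpretation as an assumption, not a bees API guarantee):

# Back-of-envelope from the RATES excerpt above.
rates = {
    "resolve_ms": 834.577,    # LOGICAL_INO time
    "readahead_ms": 1.484,    # prefetch time
    "dedup_ms": 36.012,       # dedupe time
}
for name, ms_per_sec in rates.items():
    print(f"{name}: {ms_per_sec / 1000:.2%} of one thread")
# resolve_ms: 83.46%, readahead_ms: 0.15%, dedup_ms: 3.60%
# (dedup's share of *total* thread time across all workers is much smaller.)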

Opening files (open_ino_ms) takes 148 ms on average...that's slow but maybe not inconsistent with metadata on a spinning drive.

Dedupes are a very small component of the overall picture, less than 0.4% (dedup_ms).

resolve_large is relatively high (it's the number of extents with more than 10,000 references encountered per second). If the filesystem has been deduped or has a lot of snapshots, that pushes up the resolve time. I usually see this after running bees v0.10 and earlier on the filesystem, where too many small extents get pushed up to very high reference counts. bees v0.11 doesn't have a way to fix this, but it should prevent it from getting worse.

That might be why the new run seems slower than the old run--because the old run made the new run slower.

PROGRESS:
extsz   datasz  point gen_min gen_max this cycle start tm_left   next cycle ETA
----- -------- ------ ------- ------- ---------------- ------- ----------------
  max   7.034T 459568       0  618420 2024-12-11 17:48 13w 21h 2025-04-24 21:32
  32M   1.835T 296911       0  618420 2024-12-11 17:48  20w 2d 2025-06-14 05:39
   8M   1.356T 531017       0  618420 2024-12-11 17:48  11w 2d 2025-04-12 12:49
   2M 354.076G 441725       0  618420 2024-12-11 17:48  13w 4d 2025-04-28 14:37
 512K 232.085G 245421       0  618420 2024-12-11 17:48  24w 4d 2025-07-14 01:47
 128K  19.697G 380418       0  618420 2024-12-11 17:48  15w 6d 2025-05-14 00:22
total  10.816T        gen_now  678614                  updated 2025-01-22 23:16

This is a cold start (gen_min 0), so everything is getting scanned from the beginning.

@kakra
Contributor

kakra commented Jan 24, 2025

This is a cold start (gen_min 0), so everything is getting scanned from the beginning.

Well, it's a cold start anyway when going to v0.11 from v0.10, isn't it? The new scanner uses a different set of (virtual) subvolids for tracking progress, and that progress starts from 0.

Would it be a good idea to defrag the files that v0.10 pushed to very high reference counts? It would be nice if v0.11 could log the names of files where it clearly sees that the reference count is very high and the extent size is very small, which is probably caused by pre-v0.11 behavior. Is something like this possible? Maybe as a separate tool?

Previously, I used the performance log lines to get an indicator of which files I should probably defrag. But I think you removed that.

@Zygo
Owner Author

Zygo commented Jan 25, 2025

Well, it's a cold start anyway when going to v0.11 from v0.10, isn't it? The new scanner uses a different set of (virtual) subvolids for tracking progress, and that progress starts from 0.

It starts from the lowest min_transid of any subvol crawler, including any legacy subvol scanners. If bees is up to date, switching to extent scan starts from the last completed transid.
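
In other words, something like this (a sketch of the resume logic, not the actual bees code; the names are made up):

def extent_scan_start_transid(crawl_states):
    # Resume point when switching to extent scan: the lowest saved
    # min_transid across all crawlers, including legacy per-subvol
    # scanners.  If every crawler had finished a pass, this is the
    # last completed transid; if any is still at 0, the scan is cold.
    return min(state.min_transid for state in crawl_states)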

Would it be a good idea to defrag the files that v0.10 pushed to very high reference counts? It would be nice if v0.11 could log the names of files where it clearly sees that the reference count is very high and the extent size is very small, which is probably caused by pre-v0.11 behavior.

Currently, the logical addresses of extents with very high reference counts are logged in the debug log, but bees doesn't fetch the filenames because there are tens of thousands of them. The extent is simply ignored, and removed from the hash table if it was previously inserted (presumably when it had a lower ref count).
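
The shape of that logic, roughly (a sketch only; the names and the exact cutoff are made up, with the 10,000 figure taken from the resolve_large description earlier in this thread):

import logging

RESOLVE_REF_LIMIT = 10_000  # hypothetical cutoff

def filter_refs(extent_addr, refs, hash_table):
    # refs: the reference list a LOGICAL_INO lookup returned for this extent.
    # hash_table: set of extent addresses previously inserted.
    if len(refs) > RESOLVE_REF_LIMIT:
        logging.debug("ignoring extent 0x%x with %d refs", extent_addr, len(refs))
        hash_table.discard(extent_addr)  # drop the entry inserted at a lower ref count
        return None                      # extent ignored; filenames never fetched
    return refs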

Other extents with medium-high reference counts are processed normally, with only PERFORMANCE warnings if they take more than 5 seconds. If you have years of daily snapshots then every extent has hundreds or thousands of references, so a warning would be noise.

It's possible to defrag those files, but they're kind of like asbestos in buildings--if you don't need to modify them, it's better to leave them where they are. They take as long to defrag as it takes to fragment them in the first place.

For now, I'd let bees v0.11 grind its way through them. Once they're scanned, bees won't look at them again.

Is something like this possible? Maybe as a separate tool?

There's the thing @Forza-tng was working on...

@petaramesh

It didn't do a cold start here (it would take weeks on my NAS), but it seems to be happy:

PROGRESS:
extsz   datasz point gen_min gen_max this cycle start tm_left   next cycle ETA
----- -------- ----- ------- ------- ---------------- ------- ----------------
  max   6.091T  idle 2270053 2270054 2025-01-25 09:23       -                -
  32M   1.581T  idle 2270052 2270053 2025-01-25 09:20       -                -
   8M 282.916G  idle 2270052 2270053 2025-01-25 09:20       -                -
   2M 134.049G  idle 2270052 2270053 2025-01-25 09:20       -                -
 512K 112.382G  idle 2270052 2270053 2025-01-25 09:20       -                -
 128K  56.524G  idle 2270052 2270053 2025-01-25 09:20       -                -
total   8.244T       gen_now 2270054                  updated 2025-01-25 09:23
