Optimize ReconstructSome for leopard 8+16 #272
/cc @klauspost @liamsi @musalbas; continued work on reconstructSome thanks to Elias! |
LGTM. Do you have any numbers?
I think this change makes sense as is. But I originally thought there was a way to recompute only the required shares, and I also thought that is what the other implementation was doing. @klauspost can you confirm that the non-leopard implementation operates only on the required shares? It looks like this only operates on Lines 1560 to 1606 in 5b85c72
Also, agree with @klauspost that benchmarks/numbers would help understand the impact of this (if any). |
Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard. |
It's been too long since I looked into leopard myself and I don't fully understand this tbh. Just from a high-level perspective, it should be possible to evaluate a polynomial given enough points (i.e. n+1) in O(n), with n being the degree of the polynomial (without having to fully recompute that polynomial). Are you saying that leopard works differently or does not allow this, and has to do its O(n log n) computations either way? |
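To make the high-level point concrete, here is a minimal sketch of evaluating a polynomial at a single extra point directly from n+1 samples, without recovering its coefficients. It uses Lagrange interpolation over a small prime field; the modulus `p = 257` and the names `modPow`, `modInv` and `evalAt` are assumptions for illustration and have nothing to do with leopard's field or its FFT-based decoder. Note that this naive form costs O(n^2) field operations per point; getting close to linear cost per point would need the denominators precomputed.

```go
package main

// p is a small prime modulus, chosen for illustration only.
const p = 257

// modPow returns base^exp mod p using square-and-multiply.
func modPow(base, exp int) int {
	result := 1
	base %= p
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % p
		}
		base = base * base % p
		exp >>= 1
	}
	return result
}

// modInv returns the multiplicative inverse of a mod p
// (Fermat's little theorem; a must be non-zero mod p).
func modInv(a int) int {
	return modPow(a, p-2)
}

// evalAt returns f(x) mod p for the unique polynomial f of degree
// at most len(xs)-1 with f(xs[i]) = ys[i]. All inputs are expected
// to be in [0, p) and the xs must be pairwise distinct.
func evalAt(xs, ys []int, x int) int {
	sum := 0
	for i := range xs {
		num, den := 1, 1
		for j := range xs {
			if j == i {
				continue
			}
			num = num * ((x - xs[j] + p) % p) % p
			den = den * ((xs[i] - xs[j] + p) % p) % p
		}
		// Add the i-th Lagrange basis term y_i * num / den.
		sum = (sum + ys[i]*num%p*modInv(den)) % p
	}
	return sum
}
```

For example, with samples of f(x) = 3 + 2x + x^2 at x = 1, 2, 3 (values 6, 11, 18), `evalAt([]int{1, 2, 3}, []int{6, 11, 18}, 10)` yields f(10) = 123.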
If you look at the paper, what I mean is that you should be able to save computations by simply not adding missing shard positions to the set E if they are not required: https://arxiv.org/pdf/1404.3458.pdf In the implementation this should correspond to these: Lines 423 to 425 in 162f2ba
So a simple check if they are required before setting them could do the job. But it might not be as easy as that as other parts of the code might operate under the assumption that if a position is not in E, then there was no error or erasure. So there might be other changes necessary still. |
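The "simple check" could look roughly like the sketch below: only missing positions that the caller actually asked for end up in the erasure set. `erasurePositions`, `present` and `required` are hypothetical names, not the library's code, and the sketch says nothing about whether the decoder tolerates a missing position being left out of E, which is exactly the open question above.

```go
package main

// erasurePositions returns the shard indices to treat as erasures.
// present[i] reports whether shard i was supplied, required[i] whether
// the caller wants shard i reconstructed. Missing shards that are not
// required are skipped. Indices beyond len(required) are conservatively
// treated as required.
func erasurePositions(present, required []bool) []int {
	var e []int
	for i, ok := range present {
		if ok {
			continue // shard is available, not an erasure
		}
		if i < len(required) && !required[i] {
			continue // missing, but nobody asked for it
		}
		e = append(e, i)
	}
	return e
}
```

With shards 1, 2 and 4 missing and only 1 and 4 required, this returns the positions 1 and 4.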
Sorry. I don't read math. I can't even tell if the paper relates to Leopard. Reading the code it seems like input is divided into "mips" (probably derived from mip-maps from 3D) that are stacked for the final output. (*errorBitfield).prepare() converts missing shards into the mips that are needed. This makes sense to give the There may very well be additional optimization possible here, but I have no idea where to start looking for it. |
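One plausible reading of the "mips" idea, sketched here as a plain OR-reduction pyramid: each coarser level marks a block as affected if any shard underneath it is missing, so whole blocks without missing shards can be skipped. This is a simplified illustration under that assumption; `buildMips` is a hypothetical helper and does not reproduce the layout or bit tricks of `(*errorBitfield).prepare()`.

```go
package main

// buildMips returns an OR-reduction pyramid over missing.
// Level 0 is a copy of the input; each entry of level k+1 is true
// when either of its two children in level k is true.
// len(missing) is expected to be a power of two.
func buildMips(missing []bool) [][]bool {
	level := append([]bool(nil), missing...)
	mips := [][]bool{level}
	for len(level) > 1 {
		next := make([]bool, len(level)/2)
		for i := range next {
			next[i] = level[2*i] || level[2*i+1]
		}
		mips = append(mips, next)
		level = next
	}
	return mips
}
```

For eight shards with only shard 2 missing, the levels above the input mark block 1 of 4, then block 0 of 2, then the single top block.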
I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything. |
Yes. Currently BenchmarkDecode1K tests (among others) the time for reconstructing a single shard ( This will probably not have changed those numbers, since the only difference from using (But do note that 1K is so small the setup time is mostly dominating this particular benchmark) |
Let me try what I wrote above (inspired by the paper that is the basis for leopard) and see whether adding only the required shards to the error bits breaks anything, and whether it actually yields any noticeable performance gain. |
Based on #274, I've included the optimization for the error bits. See 0ae8b02 for details. In short:
The new test fails in the regular non-leopard configuration with
but I've so far failed to find an error in the test (and the leopards succeed). @klauspost can you spot my error? |
Another issue with optimizing Line 531 in 4e91954
errBits and errLocs .
|
I've fixed the caching issue by no longer including |
Great stuff. I will take a look as soon as I get some extra time. |
Kind ping @elias-orijtech @klauspost :-) |
CC @odeke-em