Expose `lincomb_generic` (or: Generic Linear Combination) #921
Comments
See #380 for some past discussion about this. I thought there was some more discussion about upstreaming such a trait to the `group` crate. I agree it would be nice to have a trait with the ability to operate over more than just pairs of scalars/points. I'm not sure the API has fully shaken out yet, though. Ideally I would like to see a solution for this upstream in the `group` crate. |
Found the relevant issue. |
Thanks. I'm not sure about the intra-organization politics here, but it seemed that there isn't much interest from the `group` side. I am actually fine with the slices API. Instead of the current:

```rust
pub trait LinearCombination: Group {
    fn lincomb(x: &Self, k: &Self::Scalar, y: &Self, l: &Self::Scalar) -> Self {
        (*x * k) + (*y * l)
    }
}
```

I would prefer to have something along the lines of:

```rust
pub trait LinearCombination<T>
where
    Self: Mul<T, Output = Self>,
{
    fn lincomb<const N: usize>(points: &[Self; N], scalars: &[T; N]) -> Self;
}
```

as I am not working with the `Scalar` type directly. That being said, exposing `lincomb_generic` as-is would also work for me. |
I think it's just an issue that went by the wayside. I left a comment to hopefully revive it.

Regarding:

```rust
pub trait LinearCombination<T>
where
    Self: Mul<T, Output = Self>,
{
    fn lincomb<const N: usize>(points: &[Self; N], scalars: &[T; N]) -> Self;
}
```

I'm not sure what your use cases for such a generic trait are. Perhaps you can simply define it yourself for your own purposes and add a blanket impl for the version in `elliptic-curve`. In the context of `elliptic-curve`, the scalar type is always `Self::Scalar`, so the extra type parameter may not be needed there. Regarding exposing something in `k256` itself, I'd be open to that. |
Yes, I can definitely do that, assuming a linear combination trait in `k256` is exposed.

Yes, I agree about these points. So I guess we'll wait to see if the `group` discussion gets any traction. |
If there's no movement on the `group` side, I'd also be okay with exposing something specific to `k256`. |
@fjarri curious if you have thoughts on a slice-of-tuples API for this? |
I think it's a good idea. I can't exactly remember why I went with the existing design; it eliminates the need for some copies, but that's about it. The slice-of-tuples is definitely safer. |
Since the two slices must have equal length, a slice of tuples rules out length mismatches by construction. |
I'd be okay with exposing an inherent method for it in `k256`. If that ends up being more awkward in the code though, we can reconsider. |
So I started working on this, and I was surprised both to not understand the logic of the implemented algorithm at all, and to see that it relies upon internal design details of the secp256k1 curve. So I dug into PR #380, and I still didn't find an explanation of the algorithm, so I would love to have a reference @fjarri. I was also surprised that the algorithm isn't documented anywhere.

Now I'll explain my surprise: from my understanding, multi-exponentiation (or linear combination, depending on notation) is a rather "uninteresting" area of research; and I say uninteresting not to discredit its relevance but to signify that, much like multiplication algorithms, the low-hanging fruit was collected long ago, and we have pretty much standardized and generalized those results by now. The algorithm I know of, after internal research that was originally done to optimize our threshold Paillier library, is a rather straightforward and generic one that relies upon lookup tables. I have also made a draft PR to crypto-bigint (that was ignored until now for some reason) to introduce those changes. This is also the approach taken by curve25519-dalek with their MultiscalarMul trait.

Regarding performance, my implementation of multi-exponentiation over 4096-bit integers showed larger speedups than I expected to see here.

I do not write this comment to discredit the work; on the contrary, I am confessing two things: that I did not understand the algorithm, and that I did not expect a curve-specific design.

Sorry for the confusion, and I would love to hear both of your thoughts before proceeding. |
I think I lifted the algorithm exactly from the existing scalar multiplication code in `k256`.
Technically it relies on the existence of an endomorphism, not on secp256k1 specifically, but you are right in general. At the moment of writing I was not particularly interested in other curves, so I decided that whoever is interested in them would generalize it further. Since fast multiplication for secp256k1 uses the endomorphism, that's what the linear combination builds on as well.
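For background (standard GLV/endomorphism material, not taken from the k256 source): secp256k1 admits an efficiently computable endomorphism, which turns a single scalar multiplication into a short linear combination of half-length scalars:

```latex
% Standard GLV background for secp256k1 (not the k256 code itself).
% The curve has an endomorphism \phi acting as multiplication by \lambda:
\phi(x, y) = (\beta x,\, y) = \lambda \cdot (x, y),
\qquad \beta^3 \equiv 1 \pmod{p},\quad \lambda^3 \equiv 1 \pmod{n}.
% A scalar k is decomposed as
k \equiv k_1 + k_2 \lambda \pmod{n},
\qquad |k_1|, |k_2| \approx \sqrt{n},
% so a full multiplication becomes a 2-term linear combination
% sharing one doubling chain:
k P = k_1 P + k_2 \phi(P).
```

This is why a fast `lincomb` routine and fast single-scalar multiplication are naturally entangled on this curve.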
I cannot speak for Tony, but my guess would be that it's ignored because it's marked as a draft, and does not pass CI.
Could you elaborate on how you measure the speedup? Running the benchmarks on your PR, I get:
I see that |
P.S.
|
Thanks for the detailed answer. I think I got lost in the secp256k1-specific details, which obscured the bigger picture; I will try to re-read the code with this background before commenting further.
I see; perhaps I'm just new to open-source contributions. My code is production-ready in Tiresias, but I marked the PR as a draft so that design choices regarding crypto-bigint could be discussed prior to review. @tarcieri if you're interested in doing so, let me know; I can also mark it ready for review and let you review the code and answer my questions as you go, I just thought that would be more expensive for you.
Regarding benchmarking, I have run the benchmarks again for the multiexp code in Tiresias, as I'm not sure how up to date my PR was (again, it was more for discussion). The results are as follows, first for 4096-bit integers:
The 1-base case is identical to a plain exponentiation. For 256-bit, the results are:
So for 2 bases we have 11.88/(16.1/2) = 1.475x improvement; for 10 bases we already have 11.886/(50/10) = 2.37x, which is already very close to the peak optimization at the 100-bases case. Comparing to the 10-bases case of `lincomb_generic`, the speedup I measure for integers is noticeably larger.

If the underlying algorithm is the same, and `k256` additionally uses secp256k1-specific optimizations that should make it even better, how can these findings be explained? I suggest we resolve this question before I continue implementing, so we are certain we are implementing the right algorithm. |
@ycscaly sure, that's fine, though there are also some test failures that need to be addressed as well. |
(Using the multiplication/exponentiation terminology for curve points here, for simplicity.)

So, in a generalized windowed exponentiation most of the time is spent in the following places:

- precomputing the lookup table for the base (call this time `P`);
- squarings, one per bit of the exponent (call this time `S`);
- multiplications by table entries, one per window (call this time `M`).

(Here a single exponentiation costs roughly `S + M + P`.) In a multi-exponentiation we use what's called Shamir's trick to only go through one squaring stage for all the bases. So for `N` bases the squarings are paid once, while the precomputation and the table multiplications still scale with `N`, for a total cost of roughly `S + N*(M + P)`.

Therefore when you compare the two, the speedup you measure is roughly

    N * (S + M + P) / (S + N * (M + P))

In the limit of large `N` this tends to

    (S + M + P) / (M + P) = 1 + S / (M + P)

That is, the more time is spent in squaring compared to the other two parts of the code, the more speedup you have. You're comparing 4096-bit integers with 256-bit curve points, so there may be multiple factors at play. But I would not be surprised if squaring is relatively more expensive for integers, because it's not much different from regular multiplication (not much to optimize there), while for curve points it is significantly simpler than multiplication. This is just an educated guess, of course; if this discrepancy worries you, I suggest you actually profile multi-exponentiation for points and integers, and see if it is really the case. |
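To make the shared-squaring stage of Shamir's trick concrete, here is a toy sketch (none of this is the k256 code; the "group" is just multiplication modulo a prime, and the table is only the single `(1, 1)` entry of a width-1 window):

```rust
// Toy modulus; u128 intermediates keep the multiplication exact.
const P: u64 = 1_000_000_007;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

// Ordinary left-to-right square-and-multiply, for reference.
fn pow(g: u64, e: u64) -> u64 {
    let (mut acc, mut g, mut e) = (1u64, g, e);
    while e > 0 {
        if e & 1 == 1 {
            acc = mul(acc, g);
        }
        g = mul(g, g);
        e >>= 1;
    }
    acc
}

// g1^e1 * g2^e2 with ONE squaring chain instead of two (Shamir's trick).
fn shamir(g1: u64, e1: u64, g2: u64, e2: u64) -> u64 {
    let g12 = mul(g1, g2); // precomputed table entry for bit pair (1, 1)
    let mut acc = 1u64;
    for i in (0..64).rev() {
        acc = mul(acc, acc); // shared squaring, paid once per bit
        match ((e1 >> i) & 1, (e2 >> i) & 1) {
            (1, 0) => acc = mul(acc, g1),
            (0, 1) => acc = mul(acc, g2),
            (1, 1) => acc = mul(acc, g12),
            _ => {}
        }
    }
    acc
}

fn main() {
    let (g1, e1, g2, e2) = (5, 123_456_789, 7, 987_654_321);
    let naive = mul(pow(g1, e1), pow(g2, e2));
    assert_eq!(shamir(g1, e1, g2, e2), naive);
    println!("ok");
}
```

The real implementations add wider windows (more table entries per base) and, for curves, constant-time lookups, but the structural saving is exactly this shared squaring loop.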
Thanks for that. I can look through this next week, but for now just to reiterate that dalek's `MultiscalarMul` shows a bigger speedup than this formula alone would suggest. |
I'm willing to bet the speedup can still be explained by the formula above. Ed25519 has its own optimizations that could affect the results. Also make sure that you're measuring constant-time multiplication for it, since constant-time table lookup is a part of the cost. |
We have revisited your previous comment and the code, and we share your view; specifically, Ed25519 has the special property that doubling a point uses the same formula as adding two distinct points, which explains the difference in performance. I will now continue the effort of taking your code and exposing it. Thanks. |
It is a problem to take a slice of tuples. If that slice isn't statically sized (i.e. using a const generic parameter), the only way to keep the current efficient design is to use `alloc`.

The reason is that we loop twice, with the result of the first loop being fed as input to the second, and there is a shared expensive setup phase which we would need to redo if we can't save it (and how could we save it without allocating memory? we can't allocate it on the stack if the size is not known at compile time).

So I can either expose it as-is, or transform it to use a slice of tuples but with a const generic parameter, which kind of defeats the purpose. That being said, there are definitely use-cases for which we'd want a linear combination of dynamically known size, for which `alloc` would be required anyway.

Perhaps I can expose both: a const-generic version, and an `alloc`-gated one for dynamic sizes? |
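The two-loop structure with a stack-allocated setup might look like this toy sketch (a stand-in `precompute` and plain `u64` arithmetic instead of real points; names are illustrative, not the actual k256 internals):

```rust
// Stand-in for the expensive per-base table setup.
fn precompute(base: u64) -> u64 {
    base // a real implementation would build a lookup table here
}

// With a const-generic N, the first loop's results live in a
// stack-allocated [_; N]; with a plain &[(P, S)] slice, saving them
// between the two loops would require heap allocation (`alloc`).
fn lincomb<const N: usize>(terms: &[(u64, u64); N]) -> u64 {
    // First loop: shared setup, one entry per base, on the stack.
    let tables: [u64; N] = core::array::from_fn(|i| precompute(terms[i].0));
    // Second loop: consumes the first loop's results.
    terms
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, t)| acc + tables[i] * t.1)
}

fn main() {
    assert_eq!(lincomb(&[(3, 2), (5, 4)]), 26); // 3*2 + 5*4
    println!("ok");
}
```

This is why a dynamically sized slice-of-tuples API can't keep the current design without `alloc`: the `[_; N]` buffer between the loops is exactly what a compile-time-unknown length takes away.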
This seems like a good reason to keep the const generic parameter. I would still suggest changing the function name, removing the `_generic` suffix. |
So just change the name and expose? If so, I'll do a PR now. |
Thanks! |
I'd like to add a similar API upstream as well. |
@tarcieri created a new issue #973 for this. Let's also try to get these into the upcoming version release, please. |
Perhaps an approach like this might also work here: https://github.com/RustCrypto/traits/pull/1376/files#r1393273429. I should probably open a tracking issue for redesigning the `LinearCombination` trait. |
Aren't we doing a breaking change now? And yeah, I can do that, but what should I name it? The appropriate name is already taken... |
There haven't been any breaking changes yet, and I plan on doing breaking changes to `elliptic-curve` eventually. |
Ok, so how about we put such a trait in `elliptic-curve` under a new name for now? |
Sure, sounds good |
Done #974 |
The `LinearCombination` trait allows for an optimized implementation of the linear combination of two points and scalars. This is later generalized for `k256` with `lincomb_generic()`, which allows for an optimized implementation of a linear combination between any group of `N` points and scalars, but is unfortunately private.

Here I am stressing that users such as myself may benefit from such a generic optimized linear combination implementation, and wish that we either expose `lincomb_generic()` or, preferably, think of a suitable extension for the `LinearCombination` trait.

@tarcieri