-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hadar/reduction from storage #745
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some questions and comments
icicle/include/icicle/fields/field.h
Outdated
@@ -239,6 +239,20 @@ class Field | |||
} | |||
} | |||
|
|||
// access precomputed values for first step of the from storage function (see below) | |||
static HOST_DEVICE_INLINE Field get_reduced_digit(int i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should clarify by adding modulus / p to the function name as well to the reduced_digits struct and the mod_sub function + struct
between 0 and p. This is done using 3 steps: | ||
1. Splitting the number into TLC sized digits - xs = x_i * p_i = x_i * 2^(TLC*32*i). | ||
p_i are precomputed modulo p and so the first step is performed multiplying by p_i and accumultaing. | ||
At the end of this step the number is reduced from NLIMBS to 2*TLC+1 (assuming less than 2^32 additions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the code bellow 2*TLC+2 limbs are assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will clarify
icicle/include/icicle/fields/field.h
Outdated
At the end of this step the number is reduced from NLIMBS to 2*TLC+1 (assuming less than 2^32 additions). | ||
2. The second step subtracts a single precomputed multiple of p in ordr to reduce the number into the range 0<x<2^(2n) | ||
where n is the modulus bit count. This step makes use of a look-up table that looks at the top bits of the number (it | ||
is enough to look at the bits from 2*(2n-1) and above). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be 2n - 1 (without the 2x)? The storage is only up to 2*TLC + 2, index 4n-2 is out of bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be 2^(2n-1), will fix
uint64_t carry = 0; | ||
#pragma unroll | ||
for (unsigned i = 0; i < NLIMBS_B / 2; i++) { | ||
for (unsigned j = 0; j < NLIMBS_A / 2; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you swap the loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to fix a bug that was inserted in an earlier PR..
|
||
// This function generates the precomputed values for the second step of the from storage function in the field class. | ||
// However, it is not used due to constexpr computation overflow. Instead, the values are currently generated by a | ||
// python script and appear explicitly in the field config struct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for leaving it then? Do you plan on fixing the overflow issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason to leave it is to preserve the logic, because the python script is not in icicle, and maybe we will find a way to fix the overflow yes
@@ -8,7 +8,110 @@ namespace bls12_377 { | |||
struct fp_config { | |||
static constexpr storage<8> modulus = {0x00000001, 0x0a118000, 0xd0000001, 0x59aa76fe, | |||
0x5c37b001, 0x60b44d1e, 0x9a2ca556, 0x12ab655e}; | |||
static constexpr unsigned reduced_digits_count = 3; | |||
static constexpr storage_array<reduced_digits_count, 8> reduced_digits = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you generate the struct? Is it feasible to add it to the macros computed during build?
I didn't notice a reference to it in the macro (In contrast to mod_sub)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was also precomputed using a python script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
This PR adds a function that converts a storage object (up to 576 bits) into a field element using precomputed values. This is useful for sumcheck and poseidon. For now it does not support Mersenne, and it doesn't make full use of the 64-bit multiplier on cpu.
cuda-backend-branch: hadar/reduce-storage