Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hadar/reduction from storage #745

Merged
merged 34 commits into from
Feb 4, 2025
Merged

Hadar/reduction from storage #745

merged 34 commits into from
Feb 4, 2025

Conversation

HadarIngonyama
Copy link
Contributor

@HadarIngonyama HadarIngonyama commented Jan 22, 2025

This PR adds a function that converts a storage object (up to 576 bits) into a field element using precomputed values. This is useful for sumcheck and poseidon. For now it does not support Mersenne, and it doesn't make full use of the 64-bit multiplier on cpu.

cuda-backend-branch: hadar/reduce-storage

Copy link
Contributor

@mickeyasa mickeyasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some questions and comments

@HadarIngonyama HadarIngonyama changed the title Hadar/basic from storage Hadar/reduction from storage Jan 27, 2025
@@ -239,6 +239,20 @@ class Field
}
}

// access precomputed values for first step of the from storage function (see below)
static HOST_DEVICE_INLINE Field get_reduced_digit(int i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should clarify by adding modulus / p to the function name as well to the reduced_digits struct and the mod_sub function + struct

between 0 and p. This is done using 3 steps:
1. Splitting the number into TLC sized digits - xs = x_i * p_i = x_i * 2^(TLC*32*i).
p_i are precomputed modulo p and so the first step is performed multiplying by p_i and accumultaing.
At the end of this step the number is reduced from NLIMBS to 2*TLC+1 (assuming less than 2^32 additions).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the code bellow 2*TLC+2 limbs are assigned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will clarify

At the end of this step the number is reduced from NLIMBS to 2*TLC+1 (assuming less than 2^32 additions).
2. The second step subtracts a single precomputed multiple of p in ordr to reduce the number into the range 0<x<2^(2n)
where n is the modulus bit count. This step makes use of a look-up table that looks at the top bits of the number (it
is enough to look at the bits from 2*(2n-1) and above).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be 2n - 1 (without the 2x)? The storage is only up to 2*TLC + 2, index 4n-2 is out of bounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be 2^(2n-1), will fix

uint64_t carry = 0;
#pragma unroll
for (unsigned i = 0; i < NLIMBS_B / 2; i++) {
for (unsigned j = 0; j < NLIMBS_A / 2; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you swap the loops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix a bug that was inserted in an earlier PR..


// This function generates the precomputed values for the second step of the from storage function in the field class.
// However, it is not used due to constexpr computation overflow. Instead, the values are currently generated by a
// python script and appear explicitly in the field config struct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for leaving it then? Do you plan on fixing the overflow issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason to leave it is to preserve the logic, because the python script is not in icicle, and maybe we will find a way to fix the overflow yes

@@ -8,7 +8,110 @@ namespace bls12_377 {
struct fp_config {
static constexpr storage<8> modulus = {0x00000001, 0x0a118000, 0xd0000001, 0x59aa76fe,
0x5c37b001, 0x60b44d1e, 0x9a2ca556, 0x12ab655e};
static constexpr unsigned reduced_digits_count = 3;
static constexpr storage_array<reduced_digits_count, 8> reduced_digits = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you generate the struct? Is it feasible to add it to the macros computed during build?
I didn't notice a reference to it in the macro (In contrast to mod_sub)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was also precomputed using a python script

Copy link
Contributor

@mickeyasa mickeyasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@HadarIngonyama HadarIngonyama merged commit 3dcdbd8 into main Feb 4, 2025
15 checks passed
@HadarIngonyama HadarIngonyama deleted the hadar/basic-from-storage branch February 4, 2025 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants