
[CharTensor] Enable QINT8 multiplication feature #2850

Open · djeong20 wants to merge 1 commit into main from tensor/qint8/multiply_v1
Conversation

djeong20 (Contributor) commented:

This pull request aims to enable the QINT8 element-wise multiplication feature in CharTensor.
The operation takes two tensors of the same dimensions and returns a tensor whose elements are the products of the corresponding input elements.
Please note that automatic determination of the new scale factor will be added in a future update.
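
For intuition, the per-tensor requantization used here follows the standard rule

q_out = clamp(round((lhs_scale * rhs_scale / out_scale) * q_lhs * q_rhs), -128, 127)

As an illustrative example (values assumed, not from this PR): with lhs_scale = 0.1, rhs_scale = 0.05, out_scale = 0.05, q_lhs = 40 (real 4.0), and q_rhs = 20 (real 1.0), the multiplier is 0.1 * 0.05 / 0.05 = 0.1, so q_out = round(0.1 * 800) = 80, which dequantizes to 80 * 0.05 = 4.0, matching 4.0 * 1.0.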

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

@EunjuYang (Contributor) left a comment:

LGTM except for the comment below.

Comment on lines +290 to +291
float lhs_scale = *(float *)getScale();
float rhs_scale = *input.getScale<float>();
EunjuYang (Contributor):

Are they always scalars? Based on #2844, it seems the scale can be an array.
Is this related to the condition in the note, i.e., "4. only per-tensor quantization qscheme is supported"?
If so, we need to add a condition to verify whether the case can be supported or not.

djeong20 (Contributor, Author):

Thank you for the detailed review! Yes, you're correct. Currently, there isn't a way to check the quantization scheme of a tensor. I'll create a new PR to add one and then apply it here.

@skykongkong8 (Member) left a comment:

Otherwise, great work!

Comment on lines +301 to +315
float multiplier = lhs_scale * rhs_scale / scale;

int8_t *lhs = (int8_t *)getData();
int8_t *rhs = input.getData<int8_t>();
int8_t *result = output.getData<int8_t>();

for (unsigned int i = 0; i < size(); ++i) {
  int32_t accum_val =
    static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);

  result[i] =
    std::max(-128, std::min((int)std::lround(multiplier * accum_val), 127));
}

*output.getScale<float>() = scale;
skykongkong8 (Member):

As you might already know, in order to add SIMD-accelerated code and maintain it together with the vanilla code, this should reside at the blas_interface.cpp level. I can do that later on, but I would like to discuss one thing:

// scalar scale
void ele_mul(int8_t* lhs, int8_t* rhs, int8_t* res, float lhs_scale, float rhs_scale, float scale, unsigned int N);

// vector scale
void ele_mul(int8_t* lhs, int8_t* rhs, int8_t* res, float* lhs_scale, float* rhs_scale, float* scale, unsigned int N);

Would a function design like the above be valid for your intention?

djeong20 (Contributor, Author):

Thank you for sharing your opinion! To answer your question: yes, the function design you suggested would be valid (although for the vector scale, the length of the output channels should be provided).
Maybe we could use a single kernel to support both scalar and vector scales:

void qmul_kernel(int8_t* lhs, int8_t* rhs, int8_t* res, unsigned int data_len,
                 float* lhs_scale, float* rhs_scale, float* res_scale, unsigned int scale_len)
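
For reference, here is a minimal scalar sketch of how such a unified kernel might dispatch on scale_len. It assumes scale_len == 1 means per-tensor quantization and that, in the per-channel case, data_len is divisible by scale_len with each contiguous chunk of data_len / scale_len elements sharing one scale; this layout is an assumption, not something specified in the discussion.

#include <algorithm>
#include <cmath>
#include <cstdint>

void qmul_kernel(const int8_t *lhs, const int8_t *rhs, int8_t *res,
                 unsigned int data_len, const float *lhs_scale,
                 const float *rhs_scale, const float *res_scale,
                 unsigned int scale_len) {
  // Elements covered by each scale entry; with scale_len == 1 this equals
  // data_len, so every element maps to scale index 0 (per-tensor case).
  const unsigned int chunk = data_len / scale_len;
  for (unsigned int i = 0; i < data_len; ++i) {
    const unsigned int s = i / chunk; // scale index for element i
    const float multiplier = lhs_scale[s] * rhs_scale[s] / res_scale[s];
    const int32_t accum =
      static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);
    // Requantize and clamp to the int8 range.
    res[i] = static_cast<int8_t>(std::max(
      -128, std::min(static_cast<int>(std::lround(multiplier * accum)), 127)));
  }
}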

skykongkong8 (Member):

Sounds even better. I will refer to that.

Comment on lines +290 to +291
float lhs_scale = *(float *)getScale();
float rhs_scale = *input.getScale<float>();
skykongkong8 (Member):

One more thing to check: could I confirm whether there are no plans to incorporate a zero point as a qParam for QInt8 tensors at this moment?
I found some computational kernels that use zero points for both int8 and uint8, so I just want to make sure where we're heading.
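
(For context, the general affine-quantization formulation, real = scale * (q - zero_point), would extend the element-wise multiply roughly as sketched below. This is a generic illustration rather than nntrainer's API; the current PR corresponds to the symmetric case where every zero point is 0.)

#include <algorithm>
#include <cmath>
#include <cstdint>

int8_t qmul_affine(int8_t lhs_q, int8_t rhs_q,
                   float lhs_scale, float rhs_scale, float out_scale,
                   int32_t lhs_zp, int32_t rhs_zp, int32_t out_zp) {
  const float multiplier = lhs_scale * rhs_scale / out_scale;
  // Shift both operands by their zero points before multiplying.
  const int32_t accum = (static_cast<int32_t>(lhs_q) - lhs_zp) *
                        (static_cast<int32_t>(rhs_q) - rhs_zp);
  // Requantize, re-add the output zero point, and clamp to the int8 range.
  const int32_t out =
    static_cast<int32_t>(std::lround(multiplier * accum)) + out_zp;
  return static_cast<int8_t>(std::max(-128, std::min(out, 127)));
}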

Signed-off-by: Donghyeon Jeong <[email protected]>
@djeong20 force-pushed the tensor/qint8/multiply_v1 branch from 5df2b81 to 1e63cc4 on January 10, 2025 at 07:55
@djeong20 changed the title from "[Wait for #2844][CharTensor] Enable QINT8 multiplication feature" to "[CharTensor] Enable QINT8 multiplication feature" on January 10, 2025