[CharTensor] Enable QINT8 multiplication feature #2850
base: main
Conversation
Force-pushed from 81b92c9 to 5df2b81
LGTM except for the comment below.
float lhs_scale = *(float *)getScale();
float rhs_scale = *input.getScale<float>();
Are they always scalars? Based on #2844, it seems the scale can be an array.
Is this related to the condition in the note, i.e., "4. only per-tensor quantization qscheme is supported"?
If so, we need to add a condition to verify whether the case is supported or not.
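For instance, a guard along these lines could work (a rough sketch only; the QScheme enum and the scheme arguments are hypothetical, since the current tensor API does not expose the quantization scheme yet):

#include <stdexcept>

// Hypothetical sketch: reject unsupported quantization schemes up front.
// QScheme and the way it would be obtained are placeholders, not existing API.
enum class QScheme { PER_TENSOR_AFFINE, PER_CHANNEL_AFFINE };

void check_per_tensor(QScheme lhs_scheme, QScheme rhs_scheme) {
  if (lhs_scheme != QScheme::PER_TENSOR_AFFINE ||
      rhs_scheme != QScheme::PER_TENSOR_AFFINE) {
    throw std::invalid_argument(
      "QINT8 multiply currently supports per-tensor quantization only");
  }
}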
Thank you for the detailed review! Yes, you're correct. Currently, there is no way to check the quantization scheme of a tensor. I'll create a new PR to add one and then apply it here.
Otherwise, great work!
float multiplier = lhs_scale * rhs_scale / scale;

int8_t *lhs = (int8_t *)getData();
int8_t *rhs = input.getData<int8_t>();
int8_t *result = output.getData<int8_t>();

for (unsigned int i = 0; i < size(); ++i) {
  int32_t accum_val =
    static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);

  result[i] =
    std::max(-128, std::min((int)std::lround(multiplier * accum_val), 127));
}

*output.getScale<float>() = scale;
As you might already know, in order to add SIMD-accelerated code and maintain it alongside the vanilla code, this should reside at the blas_interface.cpp level. I can do that later on, but I would like to discuss one thing:
// scalar scale
void ele_mul(int8_t* lhs, int8_t* rhs, int8_t* res, float lhs_scale, float rhs_scale, float scale, unsigned int N);
// vector scale
void ele_mul(int8_t* lhs, int8_t* rhs, int8_t* res, float* lhs_scale, float* rhs_scale, float* scale, unsigned int N);
Would a function design like the above be valid for your intention?
Thank you for sharing your opinion! To answer: yes, the function design you suggested would be valid (although for the vector scale, the number of output channels should also be provided).
Maybe we could use a single kernel to support both scalar and vector scales:
void qmul_kernel(int8_t* lhs, int8_t* rhs, int8_t* res, unsigned int data_len,
float* lhs_scale, float* rhs_scale, float* res_scale, unsigned int scale_len)
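For reference, a naive (non-SIMD) sketch of what such a single kernel might look like; it reuses the names from the signature above and assumes that scale_len == 1 means per-tensor scales, while per-channel scales apply to contiguous blocks of data_len / scale_len elements:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Naive sketch of the proposed unified kernel (scalar code only).
// scale_len == 1 -> per-tensor scales; otherwise one scale per channel,
// assuming each channel owns a contiguous block of data_len / scale_len
// elements.
void qmul_kernel(int8_t *lhs, int8_t *rhs, int8_t *res, unsigned int data_len,
                 float *lhs_scale, float *rhs_scale, float *res_scale,
                 unsigned int scale_len) {
  unsigned int block = data_len / scale_len;
  for (unsigned int i = 0; i < data_len; ++i) {
    unsigned int c = i / block; // channel index (always 0 for per-tensor)
    float multiplier = lhs_scale[c] * rhs_scale[c] / res_scale[c];
    int32_t accum =
      static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);
    res[i] = static_cast<int8_t>(
      std::max(-128, std::min((int)std::lround(multiplier * accum), 127)));
  }
}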
Sounds even better. I will refer to that.
float lhs_scale = *(float *)getScale();
float rhs_scale = *input.getScale<float>();
One more thing to check...
Could I confirm whether there are no plans to incorporate a zero point as a qParam for QInt8 tensors at the moment?
I found some computational kernels that use zero points for both int8 and uint8, so I just want to make sure where we are heading.
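For context, if zero points were introduced later, the affine mapping real = scale * (q - zero_point) would change the accumulation roughly as in the following illustrative sketch (not part of this PR):

#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative only: element-wise multiply under asymmetric quantization,
// where real = scale * (q - zero_point). Not part of this PR.
int8_t qmul_with_zero_point(int8_t a, int8_t b, float a_scale, int32_t a_zp,
                            float b_scale, int32_t b_zp, float out_scale,
                            int32_t out_zp) {
  int32_t accum = (static_cast<int32_t>(a) - a_zp) *
                  (static_cast<int32_t>(b) - b_zp);
  float multiplier = a_scale * b_scale / out_scale;
  int32_t q = static_cast<int32_t>(std::lround(multiplier * accum)) + out_zp;
  return static_cast<int8_t>(
    std::max(-128, std::min(127, static_cast<int>(q))));
}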
This pull request aims to enable the QINT8 element-wise multiplication feature in CharTensor. This takes two tensors of the same dimensions and returns a matrix of the multiplied corresponding elements. Please note that automatically determining the new scale factor will be added in a future update.

**Self-evaluation:**
1. Build test: [X] Passed [ ] Failed [ ] Skipped
2. Run test: [X] Passed [ ] Failed [ ] Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
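To illustrate the computation with concrete, made-up numbers (scales 0.1 and 0.2 for the inputs, 0.05 for the output), the multiply, rescale, and clamp steps work out as in this standalone sketch:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // Made-up per-tensor scales and quantized values, purely for illustration.
  float lhs_scale = 0.1f, rhs_scale = 0.2f, out_scale = 0.05f;
  int8_t lhs = 25; // represents 25 * 0.1 = 2.5
  int8_t rhs = 10; // represents 10 * 0.2 = 2.0

  float multiplier = lhs_scale * rhs_scale / out_scale;                   // 0.4
  int32_t accum = static_cast<int32_t>(lhs) * static_cast<int32_t>(rhs);  // 250
  int8_t result = static_cast<int8_t>(
    std::max(-128, std::min((int)std::lround(multiplier * accum), 127)));

  std::printf("q = %d, dequantized = %.2f\n", result, result * out_scale);
  // prints: q = 100, dequantized = 5.00 (matches 2.5 * 2.0)
  return 0;
}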
Force-pushed from 5df2b81 to 1e63cc4