[CharTensor] Enable QINT8 multiplication feature #2850
@@ -64,7 +64,6 @@ CharTensor::CharTensor(
  NNTR_THROW_IF(scales.size() != scale_size(), std::invalid_argument)
    << "invalid scale factor size " << scales.size();

  /// @note 4 * scale_size() assumes scale factors are in full-precision fp.
  MemoryData *mem_data = new MemoryData(
    (void *)(new int8_t[dim.getDataLen() + sizeof(float) * scale_size()]()));
  data = std::shared_ptr<MemoryData>(mem_data, [](MemoryData *mem_data) {
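For context, the allocation above packs the int8 payload and the full-precision scale factors into a single contiguous buffer. A minimal standalone sketch of that layout (hypothetical values; getScale() is presumed to point just past the payload):

  #include <cstdint>

  int main() {
    const unsigned int data_len = 16; // int8 elements (a multiple of
                                      // alignof(float), so the cast is safe)
    const unsigned int n_scales = 1;  // per-tensor affine: one scale factor

    // one block: [int8 data ...][fp32 scales ...], zero-initialized
    int8_t *buf = new int8_t[data_len + sizeof(float) * n_scales]();

    // the scale region begins right after the quantized payload
    float *scales = reinterpret_cast<float *>(buf + data_len);
    scales[0] = 0.05f;

    delete[] buf;
    return 0;
  }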
@@ -268,6 +267,56 @@ void CharTensor::initialize(Initializer init) {
  initialize();
}

int CharTensor::multiply_i(float const &value) {
  // multiply value to scale factors
  float *g_scale = (float *)getScale();

  sscal(scale_size(), value, g_scale, 1);
  return ML_ERROR_NONE;
}
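Since a per-tensor affine tensor represents real values as r = scale * q, scalar multiplication only needs to rescale the stored scale factors, which is exactly what the sscal call does; the int8 payload is untouched. A minimal standalone illustration with hypothetical values:

  #include <cstdint>
  #include <cstdio>

  int main() {
    int8_t q = 50;       // quantized payload, left untouched
    float scale = 0.1f;  // per-tensor scale: real value r = 0.1 * 50 = 5.0

    float value = 2.0f;  // scalar multiplier
    scale *= value;      // what sscal(scale_size(), value, g_scale, 1) does
                         // to each stored scale factor

    printf("r = %.1f\n", q * scale); // prints r = 10.0, i.e. 5.0 * 2.0
    return 0;
  }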
Tensor &CharTensor::multiply(Tensor const &input, Tensor &output,
                             const float scale) const {
  CREATE_IF_EMPTY_DIMS(output, dim, nullptr, q_scheme());

  NNTR_THROW_IF(q_scheme() != input.q_scheme(), std::invalid_argument)
    << "[Tensor] Cannot multiply tensors with different quantization schemes.";

  /// @note remove after vector scale multiply is implemented
  NNTR_THROW_IF(q_scheme() != QScheme::PER_TENSOR_AFFINE, std::invalid_argument)
    << "Multiplication other than per tensor affine quantization scheme is "
       "NYI.";

  float lhs_scale = *(float *)getScale();
  float rhs_scale = *input.getScale<float>();
Review comment on lines +290 to +291:

Reviewer: One more thing to check...
  /// @note current impl assumes pre-established quantization parameters are set
  /// @todo 1. verify result_scale is valid 2. calculate qparams if not given
  NNTR_THROW_IF(std::fpclassify(lhs_scale) == FP_ZERO ||
                  std::fpclassify(rhs_scale) == FP_ZERO ||
                  std::fpclassify(scale) == FP_ZERO,
                std::invalid_argument)
    << "scale factors not set, cannot multiply";

  float multiplier = lhs_scale * rhs_scale / scale;

  int8_t *lhs = (int8_t *)getData();
  int8_t *rhs = input.getData<int8_t>();
  int8_t *result = output.getData<int8_t>();

  for (unsigned int i = 0; i < size(); ++i) {
    int32_t accum_val =
      static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);

    result[i] =
      std::max(-128, std::min((int)std::lround(multiplier * accum_val), 127));
  }
  *output.getScale<float>() = scale;

  return output;
}

Review thread on lines +301 to +315:

Reviewer: As you might already know, in order to add SIMD-accelerated code and maintain it alongside the vanilla code, this should reside at:

  // scalar scale
  void ele_mul(int8_t *lhs, int8_t *rhs, int8_t *res, float lhs_scale,
               float rhs_scale, float scale, unsigned int N);
  // vector scale
  void ele_mul(int8_t *lhs, int8_t *rhs, int8_t *res, float *lhs_scale,
               float *rhs_scale, float *scale, unsigned int N);

Would a function design like the above be valid for your intention?

Author: Thank you for sharing your opinion! To answer: yes, the function design you suggested would be valid (although for the vector scale, the length of the output channels should also be provided).

  void qmul_kernel(int8_t *lhs, int8_t *rhs, int8_t *res, unsigned int data_len,
                   float *lhs_scale, float *rhs_scale, float *res_scale,
                   unsigned int scale_len);

Reviewer: Sounds even better. I will refer to that.
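For intuition on the requantization in the loop above: with real values r ≈ scale * q, the product (lhs_scale * q_lhs) * (rhs_scale * q_rhs) is re-expressed at the output scale, so q_out = round(multiplier * q_lhs * q_rhs) with multiplier = lhs_scale * rhs_scale / scale, saturated to [-128, 127]. A small standalone check with hypothetical values:

  #include <algorithm>
  #include <cmath>
  #include <cstdint>
  #include <cstdio>

  int main() {
    // hypothetical quantization parameters and payloads
    float lhs_scale = 0.1f, rhs_scale = 0.2f, out_scale = 0.1f;
    int8_t lhs = 40, rhs = 10; // real values: 4.0 and 2.0

    float multiplier = lhs_scale * rhs_scale / out_scale;               // 0.2
    int32_t accum = static_cast<int32_t>(lhs) * static_cast<int32_t>(rhs); // 400

    // saturating requantization, as in the loop above
    int8_t out = static_cast<int8_t>(std::max(
      -128, std::min(static_cast<int>(std::lround(multiplier * accum)), 127)));

    // prints q_out = 80, dequantized = 8.0 (= 4.0 * 2.0); real products
    // beyond 12.7 at out_scale = 0.1 would clamp to 127
    printf("q_out = %d, dequantized = %.1f\n", out, out * out_scale);
    return 0;
  }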
void CharTensor::copy(const Tensor &from) {
  reshape(from.getDim());
  copy(from.getData());
Review thread on the scale factor handling:

Reviewer: Are they always scalars? Based on #2844, it seems scale can be an array. Is it related to the condition in the note, i.e., "4. only per-tensor quantization qscheme is supported"? If so, we need to add a condition to verify whether the case can be supported or not.

Author: Thank you for the detailed review! Yes, you're correct. Currently, there isn't a way to check the quantization scheme of the tensor. I'll create a new PR to do so and apply it.
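For illustration only: a minimal vanilla sketch of the qmul_kernel design from the earlier thread, which would cover both the per-tensor case discussed here (scale_len == 1) and a per-channel variant. The chunked broadcast layout (data_len / scale_len contiguous elements per scale factor, with data_len assumed divisible by scale_len) is an assumption, not nntrainer's actual layout:

  #include <algorithm>
  #include <cmath>
  #include <cstdint>

  // Sketch of the proposed kernel; the per-channel chunking is assumed,
  // not taken from nntrainer.
  void qmul_kernel(const int8_t *lhs, const int8_t *rhs, int8_t *res,
                   unsigned int data_len, const float *lhs_scale,
                   const float *rhs_scale, const float *res_scale,
                   unsigned int scale_len) {
    const unsigned int chunk = data_len / scale_len; // elements per scale
    for (unsigned int i = 0; i < data_len; ++i) {
      const unsigned int c = i / chunk; // index of the applicable scale
      const float multiplier = lhs_scale[c] * rhs_scale[c] / res_scale[c];
      const int32_t accum =
        static_cast<int32_t>(lhs[i]) * static_cast<int32_t>(rhs[i]);
      // saturating requantization to int8, mirroring the loop in this PR
      res[i] = static_cast<int8_t>(std::max(
        -128, std::min(static_cast<int>(std::lround(multiplier * accum)), 127)));
    }
  }

With scale_len == 1 this degenerates to the per-tensor path implemented in this PR, so the vanilla and SIMD variants could share one entry point.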