
[onert] Optimize Bias Grad Computation #12673

Merged — 1 commit merged into Samsung:master on Feb 27, 2024
Conversation

@Aeren1564 (Contributor) commented Feb 21, 2024

This commit optimizes bias grad computation with multithreading.

ONE-DCO-1.0-Signed-off-by: YongHyun An [email protected]


From draft #12571, #12666
Relevant issue: #12569
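The idea behind the change can be sketched as follows: the bias gradient is the sum of `input_backprop` over every dimension except the last (channel) axis. This is a hypothetical stand-in, not the merged code — the PR delegates the reduction to Eigen's multithreaded redux functor, while this sketch uses `std::thread`; the names `biasGradSketch` and `num_threads` are invented for illustration.

```cpp
// Sketch of a multithreaded bias-grad reduction over a row-major
// [outer x inner] buffer, where `inner` is the channel count.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

inline void biasGradSketch(const float *input_backprop, std::size_t outer,
                           std::size_t inner, float *bias_grad,
                           unsigned num_threads = std::thread::hardware_concurrency())
{
  if (num_threads == 0)
    num_threads = 1;
  // Each worker accumulates a private partial sum over its slice of rows,
  // so no synchronization is needed until the final combine step.
  std::vector<std::vector<float>> partial(num_threads, std::vector<float>(inner, 0.f));
  std::vector<std::thread> workers;
  const std::size_t chunk = (outer + num_threads - 1) / num_threads;
  for (unsigned t = 0; t < num_threads; ++t)
  {
    workers.emplace_back([&, t] {
      const std::size_t begin = t * chunk;
      const std::size_t end = std::min(outer, begin + chunk);
      for (std::size_t row = begin; row < end; ++row)
        for (std::size_t c = 0; c < inner; ++c)
          partial[t][c] += input_backprop[row * inner + c];
    });
  }
  for (auto &w : workers)
    w.join();
  // Combine the per-thread partial sums into the final bias gradient.
  for (std::size_t c = 0; c < inner; ++c)
  {
    bias_grad[c] = 0.f;
    for (unsigned t = 0; t < num_threads; ++t)
      bias_grad[c] += partial[t][c];
  }
}
```

The per-thread accumulators avoid contention on the output buffer, which is the same reason a parallel reduction library like Eigen splits the outer dimension across its thread pool.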

@Aeren1564 (Contributor, Author) commented:

Training Time for mobilenet_v2

Before this commit

== training parameter ==
- learning_rate   = 0.001
- batch_size      = 32
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = adam
========================
Epoch 1/5 - time: 4430.210ms/step - loss: [0] 30090.0898
Epoch 2/5 - time: 4607.027ms/step - loss: [0] 30090.0840
Epoch 3/5 - time: 4669.991ms/step - loss: [0] 30090.0840
Epoch 4/5 - time: 4705.184ms/step - loss: [0] 30090.0840
Epoch 5/5 - time: 4659.798ms/step - loss: [0] 30090.0840
===================================
MODEL_LOAD   takes 5.4760 ms
PREPARE      takes 786.9770 ms
EXECUTE      takes 207755.0330 ms
- Epoch 1      takes 39871.8920 ms
- Epoch 2      takes 41463.2420 ms
- Epoch 3      takes 42029.9200 ms
- Epoch 4      takes 42346.6520 ms
- Epoch 5      takes 41938.1790 ms
===================================

After this commit

== training parameter ==
- learning_rate   = 0.001
- batch_size      = 32
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = adam
========================
Epoch 1/5 - time: 2800.052ms/step - loss: [0] 30090.0977
Epoch 2/5 - time: 2580.089ms/step - loss: [0] 30090.0938
Epoch 3/5 - time: 2654.771ms/step - loss: [0] 30090.0938
Epoch 4/5 - time: 2647.186ms/step - loss: [0] 30090.0938
Epoch 5/5 - time: 2648.930ms/step - loss: [0] 30090.0938
===================================
MODEL_LOAD   takes 5.3360 ms
PREPARE      takes 775.7780 ms
EXECUTE      takes 120092.1480 ms
- Epoch 1      takes 25200.4690 ms
- Epoch 2      takes 23220.8040 ms
- Epoch 3      takes 23892.9350 ms
- Epoch 4      takes 23824.6720 ms
- Epoch 5      takes 23840.3740 ms
===================================

@@ -170,17 +169,8 @@ void DepthwiseConvolutionLayer::backwardFloat32()
// Calculate gradient for bias
if (_bias)
{
// TODO Use optimized kernel
@Aeren1564 (Contributor, Author):

This commit resolves this TODO item.

@@ -196,20 +195,8 @@ void ConvolutionLayer::backwardFloat32()
// Calculate gradient for bias
if (_bias)
{
// TODO Use optimized kernel
@Aeren1564 (Contributor, Author):

This commit resolves this TODO item.

@Aeren1564 Aeren1564 marked this pull request as ready for review February 21, 2024 09:35
@Aeren1564 Aeren1564 added the 'approval: 2 (Require at least 2 approvals)' and 'PR/ready for review' labels Feb 21, 2024
@Aeren1564 Aeren1564 requested a review from a team February 21, 2024 09:37
@Aeren1564 Aeren1564 marked this pull request as draft February 22, 2024 04:41
@Aeren1564 Aeren1564 removed the 'PR/ready for review' label Feb 22, 2024
@Aeren1564 Aeren1564 marked this pull request as ready for review February 22, 2024 05:24
@Aeren1564 Aeren1564 added the 'PR/ready for review' label Feb 22, 2024
@YongseopKim (Contributor) left a comment:

In runtime/onert/backend, nothing uses Eigen directly. By convention, onert does not use a specific kernel library such as Eigen directly, AFAIK. How about hiding this?

  • AS-IS: in biasGrad(), we can see Eigen.
  • TO-BE: in biasGradIO, there is no Eigen.

I'd like to get feedback from others.

@jyoungyun (Contributor) commented:

@Aeren1564 Would you take a look at another file structure? This PR doesn't fit the existing code design. I think this reduxFunctor.h should be located in compute/cker/include/cker/eigen/. Also, when copying code from another repository, it is recommended to modify it as little as possible, so it would be better to keep the original file name. :)

@Aeren1564 Aeren1564 removed the 'PR/ready for review' label Feb 26, 2024
@Aeren1564 Aeren1564 marked this pull request as draft February 26, 2024 01:07
@Aeren1564 Aeren1564 marked this pull request as ready for review February 26, 2024 02:32
@Aeren1564 Aeren1564 added the 'PR/ready for review' label Feb 26, 2024
@Aeren1564 Aeren1564 requested a review from a team February 26, 2024 02:57
float *bias_grad_buffer = getBuffer<float>(bias_grad);
nnfw::cker::Tensor bias_grad_t{bias_grad_shape, bias_grad_buffer};

Eigen::Index outer = input_backprop_shape.Dims(input_backprop_shape.DimensionsCount() - 1);
@YongseopKim (Contributor) commented Feb 26, 2024:

Can these Eigen classes be hidden?

Until now, onert has hidden these things AFAIK.

ONE/runtime/onert/backend (master ✗) grep -rni "eigen"
train/TensorManager.h:38:  // _align is equal to EIGEN_MAX_ALIGN_BYTES, though including Eigen headers

@Aeren1564 (Contributor, Author):

I hid "Eigen" from onert, but redux_functor now accepts the following arguments:

void operator()(const Device &device, const Eigen::Index &outer_dim, const Eigen::Index &inner_dim, const Tensor &input, Tensor *output) const

Please let me know if you have some better way to handle this.

@YongseopKim (Contributor):

A built-in type can be passed in and converted to Eigen::Index inside the redux_functor function.

@Aeren1564 (Contributor, Author):

I implemented @YongseopKim's suggestion for now 👀

@YongseopKim (Contributor) commented:

I had something like this in mind, @Aeren1564:

This commit optimizes bias grad computation with multithreading.

ONE-DCO-1.0-Signed-off-by: YongHyun An <[email protected]>
Comment on lines +193 to +194
void biasReductionHelper(float *input_backprop_buffer, const Shape &input_backprop_shape,
float *bias_grad_buffer, const Shape &bias_grad_shape)
@Aeren1564 (Contributor, Author):

This is from @YongseopKim's suggestion.
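The interface-hiding pattern agreed on above — a public helper that takes only built-in types and converts to the kernel's index type internally — can be illustrated roughly like this. This is a hypothetical sketch, not the merged code: `KernelIndex` stands in for `Eigen::Index` so it compiles without Eigen, the serial loop stands in for the multithreaded redux functor, and the `int`/pointer signature is a simplification of the actual helper, which takes `Shape` arguments.

```cpp
// Sketch: keep the kernel library's index type out of the public interface.
#include <cstddef>

namespace sketch
{
using KernelIndex = std::ptrdiff_t; // stand-in for Eigen::Index

// Internal kernel-facing reduction; only this layer sees KernelIndex.
inline void reduxFunctor(KernelIndex outer, KernelIndex inner, const float *in, float *out)
{
  for (KernelIndex c = 0; c < inner; ++c)
    out[c] = 0.f;
  for (KernelIndex r = 0; r < outer; ++r)
    for (KernelIndex c = 0; c < inner; ++c)
      out[c] += in[r * inner + c];
}

// Public helper: built-in types only in the signature, so callers never
// need the kernel library's headers. The index conversion is confined here.
inline void biasReductionHelper(const float *input_backprop, int outer, int inner,
                                float *bias_grad)
{
  reduxFunctor(static_cast<KernelIndex>(outer), static_cast<KernelIndex>(inner),
               input_backprop, bias_grad);
}
} // namespace sketch
```

Confining the conversion to the implementation is what lets the backend code stay free of `Eigen` mentions, matching the convention @YongseopKim describes.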

@YongseopKim (Contributor) left a comment:

LGTM 👍

@Aeren1564 Aeren1564 requested a review from jyoungyun February 26, 2024 23:32
@jyoungyun (Contributor) left a comment:

LGTM, Thank you!!!

@hseok-oh hseok-oh merged commit b4444ee into Samsung:master Feb 27, 2024
9 checks passed
@Aeren1564 Aeren1564 deleted the optimize-bias branch February 27, 2024 12:05
Labels: approval: 2 (Require at least 2 approvals), PR/ready for review

4 participants