[onert] Introduce ExtraTensorRequest #13604
@@ -0,0 +1,69 @@
/*
 * Copyright (c) 2024 Samsung Electronics Co., Ltd. All Rights Reserved
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef __ONERT_BACKEND_EXTRA_TENSOR_REQUEST_H__
#define __ONERT_BACKEND_EXTRA_TENSOR_REQUEST_H__

#include "backend/train/ExtraTensor.h"

#include <cassert> // assert() in createLike()
#include <vector>  // std::vector used by ExtraTensorRequests

namespace onert
{
namespace backend
{
namespace train
{

enum class ExtraTensorLifeTime
{
  BACKWARD,            // alive during backward()
  FORWARD_TO_BACKWARD, // alive from forward() to backward()
};

class ExtraTensorRequest
{
public:
  ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
    : _info(info), _lifetime(lt), _address(addr)
  {
  }

Review thread on the raw double-pointer parameter (ExtraTensor **addr):

- Aha, thank you for your notice. I'll try to find another way 🤔
- Hmm.. I've rethought this. The easy way is … Otherwise, about adding …
- @jyoungyun Could you share your opinion? Do you think …
- @zetwhite I prefer to use a smart pointer instead of a raw pointer. Smart pointers do automatic memory management and lessen the chances of common pointer-related errors, enhancing code reliability. If you implement this code using …
- I couldn't find a way to solve this issue better than using …
- @ragmani @jyoungyun I'll update it to use …
- As I understand it, the double pointer is there because it is an output parameter. Is the ownership shared? If not, we don't need to use … In fact, I am not sure …
- The ownership is shared with the … If managing and planning …
- About this, IMO there is no concrete answer. I thought that we somehow need to manage the (Extra or LayerScope) tensors in … Could you share your opinion (or what concerns you) about this? (+) Ah, I found the details here: #13604 (comment)
  static ExtraTensorRequest createLike(const IPortableTensor *origin, ExtraTensor **addr)
  {
    assert(origin != nullptr);
    assert(addr != nullptr);

    return ExtraTensorRequest(origin->get_info(), ExtraTensorLifeTime::BACKWARD, addr);
  }

public:
  const ir::OperandInfo &info() const { return _info; }
  ExtraTensorLifeTime lifetime() const { return _lifetime; }
Review comment on lines +52 to +53 (the getters): I intentionally added only getters for these fields.
  void update_address(ExtraTensor *tensor) { *_address = tensor; }

private:
  ir::OperandInfo _info;
  ExtraTensorLifeTime _lifetime;
  backend::train::ExtraTensor **_address;
};

using ExtraTensorRequests = std::vector<ExtraTensorRequest>;

} // namespace train
} // namespace backend
} // namespace onert

#endif // __ONERT_BACKEND_EXTRA_TENSOR_REQUEST_H__
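To show how the pieces fit together, here is a hypothetical usage sketch. Everything outside the ExtraTensorRequest API itself (the layer class, the weights argument, the allocate() call in the trailing comment, and the include path of this header) is assumed for illustration only and is not taken from this PR.

// Hypothetical sketch: a backward kernel publishes an ExtraTensorRequest and
// later receives the allocated tensor through update_address().
#include "backend/train/ExtraTensorRequest.h" // assumed path of the header above

namespace train = onert::backend::train;

class FCBackwardLayerSketch // illustrative, not an onert class
{
public:
  // Preparation time: describe the extra memory this layer will need.
  train::ExtraTensorRequests requests(const onert::backend::IPortableTensor *weights)
  {
    train::ExtraTensorRequests reqs;
    // Same shape/type as the weights, alive only during backward().
    reqs.push_back(train::ExtraTensorRequest::createLike(weights, &_w_transposed));
    return reqs;
  }

  void backward()
  {
    // By now a planner has called update_address() on the request,
    // so _w_transposed points at a real, allocated ExtraTensor.
    // ... transpose the weights into *_w_transposed and run the FC backward math ...
  }

private:
  train::ExtraTensor *_w_transposed = nullptr; // filled in later via update_address()
};

// A (hypothetical) tensor manager would fulfill the collected requests roughly like:
//   for (auto &req : all_requests)
//   {
//     train::ExtraTensor *t = allocate(req.info(), req.lifetime()); // plan by lifetime
//     req.update_address(t); // hand the tensor back to the requesting layer
//   }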
Review thread (what ExtraTensor and its lifetime mean):

First of all, I may be asking the wrong question due to my lack of understanding of this feature, so please bear with me.
What does lifetime mean? Is it dynamically allocating/deallocating memory at the start/end of the section (BACKWARD, FORWARD_TO_BACKWARD)?
And what does "Extra" mean in ExtraTensor?
Thanks for your review and interest! I'm glad to explain.

ExtraTensor means a tensor that is additionally requested while doing backward (backpropagation). For example, while backwarding FullyConnected we need to calculate X^T and W^T. Since X^T and W^T are 'additionally' needed because of the mathematical properties of FC, I used the word 'Extra' to mark an additional tensor tied to a specific layer.
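For reference, the transposes in this example come from the standard FullyConnected backward formulas (added here for clarity, not quoted from the discussion). With the forward pass y = W·x + b:

dL/dx = W^T · dL/dy   (needs W^T)
dL/dW = dL/dy · x^T   (needs X^T)

Both matrices are needed only while this layer's backward() runs.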
Lifetime means the time for which an additional tensor must remain alive. In the example above, W^T and X^T are used only while backwarding the FC layer, so their lifetime is limited to backward() (marked BACKWARD). Some ExtraTensors have to stay alive through both forwarding and backwarding, so I'd like to mark those as FORWARD_TO_BACKWARD.

With the lifetime information, we can know in advance when and how much tensor memory will be needed. So my plan is to (statically) pre-allocate the maximum memory before the training process starts.
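As a rough illustration of how the lifetime tags could feed such static planning: the sketch below is not the onert planner. It only assumes that the backward() calls of different layers never overlap, and the function name, the per-layer grouping, and the use of OperandInfo::total_size() as a byte count are made up for the example.

#include <algorithm>
#include <cstddef>
#include <vector>

#include "backend/train/ExtraTensorRequest.h" // assumed path of the header above

using onert::backend::train::ExtraTensorLifeTime;
using onert::backend::train::ExtraTensorRequests;

// Upper bound on the arena size: FORWARD_TO_BACKWARD tensors each keep their
// own slot for the whole training step, while BACKWARD-only tensors of
// different layers can share one region because their backward() calls run
// one after another.
std::size_t plan_peak_bytes(const std::vector<ExtraTensorRequests> &per_layer_requests)
{
  std::size_t persistent = 0;   // FORWARD_TO_BACKWARD: alive across the step
  std::size_t max_backward = 0; // largest single layer's BACKWARD-only demand

  for (const auto &layer_reqs : per_layer_requests)
  {
    std::size_t backward_only = 0;
    for (const auto &req : layer_reqs)
    {
      const std::size_t bytes = req.info().total_size(); // assumed byte size
      if (req.lifetime() == ExtraTensorLifeTime::FORWARD_TO_BACKWARD)
        persistent += bytes;
      else
        backward_only += bytes;
    }
    max_backward = std::max(max_backward, backward_only);
  }
  return persistent + max_backward;
}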
I'm afraid my explanation is somewhat verbose and a bit hard to follow 😢 If you're interested, I can explain the details offline!
Review thread (naming and use-def chains):

1. When I suggested LayerScopeTensor in #13605 (comment) based on your comment, I expected it to behave like a local variable, released when it goes out of scope. But using "layer scope" now seems misleading: the tensor may be accessed only in that specific layer (though that is true for all other tensors as well), yet its lifetime is longer than the specific layer.

2. Why do we need ExtraTensor to get the scope of a tensor? Is it possible to use the use-def chain to get the scope of a tensor? It would be better not to introduce yet another tensor if possible.
About 2: As I understand it, using the use-def chain is not easy for this problem, at least for now. The current use-def chain has roughly the structure sketched below; it assumes that an _operand (tensor) is defined and used by operations in the graph. Since an ExtraTensor's definition and usage depend on each layer's implementation, it is NOT visible in the graph, so it is hard to express an ExtraTensor's use-def with the current use-def chains.
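For context, the per-operand record such a chain keeps looks roughly like this; the names below are illustrative placeholders, not the actual onert ir:: classes.

#include <vector>

// Illustrative placeholder types, not onert's real ir:: classes.
struct OperationIndex
{
  unsigned value;
};

// One use-def record per graph operand: which operation defines it and which
// operations read it. ExtraTensors never appear as graph operands, so no such
// record exists for them.
struct OperandUseDef
{
  OperationIndex def;
  std::vector<OperationIndex> uses;
};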
About 1: I thought the word 'scope' meant 'where the tensor can be accessed' and did not imply lifetime, so 'scope' looked right to me here. Aha, but I agree with your point: the weight tensors, too, are accessed by only one specific layer. I need to reconsider the tensor naming and its properties carefully.