[onert] Introduce ExtraTensorRequest #13604
Conversation
const ir::OperandInfo &info() const { return _info; }
ExtraTensorLifeTime lifetime() const { return _lifetime; }
I intentionally added only getters for these fields.
Once an ExtraTensor's info and lifetime are set, they never need to change.
{
  BACKWARD,            // alive during backward()
  FORWARD_TO_BACKWARD, // alive from forward() to backward()
};
First of all, I may be asking the wrong question due to my limited understanding of this feature, so please bear with me.
What does lifetime mean? Is memory dynamically allocated/deallocated at the start/end of the section (backward, forward_to_backward)?
And what does "Extra" mean in ExtraTensor?
First of all, I may be asking the wrong question due to my limited understanding of this feature, so please bear with me.
Thanks for your review and interest! I'm happy to explain.
What does "Extra" mean in ExtraTensor?
ExtraTensor means a tensor that is additionally requested while doing backward (backpropagation).
For example, while backwarding FullyConnected, we need to calculate X^T and W^T.
Since X^T and W^T are 'additionally' needed because of the mathematical properties of FC, I used the word 'Extra' to mark a tensor that is additionally required by a specific layer.
What does lifetime mean?
Lifetime means the period during which the additional tensors must remain alive.
In the example above, W^T and X^T are used only while backwarding the FC layer.
So the lifetime of these tensors is limited to backward() (marked BACKWARD).
Some ExtraTensors have to stay alive through both forwarding and backwarding.
I'd like to mark those as FORWARD_TO_BACKWARD.
Is memory dynamically allocated/deallocated at the start/end of the section (backward, forward_to_backward)?
With the lifetime information, we know in advance when and how much tensor memory will be needed.
So my plan is to (statically) pre-allocate the maximum required memory before the training process starts.
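As a rough illustration of that plan (the helper names here are hypothetical, not onert's actual planner), the lifetime tags could be folded into a single up-front memory bound like this:

#include <cstddef>
#include <vector>

enum class ExtraTensorLifeTime
{
  BACKWARD,            // alive during backward()
  FORWARD_TO_BACKWARD, // alive from forward() to backward()
};

struct ExtraTensorPlan
{
  std::size_t size_in_bytes;    // how much memory the tensor needs
  ExtraTensorLifeTime lifetime; // when that memory must stay valid
};

// Upper bound on extra-tensor memory needed at any point in one training step.
// FORWARD_TO_BACKWARD tensors are alive for the whole step and BACKWARD tensors
// only during backward(), so the peak occurs during backward() when both
// groups coexist.
std::size_t peakExtraTensorBytes(const std::vector<ExtraTensorPlan> &plans)
{
  std::size_t backward_only = 0;
  std::size_t forward_to_backward = 0;
  for (const auto &p : plans)
  {
    if (p.lifetime == ExtraTensorLifeTime::BACKWARD)
      backward_only += p.size_in_bytes;
    else
      forward_to_backward += p.size_in_bytes;
  }
  return forward_to_backward + backward_only;
}

A real planner could tighten this further (for example by overlapping BACKWARD-only tensors whose layers never run at the same time), but even this naive sum shows how the lifetime tag enables static pre-allocation.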
I think my explanation is somewhat verbose and a bit hard to follow 😢
If you're interested, I could explain the details offline!
1.
When I suggested LayerScopeTensor in #13605 (comment), it was based on your comment:
ExtraTensor is a tensor that is accessed within one operation layer. In other words, the scope of the extra tensor is confined to one specific layer.
But it seems misleading to say the scope is one specific layer.
It may be accessed in that specific layer only (though that is true for all other tensors as well), yet its lifetime is longer than that layer.
I expected it to be like a local variable, which is released when it goes out of scope.
2.
Why do we need ExtraTensor to get the scope of a Tensor?
Is it possible to use the use-def chain to get the scope of a Tensor?
It would be better not to introduce yet another tensor type if possible.
About 2.
Is it possible to use the use-def chain to get the scope of a Tensor?
As I understand it, using the use-def chain is not easy for this problem (at least for now).
The current use-def chain has the structure below.
// UseDefChain.h
class UseDefChain
{
  const Operand &_operand;
  std::set<TrainingOperationIndex> _uses;
  std::set<TrainingOperationIndex> _defs;
};
This assumes that _operand (a tensor) is defined and used by operations in the graph.
Since an ExtraTensor's definition and usage depend on each layer's implementation,
it does NOT appear in the graph. So it is hard to express an ExtraTensor's use-def with the current use-def chains.
About 1.
But it seems misleading to say the scope is one specific layer.
I thought the word 'scope' meant 'where the tensor can be accessed' and didn't imply the lifetime.
So, to me, the word 'scope' looked appropriate in this case.
It may be accessed in that specific layer only (though that is true for all other tensors as well)
Aha, I agree with your point. Weight tensors, for instance, are also accessed by only one specific layer.
I need to reconsider the tensor naming and its properties carefully.
{
public:
  ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
Suggested change:
- ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
+ ExtraTensorRequest(const ir::OperandInfo &info, ExtraTensorLifeTime lt, ExtraTensor **addr)
About ExtraTensor **addr: can't it be implemented without using a double pointer? If you do want to use a double pointer, please add const to the variable properly.
Aha, thank you for pointing this out. I'll try to find another way 🤔
Hmm.. I've rethought this.
The easy options are:
- use ExtraTensor** const
- use ExtraTensor*&
Otherwise, shared_ptr<ExtraTensor>& also looks possible.
About adding const: ExtraTensor** just becomes ExtraTensor** const.
- ExtraTensor should be mutable from the view of each Layer.
- ExtraTensor* should be mutable; it has to be updated after the extra tensor is registered.
- ExtraTensor** can be constant.
(See the comparison sketch below.)
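For clarity, here is a minimal sketch (not the PR's code; ExtraTensor is only a stand-in type here) of the three parameter shapes under discussion and where const lands for each:

#include <memory>

struct ExtraTensor; // forward declaration, stand-in for the real tensor type

// (1) Double pointer with a top-level const: the pointer-to-pointer itself
//     cannot be reseated, but *addr and **addr stay mutable for the layer.
void requestWithDoublePointer(ExtraTensor **const addr);

// (2) Reference to pointer: the registry can still write the allocated tensor
//     back into the layer's slot, and there is no null double pointer to check.
void requestWithPointerRef(ExtraTensor *&addr);

// (3) Reference to shared_ptr: ownership can be shared between the layer and
//     the TensorRegistry by assigning into the referenced slot.
void requestWithSharedPtrRef(std::shared_ptr<ExtraTensor> &addr);

The rest of the thread leans toward option (3).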
@jyoungyun Could you share your opinion?
Do you think ExtraTensor*& is safe enough, or should we find another way?
@zetwhite I prefer to use a smart pointer instead of a raw pointer. Smart pointers handle memory management automatically and lessen the chances of common pointer-related errors, enhancing code reliability. If you can implement this code using shared_ptr, how about trying it?
@ragmani Could you share your opinion about this? Is it fine to add shared_ptr to the TensorRegistry?
I couldn't find a better way to solve this issue than using a shared_ptr or a double pointer. And I think it's better to use shared_ptr& instead of a raw double pointer, as in https://github.com/Samsung/ONE/pull/13604/files#r1708816555.
@ragmani @jyoungyun I'll update the code to use shared_ptr and re-request reviews soon. Thank you for the help!
As I understand it, the double pointer was there because it is an output parameter. Is the ownership shared? If not, we don't need to use shared_ptr.
In fact, I am not sure ExtraTensor is necessary.
As I understand it, the double pointer was there because it is an output parameter. Is the ownership shared? If not, we don't need to use shared_ptr.
The ownership is shared with the TensorRegistry if each layer also holds ownership.
In fact, I am not sure ExtraTensor is necessary.
If managing and planning ExtraTensors works the same way as for other tensor types, those tensor types can be unified. But I'm still not sure whether that's the case. I think DisposableTensor and GradientTensor are candidates, but ExtraTensor may be required in both forwarding and backwarding nodes in some cases.
Is the ownership shared?
About this.. IMO there is no single concrete answer:
- One could see it as only the TensorRegistry taking ownership, with the Layer borrowing (not owning) the tensors.
- Alternatively, one could see both the TensorRegistry and the Layer as owning the tensors.
In fact, I am not sure ExtraTensor is necessary.
I thought we somehow need to manage the (Extra or LayerScope) tensors in core to plan the memory usage. So I tried defining the tensor as 'extra' and managing it through the TensorManager.
Could you share your opinion (or your concerns) about this?
Since this work is not urgent, I would like to reflect your opinion.
(+) Ah, I found the details here: #13604 (comment)
private:
  ir::OperandInfo _info;
  ExtraTensorLifeTime _lifetime;
  backend::train::ExtraTensor **_address;
Suggested change:
- backend::train::ExtraTensor **_address;
+ std::shared_ptr<backend::train::ExtraTensor> &_address;
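To make the suggested shape concrete, here is a minimal sketch assuming the names from the PR (ExtraTensorRequest, ExtraTensorLifeTime) with hypothetical surrounding code; it is illustrative, not the actual onert implementation:

#include <memory>
#include <utility>
#include <vector>

struct OperandInfo { /* shape, data type, ... (stand-in for ir::OperandInfo) */ };
struct ExtraTensor { /* buffer, shape, ... (stand-in for backend::train::ExtraTensor) */ };

enum class ExtraTensorLifeTime { BACKWARD, FORWARD_TO_BACKWARD };

class ExtraTensorRequest
{
public:
  ExtraTensorRequest(OperandInfo info, ExtraTensorLifeTime lt, std::shared_ptr<ExtraTensor> &addr)
    : _info{std::move(info)}, _lifetime{lt}, _address{addr}
  {
  }

  // Called by the tensor manager after planning/allocation: the created tensor
  // is written back into the layer-owned slot referenced by _address, so the
  // layer and the registry end up sharing ownership.
  void fulfill(std::shared_ptr<ExtraTensor> tensor) { _address = std::move(tensor); }

  const OperandInfo &info() const { return _info; }
  ExtraTensorLifeTime lifetime() const { return _lifetime; }

private:
  OperandInfo _info;
  ExtraTensorLifeTime _lifetime;
  std::shared_ptr<ExtraTensor> &_address; // layer-owned slot, filled in later
};

// Hypothetical layer side: an FC layer keeps slots for W^T and X^T and hands
// out requests that refer to those slots.
class FullyConnectedLayer
{
public:
  std::vector<ExtraTensorRequest> requestExtraTensors()
  {
    std::vector<ExtraTensorRequest> requests;
    requests.emplace_back(OperandInfo{}, ExtraTensorLifeTime::BACKWARD, _transposed_weights);
    requests.emplace_back(OperandInfo{}, ExtraTensorLifeTime::BACKWARD, _transposed_input);
    return requests;
  }

private:
  std::shared_ptr<ExtraTensor> _transposed_weights;
  std::shared_ptr<ExtraTensor> _transposed_input;
};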
Using …
Because each Layer could generate its own tensor and share the …
So, I'll close this PR and move to the next PR.
This PR introduces ExtraTensorRequest.
Through this class, each TrainableFunction can request extra tensors to be pre-allocated.
ONE-DCO-1.0-Signed-off-by: seunghui youn [email protected]
draft : #13486
related : #13282