[onert] Introduce ExtraTensorRequest #13604

Closed

Conversation

zetwhite
Contributor

@zetwhite zetwhite commented Aug 7, 2024

This PR introduces ExtraTensorRequest.
Through this class, each TrainableFunction can request extra tensors to be pre-allocated.

ONE-DCO-1.0-Signed-off-by: seunghui youn [email protected]

draft : #13486
related : #13282
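
For reference, this is roughly the interface the PR adds, reassembled from the diff excerpts quoted in the review threads below. The enum-class keyword and the constructor body are my assumptions and may not match the actual file exactly.

// ExtraTensorRequest.h (sketch reassembled from the review excerpts below)
enum class ExtraTensorLifeTime
{
  BACKWARD,            // alive during backward()
  FORWARD_TO_BACKWARD, // alive from forward to backward()
};

class ExtraTensorRequest
{
public:
  ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
    : _info(info), _lifetime(lt), _address(addr)
  {
  }

  const ir::OperandInfo &info() const { return _info; }
  ExtraTensorLifeTime lifetime() const { return _lifetime; }

private:
  ir::OperandInfo _info;
  ExtraTensorLifeTime _lifetime;
  backend::train::ExtraTensor **_address;
};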

@zetwhite zetwhite marked this pull request as ready for review August 7, 2024 04:04
@zetwhite zetwhite requested a review from a team August 7, 2024 04:04
@zetwhite zetwhite added the "approval: 2" (Require at least 2 approvals) and "PR/ready for review" (It is ready to review. Please review it.) labels Aug 7, 2024
@zetwhite zetwhite force-pushed the 0807/etensor-requests branch from 1294475 to b008152 Compare August 7, 2024 04:09
@zetwhite zetwhite force-pushed the 0807/etensor-requests branch from b008152 to 0182ae2 Compare August 7, 2024 04:10
Comment on lines +52 to +53
const ir::OperandInfo &info() const { return _info; }
ExtraTensorLifeTime lifetime() const { return _lifetime; }
Contributor Author

I intentionally added only getters for these fields.
Once an ExtraTensor's info and lifetime are set, there is no need for them to change.

{
BACKWARD, // alive during backward()
FORWARD_TO_BACKWARD, // alive from forward to backward()
};
Contributor

First of all, I may be asking the wrong question due to my lack of understanding of this feature, so please bear with me.

What does lifetime mean?
Is memory dynamically allocated/deallocated at the start/end of the section (backward, forward_to_backward)?
And what does "Extra" mean in ExtraTensor?

Contributor Author
@zetwhite zetwhite Aug 7, 2024

First of all, I may be asking the wrong question due to my lack of understanding of this feature, so please bear with me.

Thanks for your review and interest! I'm happy to explain.

What does "Extra" mean in ExtraTensor?

An ExtraTensor is a tensor that is additionally requested while doing backward (backpropagation).

For example, while backwarding FullyConnected, we need to calculate X^T and W^T.
Since X^T and W^T are additionally needed because of the mathematical properties of FC, I used the word 'Extra' to mark such a tensor as an additional tensor tied to a specific layer.
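
As a hypothetical sketch (not the PR's actual code): a FullyConnected backward kernel could hand such requests to the core like this. The requestExtraTensors() method and the *_transposed members are assumptions for illustration; the ExtraTensorRequest constructor and ExtraTensorLifeTime values are the ones shown in this PR.

#include <vector>

class FullyConnectedLayer // minimal skeleton, only the request path is shown
{
public:
  std::vector<ExtraTensorRequest> requestExtraTensors(const ir::OperandInfo &w_t_info,
                                                      const ir::OperandInfo &x_t_info)
  {
    // W^T and X^T are needed only while backwarding, so their lifetime is BACKWARD.
    return {ExtraTensorRequest(w_t_info, ExtraTensorLifeTime::BACKWARD, &_w_transposed),
            ExtraTensorRequest(x_t_info, ExtraTensorLifeTime::BACKWARD, &_x_transposed)};
  }

private:
  // Filled in by the tensor manager once the requested tensors are allocated.
  ExtraTensor *_w_transposed = nullptr;
  ExtraTensor *_x_transposed = nullptr;
};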

Contributor Author
@zetwhite zetwhite Aug 7, 2024

What does lifetime mean?

Lifetime means how long the additional tensors must remain alive.

In the example above, W^T and X^T are used only in the backwarding of the FC layer.
So the lifetime of these tensors is limited to backward() (marked BACKWARD).

Some ExtraTensors have to stay alive through both forwarding and backwarding,
so I'd like to mark those as FORWARD_TO_BACKWARD.

Is memory dynamically allocated/deallocated at the start/end of the section (backward, forward_to_backward)?

With lifetime information, we can know in advance when and how much tensor memory will be needed.
So my plan is to (statically) pre-allocate the maximum required memory before the training process starts.
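
To make the idea concrete, here is my simplified illustration of such lifetime-based planning; it is not the planner in onert. It assumes layers execute one at a time and that each BACKWARD tensor is alive only during its own layer's backward(), while FORWARD_TO_BACKWARD tensors are all alive for the whole step.

#include <algorithm>
#include <cstddef>
#include <vector>

// Enum body matches the excerpt reviewed above.
enum class ExtraTensorLifeTime
{
  BACKWARD,            // alive during backward()
  FORWARD_TO_BACKWARD, // alive from forward to backward()
};

// Simplified stand-in for one ExtraTensorRequest: only what the planner needs.
struct RequestSize
{
  ExtraTensorLifeTime lifetime;
  std::size_t bytes;
};

// Peak extra-tensor memory: sum all FORWARD_TO_BACKWARD sizes (they coexist),
// and add the largest per-layer BACKWARD sum (they only overlap within a layer).
std::size_t peakExtraTensorBytes(const std::vector<std::vector<RequestSize>> &per_layer)
{
  std::size_t forward_to_backward = 0;
  std::size_t max_backward = 0;
  for (const auto &layer : per_layer)
  {
    std::size_t backward = 0;
    for (const auto &req : layer)
    {
      if (req.lifetime == ExtraTensorLifeTime::FORWARD_TO_BACKWARD)
        forward_to_backward += req.bytes;
      else
        backward += req.bytes;
    }
    max_backward = std::max(max_backward, backward);
  }
  return forward_to_backward + max_backward;
}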

Contributor Author

I realize my explanation is somewhat verbose and a bit hard to follow 😢
If you're interested, I could explain the details offline!

Contributor

1.

When I suggested LayerScopeTensor in #13605 (comment), it was based on your comment:

ExtraTensor is a tensor that is accessed within one operation layer.
In other words, the scope of the extra tensor is confined to one specific layer.

But it seems misleading to say that the scope is one specific layer.
It may be accessed in that specific layer only (though that is true for all other tensors as well), but its lifetime is longer than the specific layer.

I expected it to be like a local variable, which is released when it goes out of scope.

2.

Why do we need ExtraTensor to get the scope of a Tensor?
Is it possible to use the use-def chain to get the scope of a Tensor?
It would be better not to introduce yet another tensor if possible.

Contributor Author
@zetwhite zetwhite Aug 9, 2024

About 2.

Is it possible to use the use-def chain to get the scope of a Tensor?

As I understand it, using the use-def chain is not easy for this problem (at least for now).

The current use-def chain has the structure below.

// UseDefChain.h

class UseDefChain
{
  const Operand &_operand;
  std::set<TrainingOperationIndex> _uses;
  std::set<TrainingOperationIndex> _defs;
};

This assumes that _operand (a tensor) is defined and used by operations in the graph.

Since an ExtraTensor's definition and usage depend on each layer's implementation,
it is NOT shown in the graph. So it is hard to express an ExtraTensor's use-def with the current use-def chains.

Contributor Author
@zetwhite zetwhite Aug 9, 2024

About 1.

But it seems misleading to say that the scope is one specific layer.

I thought the word 'scope' meant 'where the tensor can be accessed', without implying its lifetime.
So to me, the word 'scope' looked appropriate in this case.

It may be accessed in that specific layer only (though that is true for all other tensors as well)

Aha, I agree with your point. Weight tensors, for example, are also accessed by only one specific layer.
I need to reconsider the tensor's naming and its properties carefully.

@zetwhite zetwhite requested a review from ys44kim August 7, 2024 05:49
{

public:
ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
Contributor

Suggested change
ExtraTensorRequest(ir::OperandInfo info, ExtraTensorLifeTime lt, ExtraTensor **addr)
ExtraTensorRequest(const ir::OperandInfo &info, ExtraTensorLifeTime lt, ExtraTensor **addr)

About ExtraTensor **addr: can't it be implemented without using a double pointer? If you do want to use a double pointer, please place const on the variable properly.

Contributor Author

Aha, thank you for pointing this out. I'll try to find another way 🤔

Contributor Author
@zetwhite zetwhite Aug 7, 2024

Hmm... I rethought about this.

The easy ways are:

  • use ExtraTensor** const
  • use ExtraTensor*&

Otherwise, shared_ptr<ExtraTensor>& also looks possible.

About adding const: ExtraTensor** would just become ExtraTensor** const, because:

  • The ExtraTensor itself should be mutable from the view of each Layer.
  • The ExtraTensor* should be mutable, since it has to be updated after the extra tensor is registered.
  • The ExtraTensor** can be constant (see the sketch below).
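
A minimal illustration of this const placement (my annotation, not a change proposed in this PR; on_registered() is an assumed helper name):

// Forward declaration is enough for the pointer handling shown here.
struct ExtraTensor;

struct ConstPlacementExample
{
  // "ExtraTensor** const": the location of the layer's slot never changes...
  ExtraTensor **const _address;

  void on_registered(ExtraTensor *allocated) const
  {
    *_address = allocated; // ...but the ExtraTensor* written into that slot can
                           // still be updated after the extra tensor is registered.
    // _address = nullptr; // would not compile: the double pointer itself is const
  }
};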

Contributor Author

@jyoungyun Could you share your opinion?

Do you think ExtraTensor*& is safe enough, or should we find another way?

Contributor

@zetwhite I prefer to use a smart pointer instead of a raw pointer. Smart pointers do automatic memory management, which lessens the chances of common pointer-related errors and enhances code reliability. If this code can be implemented using shared_ptr, how about trying it?

Contributor
@ragmani ragmani Aug 8, 2024

@ragmani Could you share your opinion about this? Is it fine to add shared_ptr in TensorRegistry?

I couldn't find a better way to solve this issue than using a shared_ptr or a double pointer. And I think it's better to use a shared_ptr& instead of a raw double pointer, as in https://github.com/Samsung/ONE/pull/13604/files#r1708816555.

Contributor Author

@ragmani @jyoungyun I'll update the PR to use shared_ptr and re-request reviews soon. Thank you for your help!

Contributor

As I understand it, the double pointer was there because it is an output parameter. Is the ownership shared? If not, we don't need to use shared_ptr.

In fact, I am not sure ExtraTensor is necessary.

Contributor
@ragmani ragmani Aug 9, 2024

As I understand it, the double pointer was there because it is an output parameter. Is the ownership shared? If not, we don't need to use shared_ptr.

The ownership is shared with the TensorRegistry if each layer also has ownership.

In fact, I am not sure ExtraTensor is necessary.

If managing and planning ExtraTensors were the same as for other tensor types, those tensor types could be unified, but I'm still not sure whether that's the case. I think DisposableTensor and GradientTensor are candidates, but ExtraTensor may be required in both forwarding and backwarding nodes in some cases.

Contributor Author
@zetwhite zetwhite Aug 9, 2024

Is the ownership shared?

About this... IMO there is no single answer:

  • Somebody could see it as only the TensorRegistry taking ownership, while the Layer borrows (does not own) the tensors.
  • Otherwise, somebody could see it as both the TensorRegistry and the Layer owning the tensors.

In fact, I am not sure ExtraTensor is necessary.

I thought that we somehow need to manage the (Extra or LayerScope) tensors in core to plan the memory usage. So I tried the approach of defining the tensor as 'extra' and managing it through the TensorManager.

Could you share your opinion (or your concerns) about this?
Since this work is not urgent, I would like to reflect your opinion.

(+) Ah, I could find the details here: #13604 (comment)

@jyoungyun jyoungyun requested a review from a team August 8, 2024 01:59
private:
ir::OperandInfo _info;
ExtraTensorLifeTime _lifetime;
backend::train::ExtraTensor **_address;
Contributor

Suggested change
backend::train::ExtraTensor **_address;
std::shared_ptr<backend::train::ExtraTensor> &_address;

@zetwhite zetwhite added the "PR/NO MERGE" (Please don't merge. I'm still working on this :)) label and removed the "PR/ready for review" (It is ready to review. Please review it.) label Aug 9, 2024
@zetwhite
Contributor Author

#13604 (comment)

Using shared_ptr, ExtraTensorRequest is no longer necessary.

Each Layer can create its own tensor and share the shared_ptr<ExtraTensor> with the core (TensorRegistry).

So I'll close this PR and move on to the next PR.
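
A rough sketch of the shared_ptr-based approach described above (assumed names; ExtraTensorRegistry, add(), configureBackward() and the member names are not from this PR or its follow-up):

#include <memory>

class FullyConnectedLayer // minimal skeleton, only the sharing path is shown
{
public:
  void configureBackward(ExtraTensorRegistry &registry)
  {
    // The layer creates the tensors it needs for backward()...
    _w_transposed = std::make_shared<ExtraTensor>(/* OperandInfo of W^T */);
    _x_transposed = std::make_shared<ExtraTensor>(/* OperandInfo of X^T */);

    // ...and shares ownership with the core, so memory planning can happen there
    // without a separate request object.
    registry.add(_w_transposed);
    registry.add(_x_transposed);
  }

private:
  std::shared_ptr<ExtraTensor> _w_transposed;
  std::shared_ptr<ExtraTensor> _x_transposed;
};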

@zetwhite zetwhite closed this Aug 29, 2024