
[onert-micro] Introduce OMTraining entities #13145

Merged

Conversation

BalyshevArtem (Contributor):

This PR introduces the OMTrainingInterpreter, OMTrainingContext, and OMTrainingRuntimeModule entities.

For issue #12873
From draft: #13107

ONE-DCO-1.0-Signed-off-by: Artem Balyshev <a.balyshev@samsung.com>
uint32_t num_of_train_layers = 0;
OMTrainOptimizer optimizer = SGD;
OMLoss loss = MSE;
float learning_rate = 0.f;

Contributor:

How about a default value other than 0?

Contributor Author:

For learning_rate, right? Yes, I agree; I will use 0.001.

Contributor Author:

Done
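
With that applied, the default presumably now reads along these lines (reconstructed from the thread above, not copied from the final diff):

float learning_rate = 0.001f; // non-zero default, per review feedback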

// Close file
out_file.close();
#else
assert(fasle && "Not supported");

Contributor:

Suggested change:
- assert(fasle && "Not supported");
+ assert(false && "Not supported");

Contributor Author:

Thank you, fixed it

{
OMStatus status = Ok;
uint32_t batch_size = config.training_context.batch_size;
config.training_context.num_step += 1;

Contributor:

Thank you.

Contributor Author:

The only problem is where to reset this value to 0. For example, when we start a new epoch, we need to reset it to 0 again.

Contributor:

AFAIK, this num_step is used only for ADAM, so it does not need to be reset at a new epoch.
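
For context, a minimal sketch of an Adam-style update (illustrative only, not the PR's implementation; the hyperparameter names and values here are assumptions) showing why num_step should keep growing across epochs: the bias-correction factors depend on the cumulative number of updates, so resetting the counter each epoch would distort them.

#include <cmath>
#include <cstdint>

// Illustrative Adam update over a flat parameter buffer.
// num_step is the cumulative update counter (starts at 1, never reset).
void adamUpdate(float *param, const float *grad, float *m, float *v,
                uint32_t size, uint32_t num_step, float learning_rate)
{
  const float beta1 = 0.9f, beta2 = 0.999f, epsilon = 1e-7f;
  // Bias correction depends on the total step count, not the epoch.
  const float bc1 = 1.f - std::pow(beta1, static_cast<float>(num_step));
  const float bc2 = 1.f - std::pow(beta2, static_cast<float>(num_step));
  for (uint32_t i = 0; i < size; ++i)
  {
    m[i] = beta1 * m[i] + (1.f - beta1) * grad[i];
    v[i] = beta2 * v[i] + (1.f - beta2) * grad[i] * grad[i];
    param[i] -= learning_rate * (m[i] / bc1) / (std::sqrt(v[i] / bc2) + epsilon);
  }
}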

/*
* OMMetrics - enum to store metrics supported by training with onert-micro
*/
enum OMMetrics

Contributor:

@BalyshevArtem Note that the nnfw API does not expose metric evaluation; that is, the nnfw API will expose a loss evaluation function.

chunseoklee previously approved these changes Jun 11, 2024

chunseoklee (Contributor) left a comment:

LGTM. Thank you

Torrero (Contributor) left a comment:

LGTM

return UnknownError;

// Write data
out_file.write(config.model_ptr, config.model_size);

Contributor:

Maybe it would be better to catch and handle possible failures of the write function and, if any arise, return an error status (the same applies to saveCheckpoint). Just check the ios_base::badbit state.
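
A minimal sketch of what such a check might look like (function and variable names are assumptions; OMStatus, Ok, and UnknownError appear in the snippets above and are stubbed here so the sketch is self-contained):

#include <cstddef>
#include <fstream>

enum OMStatus { Ok, UnknownError }; // stub for illustration

// Sketch: verify the stream state after writing and surface failures
// as an error status instead of returning Ok unconditionally.
OMStatus writeModelToFile(const char *path, const char *model_ptr, std::size_t model_size)
{
  std::ofstream out_file(path, std::ios::binary);
  if (!out_file.is_open())
    return UnknownError;

  // Write data
  out_file.write(model_ptr, model_size);

  // fail() reports failbit/badbit, i.e. a failed write (e.g. disk full).
  if (out_file.fail())
    return UnknownError;

  out_file.close();
  return Ok;
}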

Contributor Author:

I will add this in the next PR, thank you.

BalyshevArtem merged commit a3dbe82 into Samsung:master on Jun 11, 2024. 4 checks passed.