
[DRAFT][onert-micro] Onert-micro training PoCv3 #13107

Closed

Conversation

BalyshevArtem
Contributor

This draft introduces the third version of the onert-micro training runtime. It covers one-stage training (without generating the backprop graph part) and does not include the weight divider.

for issue: #12873

ONE-DCO-1.0-Signed-off-by: Artem Balyshev [email protected]

@BalyshevArtem BalyshevArtem changed the title [DRAFT][onert-micro] Onert-micro training PoCv2 [DRAFT][onert-micro] Onert-micro training PoCv3 Jun 4, 2024

printf("MAE_ERROR TEST = %f\n", mae_result);
// Save training result
saveModel(output_trained_file_path, circle_model);
Contributor

@chunseoklee Jun 4, 2024


This may be exposed via an API:

NNFW_STATUS nnfw_train_export_circle(nnfw_session *session, const char *path);

Thus, circle_model or model_ptr should be kept in some context.

Contributor Author


So, move this function inside OMTrainingInterpreter, right?
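
One way to read the suggestion, as a minimal self-contained sketch (not the actual onert-micro API): the training context or interpreter keeps the circle model buffer itself, so an export entry point like nnfw_train_export_circle can later write it out without the sample owning circle_model. The TrainingContext type and exportCircle helper below are hypothetical.

#include <cstddef>
#include <cstdio>
#include <vector>

struct TrainingContext
{
  std::vector<char> circle_model; // raw model buffer holding the (updated) weights

  // Hypothetical export helper: dump the kept buffer to `path`.
  bool exportCircle(const char *path) const
  {
    FILE *f = std::fopen(path, "wb");
    if (f == nullptr)
      return false;
    const std::size_t written = std::fwrite(circle_model.data(), 1, circle_model.size(), f);
    std::fclose(f);
    return written == circle_model.size();
  }
};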

auto data = reinterpret_cast<float *>(interpreter.readOutputTensor(0));
// Temporary buffer to read input data from file using BATCH_SIZE
float training_input[BATCH_SIZE * INPUT_SIZE];
float training_target[BATCH_SIZE * OUTPUT_SIZE];
Contributor


Label data for CCE is one-hot encoded. Will you feed int label data by converting it into float?

Contributor Author


Yes, I think it is one of the preprocessing steps. For the cross-entropy task, the input we receive is already preprocessed one-hot encoded float data. Do you think it would be better to move it into the runtime?
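
For context, the preprocessing discussed here could look roughly like the sketch below: integer class labels are expanded into one-hot float targets before being copied into the training_target buffer from the sample. The constant values and the toOneHot helper are assumptions for illustration.

#include <cstring>

constexpr int BATCH_SIZE = 32;  // assumed value for illustration
constexpr int OUTPUT_SIZE = 10; // assumed number of classes for the cross-entropy task

// Expand integer labels into one-hot encoded float targets.
void toOneHot(const int *labels, float *one_hot /* BATCH_SIZE * OUTPUT_SIZE floats */)
{
  std::memset(one_hot, 0, sizeof(float) * BATCH_SIZE * OUTPUT_SIZE);
  for (int b = 0; b < BATCH_SIZE; ++b)
    one_hot[b * OUTPUT_SIZE + labels[b]] = 1.0f;
}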

// Averaged result
{
float *f_metric_val = reinterpret_cast<float *>(metric_val);
*f_metric_val /= test_size;
Contributor

@Torrero Jun 5, 2024


Is it reasonable to insert an assert checking that test_size is not equal to zero?

Contributor Author


Yes, thank you, I will add an assert here.
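
A minimal sketch of the averaging step with that assert in place (the averageMetric function name is hypothetical; metric_val and test_size follow the diff above):

#include <cassert>
#include <cstdint>

// Average the accumulated metric value over the test set, guarding against a zero test_size.
void averageMetric(void *metric_val, uint32_t test_size)
{
  assert(test_size != 0 && "test_size must be non-zero when averaging the metric");
  float *f_metric_val = reinterpret_cast<float *>(metric_val);
  *f_metric_val /= static_cast<float>(test_size);
}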


for (uint32_t i = 0; i < flat_size; ++i)
{
result_value += std::pow((calculated_data[i] - target_data[i]), 2);
Contributor


Here std::pow() can be replaced by a multiplication.
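
For reference, the accumulation written with a plain multiplication, as suggested (the function wrapper is added only to make the sketch self-contained; variable names follow the diff):

#include <cstdint>

// Accumulate the squared error without std::pow.
float accumulateSquaredError(const float *calculated_data, const float *target_data,
                             uint32_t flat_size)
{
  float result_value = 0.0f;
  for (uint32_t i = 0; i < flat_size; ++i)
  {
    const float diff = calculated_data[i] - target_data[i];
    result_value += diff * diff; // was: std::pow(calculated_data[i] - target_data[i], 2)
  }
  return result_value;
}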

@BalyshevArtem
Contributor Author

@chunseoklee, I added a checkpoint saving and loading API:

train_interpreter.saveCheckpoint(config, checkpoints_path);

train_interpreter.loadCheckpoint(config, checkpoints_path);

It is similar to #12997 (comment)

To check the format of the checkpoint file, please see #13037 (comment)
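
To illustrate the intended use, here is a rough sketch (only saveCheckpoint and loadCheckpoint come from this PR; the epoch loop, the num_epochs constant, and the comment on what the checkpoint carries are assumptions):

OMTrainingInterpreter train_interpreter;
OMConfig config;
const char *checkpoints_path = "model.checkpoint"; // assumed path

// Resume from an earlier run if a checkpoint exists; it is expected to carry
// the weights, optimizer state, and current num_step.
train_interpreter.loadCheckpoint(config, checkpoints_path);

const uint32_t num_epochs = 10; // assumed value for the example
for (uint32_t epoch = 0; epoch < num_epochs; ++epoch)
{
  // ... run the training steps for this epoch ...
  train_interpreter.saveCheckpoint(config, checkpoints_path);
}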

cur_batch_size = std::max(1u, cur_batch_size);

config.training_context.batch_size = cur_batch_size;
config.training_context.num_step = i + 1;
Contributor


AFAIU, num_step for the ADAM optimizer should not be reset at each epoch.

Suggested change
config.training_context.num_step = i + 1;
config.training_context.num_step++;

Contributor


or

Suggested change
config.training_context.num_step = i + 1;
config.training_context.adam_step++;

Contributor Author


I see, I will change it
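
Putting the suggestion in context, the batching loop could look roughly like the fragment below (num_epochs, num_batches, and the remaining_in_batch helper are illustrative names, not from the PR; config and std::max follow the sample code):

for (uint32_t epoch = 0; epoch < num_epochs; ++epoch)
{
  for (uint32_t i = 0; i < num_batches; ++i)
  {
    uint32_t cur_batch_size = remaining_in_batch(i); // hypothetical helper
    cur_batch_size = std::max(1u, cur_batch_size);

    config.training_context.batch_size = cur_batch_size;
    config.training_context.num_step++; // was: = i + 1, which reset the counter every epoch

    // ... set input/target for this batch and run the training step ...
  }
}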

@BalyshevArtem
Contributor Author

@chunseoklee, can I start splitting this draft and merging it?

@chunseoklee
Contributor

@chunseoklee, can I start splitting this draft and merging it?

Sure.

@chunseoklee
Contributor

@BalyshevArtem I am trying to make an onert-micro-dev module, which implements the nnfw API, on https://github.com/chunseoklee/ONE/commits/v3/. After drafting this, I am going to apply it to your TizenRT internal commit.

* Warning: before using trainSingleStep call: 1) importTrainModel; 2) setInput; 3) setTarget
*/
OMStatus OMTrainingRuntimeModule::trainSingleStep(const OMConfig &config)
{
Contributor


Suggested change
{
{
config.training_context.num_step++;

We need this to update num_step for ADAM.

Contributor Author


Yes, and to save this value in checkpoint files (if the corresponding save method is called).

Contributor


I mean we need somewhere to keep this num_step value
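
As a sketch of where the counter could live after this discussion (the method body below is hypothetical; the required call order comes from the header comment above, and the non-const config is exactly the kind of change implied by keeping num_step in some context):

// Callers must have run importTrainModel, setInput and setTarget first.
OMStatus OMTrainingRuntimeModule::trainSingleStep(OMConfig &config)
{
  // Advance the global step so ADAM bias correction sees a monotonically
  // increasing counter across epochs; saveCheckpoint can then persist it.
  config.training_context.num_step++;

  // ... forward pass, loss, backward pass and weight update (omitted) ...

  return Ok; // assuming the OMStatus success value used elsewhere in onert-micro
}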

* num_of_train_layers - number of trainable last layers (Note: 0 - all layers will be trained)
* optimizer - optimizer that onert-micro training will use (Note: SGD - default one)
* loss - loss that onert-micro training will use (Note: CROSS_ENTROPY - default one)
* lambda - used by all optimizers
Contributor


Is this the learning rate? Then how about using lr or learning_rate?

Contributor Author


Yes, sure
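
The agreed rename could land in the training context roughly like this (field types, default values, and the placeholder enums are assumptions for the sketch, not the actual onert-micro definitions):

enum OMTrainOptimizer { SGD, ADAM };  // placeholder enum for the sketch
enum OMLoss { CROSS_ENTROPY, MSE };   // placeholder enum for the sketch

struct OMTrainingContext
{
  unsigned int num_of_train_layers = 0; // 0 - all layers will be trained
  OMTrainOptimizer optimizer = SGD;     // SGD - default one
  OMLoss loss = CROSS_ENTROPY;          // CROSS_ENTROPY - default one
  float learning_rate = 0.001f;         // was: lambda - used by all optimizers
  unsigned int batch_size = 1;
  unsigned int num_step = 0;
};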

Comment on lines +71 to +75
TEST_F(BostonHousingTaskTest, ADAM_MSE_P)
{
// Create BostonHousing data handler
BostonHousingTask<float> bostonTask;

Contributor Author


Added tests
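
A possible shape for the body of the test shown above (the trainEpochs helper, epoch count, and MAE bound are assumptions for illustration, not part of the PR):

#include <gtest/gtest.h>

TEST_F(BostonHousingTaskTest, ADAM_MSE_P)
{
  // Create BostonHousing data handler
  BostonHousingTask<float> bostonTask;

  // Train with ADAM + MSE for a few epochs and check the resulting error.
  const float mae = trainEpochs(bostonTask, /*epochs=*/30); // hypothetical helper
  EXPECT_LE(mae, 0.5f);                                     // assumed acceptance bound
}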

Artem Balyshev added 5 commits June 17, 2024 12:38
This draft introduces the second version of the onert-micro training runtime. This covers one-stage training (without generating the backprop graph part). This draft also introduces the weight divider tool.

ONE-DCO-1.0-Signed-off-by: Artem Balyshev <[email protected]>
@BalyshevArtem
Contributor Author

Everything is merged, closing it.
