
Add forward to training servicer #227

Merged
merged 7 commits into from
Jan 20, 2025

Conversation

thodkatz
Collaborator

@thodkatz thodkatz commented Dec 12, 2024

It builds on top of the #225

I have implemented the forward method. It intervenes in the training loop, and we retain the loop's initial state afterwards. The forward method requires training to pause, so that we can be sure we won't cause any memory issues if training and inference are attempted at the same time.
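The pause-forward-resume pattern described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual tiktorch API; all names (`Trainer`, `train_step`, the doubling "model") are stand-ins:

```python
import threading

class Trainer:
    """Toy trainer whose loop can be paused for inference (hypothetical sketch)."""

    def __init__(self):
        self._resume = threading.Event()
        self._resume.set()  # training runs until someone pauses it
        self.steps = 0

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()

    def train_step(self):
        self._resume.wait()  # the training loop blocks here while paused
        self.steps += 1

    def forward(self, x):
        # Pause training, run inference, then restore the previous state,
        # so training and inference never hold memory at the same time.
        was_running = self._resume.is_set()
        self.pause()
        try:
            return [v * 2 for v in x]  # stand-in for the model's forward pass
        finally:
            if was_running:
                self.resume()
```

The key point is the `finally` block: whatever state the loop was in before `forward` (running or paused) is restored after the pass completes.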

@thodkatz thodkatz force-pushed the add-forward-to-training-servicer branch 3 times, most recently from d482661 to 32cd26d Compare December 12, 2024 23:52
@thodkatz thodkatz force-pushed the add-forward-to-training-servicer branch 3 times, most recently from 3dc2864 to 7744f91 Compare December 20, 2024 21:19

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 33.90805% with 115 lines in your changes missing coverage. Please review.

Project coverage is 63.67%. Comparing base (57df547) to head (07bc100).

Files with missing lines Patch % Lines
tiktorch/trainer.py 21.66% 47 Missing ⚠️
tiktorch/proto/training_pb2.py 4.34% 22 Missing ⚠️
tiktorch/proto/inference_pb2.py 5.00% 19 Missing ⚠️
tiktorch/proto/utils_pb2.py 5.26% 18 Missing ⚠️
tiktorch/server/session/backend/supervisor.py 0.00% 7 Missing ⚠️
tiktorch/server/grpc/training_servicer.py 90.90% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #227      +/-   ##
==========================================
- Coverage   64.48%   63.67%   -0.81%     
==========================================
  Files          47       47              
  Lines        2689     2742      +53     
==========================================
+ Hits         1734     1746      +12     
- Misses        955      996      +41     


@thodkatz thodkatz force-pushed the add-forward-to-training-servicer branch from 7744f91 to 50b0944 Compare December 20, 2024 21:27
If training is running or paused, forward will restore that state
after completion. But it requires pausing so we can release memory and
do the forward pass.
Since both the inference and training servicers share the concept of an
id, the training session id was replaced with the model session id used
for inference. This model session protobuf interface moved to a
separate utils proto file.

Since the PredictRequest is common to both, it can be leveraged for abstraction.
@thodkatz thodkatz force-pushed the add-forward-to-training-servicer branch from 4798dbe to d3d6702 Compare January 16, 2025 10:11
Collaborator

@k-dominik k-dominik left a comment


Nice, focused PR with changes that are pretty straightforward to follow, including tests! It's great that you unified get_model_session, ModelsessionID, and PredictResponse/Request between services. Very cool that the training state is retained for a forward happening during training.

You've decided to pull in pytorch already at the servicer level, which sprinkles pytorch through all the layers. I would prefer to keep tensors numpy until they reach the actual backend, to at least leave open the option of changing the framework to something else (e.g. keras) in the future. The Inference Servicer uses the bioimageio Sample abstraction - would this be an option here as well?
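The layering suggested above can be sketched as follows: upper layers pass plain numpy arrays, and the framework-specific conversion happens only inside the backend. This is a hypothetical illustration of the design, not tiktorch code; `servicer_forward` and `torch_like_backend` are made-up names, and the arithmetic stands in for a real torch conversion and model call:

```python
import numpy as np

def servicer_forward(arr: np.ndarray, backend) -> np.ndarray:
    """Servicer layer: sees only numpy arrays; no framework type leaks out."""
    return backend(arr)

def torch_like_backend(arr: np.ndarray) -> np.ndarray:
    """Backend layer: the only place that would touch the framework."""
    # In a real backend this is where torch.from_numpy(...) would happen;
    # the model runs on the framework tensor, and the result is converted
    # back to numpy before crossing the boundary again.
    tensor = arr.astype(np.float32)  # stand-in for torch.from_numpy
    result = tensor + 1.0            # stand-in for model(tensor)
    return np.asarray(result)        # back to numpy at the boundary
```

Because the servicer only ever handles `np.ndarray`, swapping the backend for another framework would not ripple through the upper layers.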

assert len(predicted_tensors) == 1
predicted_tensor = predicted_tensors[0]
assert predicted_tensor.dims == ("b", "c", "z", "y", "x")
assert predicted_tensor.shape == (batch, out_channels_unet2d, 1, 128, 128)
Collaborator


It would be nice if the predicted_tensor values could also be tested somehow, just to be on a little bit of a safer side...

Collaborator Author

@thodkatz thodkatz Jan 20, 2025


You are right. Currently, with the configuration, we can't really mock the model. We would need to bypass the initialization and use the testing approach of tests such as test_start_training_success, where we create a mocked trainer object, MockedNominalTrainer, that could set a mocked model as an attribute.

A general concern I had is that a few tests currently use the pytorch 3d unet config to create a model, but we don't really have a mocked model under the test's control.

By bypassing the init phase of the configuration we lose the end-to-end nature of the test, but maybe we should somehow have both.
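A minimal sketch of the mocked-trainer idea described above, assuming a MockedNominalTrainer that sets a mocked model as an attribute (the class body and `forward` signature here are hypothetical, not the actual test helper):

```python
from unittest import mock

import numpy as np

class MockedNominalTrainer:
    """Hypothetical test double standing in for the config-built trainer."""

    def __init__(self):
        # A mocked model with a known, controlled output lets tests assert
        # on the predicted values, not just the output shape.
        self.model = mock.Mock()
        self.model.forward.side_effect = lambda x: x * 2

    def forward(self, x):
        return self.model.forward(x)

def test_forward_values():
    trainer = MockedNominalTrainer()
    out = trainer.forward(np.ones((1, 2)))
    assert np.array_equal(out, np.full((1, 2), 2.0))
    trainer.model.forward.assert_called_once()
```

This bypasses the config-driven init, so it complements rather than replaces the end-to-end test that builds a real model.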

Collaborator


but this is in fact doing a proper forward pass?

Collaborator Author


Currently it is doing a forward pass with a model defined by the config; I am not sure what you are referring to here.

@thodkatz
Collaborator Author

thodkatz commented Jan 20, 2025

Thanks a lot @k-dominik for the review! I have implemented your suggestion of decoupling from pytorch tensors and using the bioimageio Sample abstraction :)

I just have one concern regarding testing, could you please have a look at this comment #227 (comment)

@k-dominik
Collaborator

Hi Theo, conda-build changed their default package format... the extension is now .conda. So to make the tests pass again, you need to change .tar.bz2 to .conda here: https://github.com/thodkatz/tiktorch/blob/07bc1003c61c47cf7dce163b5108fe09371ad369/.github/workflows/ci.yml#L121

Collaborator

@k-dominik k-dominik left a comment


This PR looks good now @thodkatz (except for the previous comment about conda-build's changed package format); it can be merged once tests pass :)

@thodkatz thodkatz merged commit 90b4ce0 into ilastik:main Jan 20, 2025
6 checks passed