
Shared tensor mechanism for avoiding tensor serialization/ipc costs #58

Merged (5 commits) on Jul 16, 2019

Conversation

@jsjason (Contributor) commented Jul 13, 2019

Closes #57.

This PR implements the solution described in #57: instead of serializing intermediate tensors and sending them through queues (which are Linux pipes, internally), processes write to pre-allocated shared tensors and pass lightweight integer signals.

First, the main benchmark process examines the given pipeline and creates all shared tensors in advance. Each runner process is given a fixed number of output tensors to write its outputs to. The device of each output tensor is set to match that of its corresponding process, and the shape is assumed to be provided by the model implementation (RunnerModel.output_shape()). Note that all output tensors of step N's processes are handed to all processes of step N+1 as input tensors. The newly added class BenchmarkTensors acts as an abstraction for managing these shared tensors.
The number of output tensors per process can be configured via the new command line argument -t / --tensors_per_process, which has been added to the log directory format as well.
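The pre-allocation step can be sketched as follows. This is a minimal, hypothetical illustration, not the actual BenchmarkTensors code: NUM_PROCESSES, OUTPUT_SHAPE, and allocate_shared_outputs are invented names, and a flat multiprocessing.Array stands in for a shared PyTorch tensor (which would normally be created with torch.Tensor.share_memory_() or allocated on the process's GPU):

```python
import multiprocessing as mp
import ctypes

NUM_PROCESSES = 2        # runner processes in this step (illustrative)
TENSORS_PER_PROCESS = 3  # the new -t / --tensors_per_process argument
OUTPUT_SHAPE = (4, 8)    # assumed to come from RunnerModel.output_shape()

def allocate_shared_outputs():
    """Pre-allocate shared_buffers[process_idx][tensor_idx] up front,
    before any runner process starts."""
    numel = 1
    for dim in OUTPUT_SHAPE:
        numel *= dim
    return [
        [mp.Array(ctypes.c_float, numel, lock=False)
         for _ in range(TENSORS_PER_PROCESS)]
        for _ in range(NUM_PROCESSES)
    ]

buffers = allocate_shared_outputs()
# All tensors_per_process buffers of every step-N process are handed to
# every step-(N+1) process as input buffers.
```

Allocating everything in the main process before forking is what makes the buffers visible to both the producing and consuming steps without any per-item copying.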

[Screenshot: Screen Shot 2019-07-14 at 2.37.09 AM]

Now, the only information that needs to be passed through queues is an index indicating which shared tensor is ready. I've implemented the index as a tuple of two integers: one denoting the process index (there can be multiple processes per step), and one denoting the tensor index (out of tensors_per_process tensors). Each process writes its output to one of its tensors_per_process output tensors and then sends the corresponding index. Once all tensors_per_process tensors have been used, it wraps around to the first tensor and repeats the procedure, like a ring buffer. To ensure that a process does not overwrite an output tensor that has not yet been consumed by the next step, I've added a multiprocessing.Event object to every shared tensor so that consumers can notify waiting producers. See the comments in the new TensorEvent class and its usages in runner.py for more details.
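The signaling protocol above can be sketched roughly as below. This is a simplified, hypothetical illustration of the idea (the real logic lives in TensorEvent and runner.py); the shared-tensor reads/writes are elided, and threads stand in for runner processes in the demo:

```python
import multiprocessing as mp
import threading

TENSORS_PER_PROCESS = 2

def producer(queue, events, process_idx, num_items):
    """Writes num_items outputs, cycling over its slots like a ring buffer."""
    for i in range(num_items):
        slot = i % TENSORS_PER_PROCESS
        if i >= TENSORS_PER_PROCESS:
            events[slot].wait()    # block until the consumer has read this slot
            events[slot].clear()
        # ... write the output into shared tensor `slot` here ...
        queue.put((process_idx, slot))   # signal: "this slot is ready"

def consumer(queue, events, num_items, results):
    """Reads ready slots in signal order and releases them for reuse."""
    for _ in range(num_items):
        process_idx, slot = queue.get()
        # ... read the shared tensor `slot` here ...
        results.append((process_idx, slot))
        events[slot].set()   # tell the producer the slot may be overwritten

# Demo with threads standing in for runner processes; in the real
# benchmark these would be separate processes sharing actual tensors.
queue = mp.Queue()
events = [mp.Event() for _ in range(TENSORS_PER_PROCESS)]
results = []
c = threading.Thread(target=consumer, args=(queue, events, 4, results))
p = threading.Thread(target=producer, args=(queue, events, 0, 4))
c.start(); p.start(); p.join(); c.join()
print(results)   # slot indices wrap around: [(0, 0), (0, 1), (0, 0), (0, 1)]
```

With one Event per slot, the producer only blocks when it laps the consumer, i.e., when all tensors_per_process slots are in flight at once; otherwise producer and consumer run fully in parallel.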

Test:

```
$ cat config/r2p1d-whole.json
...
    {
      "model": "models.r2p1d.model.R2P1DRunner",
      "gpus": [0,1,2,3,4,5,6,7],
...
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python benchmark.py -c config/r2p1d-whole.json -v 2000 -mi 25
```

Before this PR, the average time between inference0_finish and runner1_start (the IPC portion) was 13.010609 ms. With this PR, the same section takes only 2.661710 ms.

@jsjason requested a review from yunseong on July 13, 2019 at 17:38
@yunseong (Contributor) left a comment:
Thank you so much for the PR with an amazing speedup!
The description is very thorough and the code is quite clean.

I've left a few minor comments. Please check them out!

control.py (review thread resolved)

runner_model.py:
```diff
@@ -17,6 +17,11 @@ def input_shape(self):
     """Returns the expected shape of the input tensor to this model."""
     raise NotImplementedError

+  @staticmethod
+  def output_shape():
+    """Returns the expected shape of the output tensor of this model."""
```
yunseong (Contributor):
As we discussed previously, could we add a description for the case of None in the input/output shape (i.e., not a Tensor at that dimension)?

yunseong (Contributor):
The change should be applied to input_shape() as well.

jsjason (Author):
Will do.

Further review threads on runner.py and control.py (resolved).
@jsjason (Author) left a comment:

@yunseong I've addressed your comments, thank you very much! Could you take another look? I've added quite a few changes in both code and comments, regarding the tensor/non-tensor generalization.

Review threads on control.py and runner.py marked resolved.
@jsjason (Author) commented Jul 15, 2019:

Sorry for the extra commits; I left out a few details. I'm ready for another review now.
https://github.com/snuspl/rnb/pull/58/files/21a479db230597fbd0549e02718c3028bb140ddf..c600b2b3d3c94908a8d22c0a562cff9a866cf0e1

runner.py (review thread resolved)
@yunseong (Contributor) left a comment:
Looks great to me. Thank you so much for fixing this issue.
I'm merging this.

Linked issue: Pass primitive signals instead of PTH tensors to avoid serialization/ipc costs