[doc] Minor improvements of the runtime docs (Samsung#12693)
This commit corrects punctuation, typos and style.

ONE-DCO-1.0-Signed-off-by: Piotr Fusik <[email protected]>
pfusik authored Feb 27, 2024
1 parent f35d3dc commit 31c728d
Showing 6 changed files with 44 additions and 44 deletions.
34 changes: 17 additions & 17 deletions docs/runtime/backend-api.md
@@ -4,13 +4,13 @@ Backend API is defined by One Runtime. It is about actual computation of operati

## How backends are loaded

-When a backend ID is given to a session, the compiler module tries to load `libbackend_{BACKEND_ID}.so`. If it is successful, the runtime looks up for C API functions in it, and make use of those.
+When a backend ID is given to a session, the compiler module tries to load `libbackend_{BACKEND_ID}.so`. If it is successful, the runtime looks up for C API functions in it and makes use of those.
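
As a sketch of that loading flow (the real loader lives in the runtime's backend manager and handles errors and unloading more carefully; everything here except the library name pattern and `onert_backend_create` is illustrative):

```cpp
// Sketch of the plugin-loading flow described above. The real loader
// lives in the runtime's backend manager; error handling, symbol
// checks and unloading are simplified here.
#include <dlfcn.h>
#include <string>

namespace onert { namespace backend { class Backend; } }

using create_fn = onert::backend::Backend *(*)();

onert::backend::Backend *load_backend(const std::string &backend_id)
{
  const std::string so_name = "libbackend_" + backend_id + ".so";
  void *handle = dlopen(so_name.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr)
    return nullptr; // the backend library could not be found/loaded

  // Look up the C API entrypoint exported by the plugin
  auto create = reinterpret_cast<create_fn>(dlsym(handle, "onert_backend_create"));
  if (create == nullptr)
    return nullptr; // not a valid backend plugin

  return create(); // backend object used by the runtime from now on
}
```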

## C and C++ API

### C API

-We have 2 C API functions which are used as the entrypoint and the exitpoint. Here are the definitions of those.
+We have two C API functions which are used as the entrypoint and the exitpoint. Here are the definitions of those.

```c
onert::backend::Backend *onert_backend_create();
void onert_backend_destroy(onert::backend::Backend *backend); /* assumed counterpart; collapsed in this view */
```

@@ -23,16 +23,16 @@ What they do is creating a C++ object and destroying it, respectively. These two
> **NOTE** C++ API is subject to change so it may change in every release
-C API above is just an entrypoint and it delegates core stuff to C++ API.
+C API above is just an entrypoint and it delegates core stuff to the C++ API.
-Here are major classes are described below. One must implement these classes(and some more classes) to create a backend.
+Major classes are described below. One must implement these classes (and some more classes) to create a backend.
-- `Backend` : Responsible to create a backend context which is a set of backend components
-- `BackendContext` : Holds data for the current session and also responsible to create tensor objects and kernels
-- `BackendContext::genTensors` : Create tensor objects
-- `BackendContext::genKernels` : Create kernels
-- `IConfig` : Configurations and miscellaneous stuff (not session based, global)
-- `ITensorRegistry` : A set of tensor(`ITensor`) objects that are used by the current backend
+- `Backend` : Responsible for creating a backend context which is a set of backend components
+- `BackendContext` : Holds data for the current session and also responsible for creation of tensor objects and kernels
+- `BackendContext::genTensors` : Creates tensor objects
+- `BackendContext::genKernels` : Creates kernels
+- `IConfig` : Configurations and miscellaneous stuff (global, not session based)
+- `ITensorRegistry` : A set of tensor (`ITensor`) objects that are used by the current backend
Please refer to each class document for details. You may refer to [Bundle Backends](#bundle-backends) for actual implementation samples.
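
For a rough picture of how those classes relate, here is a skeletal backend; the class and method names follow the list above, but the real onert interfaces have more (and differently-typed) members, so treat this purely as a sketch:

```cpp
// Skeletal backend illustrating how the classes listed above relate.
// The real onert interfaces have more members; sketch only.
#include <memory>
#include <string>

struct ITensorRegistry
{
  // set of tensor (ITensor) objects used by the current backend
};

struct IConfig
{
  // global (not session based) configuration
  std::string id() const { return "mybackend"; } // the BACKEND_ID
};

struct BackendContext
{
  // holds data for the current session
  void genTensors() { /* create tensor objects */ }
  void genKernels() { /* create kernels */ }
  ITensorRegistry tensor_registry;
};

struct Backend
{
  // responsible for creating a backend context, a set of backend components
  std::unique_ptr<BackendContext> newContext() const
  {
    return std::make_unique<BackendContext>();
  }
};
```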
@@ -42,24 +42,24 @@ We provide some backends along with the runtime. There is the special backend `b
## `builtin` Backend
-`builtin` is a special backend that is always loaded(statically linked, part of runtime core). It is implemented just like other backends, but there are some things that it does exclusively.
+`builtin` is a special backend that is always loaded (statically linked, part of runtime core). It is implemented just like other backends, but there are some things that it does exclusively.
-- Has kernels for If, While and Permute operations (Kernels from other backends are never be used)
+- Has kernels for If, While and Permute operations (Kernels from other backends are never used)
- The runtime core directly creates `builtin`'s tensor objects to accept user-given input and output buffers
-- The runtime core gives the executor context to `builtin` backend which allows control flow ops can change execution flow properly
+- The runtime core gives the executor a context to `builtin` backend which lets control flow ops properly change the execution flow
## Bundle Backends
-Without actual implmentation of backends, we cannot run any models. So we provide 3 bundle backends which support dozens of operations.
+Without actual implementation of backends, we cannot run any model. So we provide 3 bundle backends which support dozens of operations.
### cpu
-This backend is written in C++ and all the computation is done with CPU only.
+This backend is written in C++ and all the computation is done exclusively on a CPU.
### acl_neon
-`acl_neon` is a backend that is an adaptation layer of [ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) NE(NEON) part. So it basically only uses CPU too, but worksonly on ARM.
+`acl_neon` is a backend that is an adaptation layer of [ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) NE (NEON) part. So it's CPU-only and restricted to ARM.
### acl_cl
-`acl_cl` is a backend that is an adaptation layer of [ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) CL(OpenCL) part. OpenCL support(`libOpenCL.so`) is also necessary in the running environment to be able to use this backend. Also, it works only on ARM.
+`acl_cl` is a backend that is an adaptation layer of [ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) CL (OpenCL) part. OpenCL support (`libOpenCL.so`) is also necessary in the running environment to be able to use this backend. Also, it works only on ARM.
2 changes: 1 addition & 1 deletion docs/runtime/compute.md
@@ -10,4 +10,4 @@ The code structure looks just like ComputeLibrary's. Some of the code could be c

## cker

"cker" stands for Cpu KERnel. It is a port of Tensorflow lite's operation kernels and possibly there are some own code. It is used by `cpu` backend.
"cker" stands for Cpu KERnel. It is a port of Tensorflow lite's operation kernels with some additions. It is used by the `cpu` backend.
16 changes: 8 additions & 8 deletions docs/runtime/controlflow-operations.md
@@ -1,22 +1,22 @@
# Controlflow Operations

-We call `If` and `While` operations "Controlflow operations". These operations are different from the others. They are not for computing data, they are used to invoke another subgraph and return back which is to make conditional/iterations work in dataflow models.
+We call the `If` and `While` operations "Controlflow operations". These operations are special. Instead of computing data, they are used to invoke another subgraph and return back which constitutes conditional/iterations work in dataflow models.

## Defining controlflow operations

-As we use Tensorflow Lite schema(or Circle which is based on TF Lite), the runtime follows the way TF Lite does. The details are stated in [Control Flow in TensorFlow Lite](https://github.com/tensorflow/community/blob/master/rfcs/20190315-tflite-control-flow.md) RFC document.
+As we use Tensorflow Lite schema (or Circle which is based on TF Lite), the runtime follows the way TF Lite does. The details are stated in the [Control Flow in TensorFlow Lite](https://github.com/tensorflow/community/blob/master/rfcs/20190315-tflite-control-flow.md) RFC document.

-Controlflow operations from NN API is not yet supported. But we expect that it can be enabled in the similar way.
+Controlflow operations from NN API are not yet supported. But we expect that they can be enabled in a similar way.

## Implementation

### Graph representation

-`onert` internally has its representation for controlflow operations and subgraphs. It is pretty much straightforward as it is pretty much isomorphic with the schema. The `onert`'s in-memory model contains multiple subgraphs and the controlflow operations have same parameters(subgraph indices) just like TF Lite schema has.
+`onert` internally has its representation for controlflow operations and subgraphs. It is straightforward as it is pretty much isomorphic with the schema. The `onert`'s in-memory model contains multiple subgraphs and the controlflow operations have same parameters (subgraph indices), just like TF Lite schema has.
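
To make the shared-parameters point concrete, the controlflow operations can be pictured as carrying subgraph indices like this (the field and type names here are invented for illustration):

```cpp
// Invented field names, mirroring how If/While reference subgraphs
// by index in the TF Lite / Circle schema.
#include <cstdint>
#include <vector>

using SubgraphIndex = std::uint32_t;

struct Subgraph; // a single DAG of operations

struct IfParams
{
  SubgraphIndex then_subg_index; // invoked when the condition is true
  SubgraphIndex else_subg_index; // invoked when it is false
};

struct WhileParams
{
  SubgraphIndex cond_subg_index; // produces the loop condition
  SubgraphIndex body_subg_index; // produces the next iteration's state
};

struct Model
{
  std::vector<Subgraph *> subgraphs; // the in-memory model holds them all
};
```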

### Execution

-`controlflow` backend is a built-in backend to support these controlflow operations. This backend is special as it has access to `onert` core's executor manager(`ExecutorMap`) so it can invoke/return a subgraph. This backend has implementation for `If` and `While` operation and they make use of the access to executor manager.
+The `controlflow` backend is a built-in backend to support these controlflow operations. This backend is special as it has access to `onert` core's executor manager (`ExecutorMap`) so it can invoke/return a subgraph. This backend has implementations for `If` and `While` operations and they make use of the access to executor manager.

An `Executor` has two different ways to execute depending on if it is the initial execution or invoking a subgraph from a controlflow operation.

@@ -28,13 +28,13 @@ An `Executor` has two different ways to execute depending on if it is the initia

#### Kernel Implementation

-Here is brief explanation what the kernels do, which is quoted from [Control Flow in TensorFlow Lite](https://github.com/tensorflow/community/blob/master/rfcs/20190315-tflite-control-flow.md).
+Here is a brief explanation what the kernels do, which is quoted from [Control Flow in TensorFlow Lite](https://github.com/tensorflow/community/blob/master/rfcs/20190315-tflite-control-flow.md).

> * `If` : Check the condition input and invoke one of the 2 subgraphs.
> * `While` :
-> * Invoke the condition subgraph. Break out the loop if result is false.
+> * Invoke the condition subgraph. Break out the loop if the result is false.
> * Invoke the body subgraph, use the output as the input of the next iteration.
Invoking a subgraph needs to pass the operation's inputs to the subgraph inputs. And Returning back needs to pass the subgraph outputs to the operation outputs.

-When invoking a subgraph and returning back, the current kernel implementation performs literally copy all the subgraph inputs and outputs. This is going to be optimized to minimize redundant copies.
+When invoking a subgraph and returning back, the current kernel implementation makes a copy of all the subgraph inputs and outputs. This is going to be optimized to minimize redundant copies.
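
A pseudocode-level sketch of the quoted `While` behavior, with the copy-in/copy-out made explicit; `Executor`, `Tensor` and `as_bool` are placeholders, not the real kernel code:

```cpp
// Placeholder types standing in for the real runtime classes.
#include <vector>

struct Tensor
{
  bool scalar_bool = false; // enough for this sketch
};

static bool as_bool(const Tensor &t) { return t.scalar_bool; }

struct Executor
{
  // Copies the inputs into subgraph inputs, runs the subgraph, then
  // copies the subgraph outputs back out (the copies noted above).
  std::vector<Tensor> run(const std::vector<Tensor> &inputs)
  {
    return inputs; // stub for the sketch
  }
};

std::vector<Tensor> while_kernel(Executor &cond, Executor &body,
                                 std::vector<Tensor> state)
{
  for (;;)
  {
    // Invoke the condition subgraph; break out if the result is false
    if (!as_bool(cond.run(state)[0]))
      break;
    // Invoke the body subgraph; its output is the next iteration's input
    state = body.run(state);
  }
  return state; // becomes the While operation's outputs
}
```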
12 changes: 6 additions & 6 deletions docs/runtime/core.md
@@ -66,21 +66,21 @@ Let's say we have some functions written in a certain programming language. Then

#### 5. Create Executor

-With generated tensors and kernels, the compiler creates executor objects. There are 3 types of executors are supported - Linear, Dataflow, and Parallel. Linear executor is the default executor and Dataflow Executor and Parallel Executor are experimental.
+With generated tensors and kernels, the compiler creates executor objects. There are 3 types of executors: Linear, Dataflow, and Parallel. Linear executor is the default executor and Dataflow Executor and Parallel Executor are experimental.

-For more about executors, please refer to [Executors](executors.md) document.
+For more about executors, please refer to the [Executors](executors.md) document.

### Module `exec`

-`exec` stands for 'execution'. As a result of the compilation, `Execution` class is created. This class manages the actual execution of the model inference. Here is a typical usage of using this class.
+`exec` stands for 'execution'. As a result of the compilation, `Execution` class is created. This class manages the actual execution of the model inference. Here is a typical usage of this class.

1. Resize input size if needed
2. Provide input and output buffers
-3. Run the inference in either synchronous/asynchronous mode
+3. Run the inference in either synchronous or asynchronous mode
4. Check out the results which are stored in output buffers provided earlier
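
In code, that typical flow looks roughly like this; the `Execution` methods below are paraphrased from the steps, not copied from the real header:

```cpp
// Paraphrased from the four steps above; method names are
// illustrative, not the actual onert::exec::Execution header.
#include <cstddef>
#include <vector>

struct Execution // placeholder for the class created by compilation
{
  void setInput(int index, const void *buffer, size_t length) {}
  void setOutput(int index, void *buffer, size_t length) {}
  void execute() {}      // 3a. synchronous run
  void startExecute() {} // 3b. asynchronous run ...
  void waitFinish() {}   // ... paired with a wait for completion
};

void infer(Execution &execution,
           const std::vector<float> &in, std::vector<float> &out)
{
  // 2. Provide input and output buffers
  execution.setInput(0, in.data(), in.size() * sizeof(float));
  execution.setOutput(0, out.data(), out.size() * sizeof(float));
  // 3. Run the inference (synchronously here)
  execution.execute();
  // 4. The results are now in `out`
}
```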

### Module `backend`

-Backends are plugins and they are loaded dynamically(via `dlopen`). So this module is a set of interface classes for backend implementation. `compiler` can compile with a variety of backends without knowing specific backend implementation.
+Backends are plugins and they are loaded dynamically (via `dlopen`). So this module is a set of interface classes for backend implementation. `compiler` can compile with a variety of backends without knowing specific backend implementation.

-Backend interface classes are mostly about memory management and kernel generation. For more, please refer to [Backend API](backend-api.md) document.
+Backend interface classes are mostly about memory management and kernel generation. For more, please refer to the [Backend API](backend-api.md) document.
14 changes: 7 additions & 7 deletions docs/runtime/executors.md
@@ -1,10 +1,10 @@
# Executors

-Executor(`IExecutor`) is a execution engine of a subgraph that can execute inference for the subgraph. It is the result of `Subgraph` compilation. If we would compare it with most of common programming language tools, it is just like an interpreter with code to execute.
+Executor (`IExecutor`) is an execution engine of a subgraph that can execute inference for the subgraph. It is the result of a `Subgraph` compilation. Compared to common programming language tools, it is like an interpreter with code to execute.

## Understanding models

-We can think of a NNPackage model as a set of tasks with dependencies. In other words, it is a form of [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)(more precisely, it is a set of DAGs as we need multiple subgraphs to support control flow operations). And that is exactly the same concept with [Dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming).
+We can think of an NNPackage model as a set of tasks with dependencies. In other words, it is a form of [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (more precisely, it is a set of DAGs, as we need multiple subgraphs to support control flow operations). And that is exactly the same concept with [Dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming).

That is, there are some input tensors that must be ready to run a operation. And the execution must be done in topological order. Here's the workflow for execution.

@@ -15,20 +15,20 @@ That is, there are some input tensors that must be ready to run a operation. And
5. Check if there are some operations ready
1. If yes, Go to 3
2. Otherwise, Finish execution
-6. User uses data of model output tensors
+6. User consumes data of model output tensors
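
That workflow is essentially a ready-queue loop over the DAG; a minimal sketch (all bookkeeping types invented for illustration):

```cpp
// Minimal sketch of the workflow above: repeatedly run any operation
// whose inputs are all ready. All types here are invented.
#include <queue>
#include <vector>

struct Op
{
  std::vector<int> users;    // ops that consume this op's outputs
  int unresolved_inputs = 0; // inputs not produced yet
  void run() { /* execute this operation's kernel */ }
};

void execute(std::vector<Op> &ops)
{
  std::queue<int> ready;
  for (int i = 0; i < static_cast<int>(ops.size()); ++i)
    if (ops[i].unresolved_inputs == 0) // fed only by model inputs
      ready.push(i);

  while (!ready.empty()) // otherwise, finish execution
  {
    int i = ready.front();
    ready.pop();
    ops[i].run(); // run the op, making its outputs ready
    for (int user : ops[i].users)
      if (--ops[user].unresolved_inputs == 0)
        ready.push(user); // more operations became ready
  }
}
```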

-We have 3 different types of executors in our codebase and they all are based on the explation above. However only `LinearExecutor` is official and the other two are experimental.
+We have 3 different types of executors in our codebase and they all are based on the above explanation. However, only `LinearExecutor` is official and the other two are experimental.

## Linear Executor

-`LinearExecutor` is the main executor. As we know the model to run and the model graph does not change at runtime, we do not have to do step 3-5 above at runtime. During the compilation for Linear Executor, it sorts operations in topological order so we can just execute in that fixed order which means that it cannot perform the operations in parallel.
+`LinearExecutor` is the main executor. As we know the model to run and the model graph does not change at runtime, we do not need to do the above steps 3-5 at runtime. During the compilation for Linear Executor, it sorts operations in topological order so we can just execute in that fixed order which means that it cannot perform the operations in parallel.

If the tensors are static, it also can analyze the lifetimes of the tensors and pre-allocate tensor memory with reusing memory between the tensors whose lifetimes do not overlap.
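
To illustrate that lifetime-based reuse, here is a simplified greedy offset planner (not onert's actual memory planner):

```cpp
// Simplified greedy planner: tensors whose [first_use, last_use]
// ranges overlap must not share memory; non-overlapping ones may.
// Not onert's actual memory planner.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Lifetime
{
  int first_use, last_use; // positions in the linear operation order
  size_t size;             // tensor byte size
  size_t offset;           // assigned offset into one pre-allocated arena
};

size_t plan(std::vector<Lifetime> &tensors)
{
  size_t arena_size = 0;
  for (size_t i = 0; i < tensors.size(); ++i)
  {
    size_t offset = 0;
    for (size_t j = 0; j < i; ++j)
    {
      bool overlap = tensors[i].first_use <= tensors[j].last_use &&
                     tensors[j].first_use <= tensors[i].last_use;
      if (overlap) // must be placed after this already-planned tensor
        offset = std::max(offset, tensors[j].offset + tensors[j].size);
    }
    tensors[i].offset = offset; // reuses memory of non-overlapping tensors
    arena_size = std::max(arena_size, offset + tensors[i].size);
  }
  return arena_size; // single allocation made before execution
}
```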

## Dataflow Executor (experimental)

-Unlike `LinearExecutor`, `DataflowExecutor` does step 3-5 at runtime. By doing it we can know which operations are available at a specific point. However this executor still executes the operations one at a time. Just choose any operation that is ready then execute. And wait for it to finish then repeat that. So there may be no advantage compared to `LinearExecutor` but `DataflowExecutor` is the parent class of `ParallelExecutor`. And `DataflowExecutor` can be used for profiling executions for the heterogeneous scheduler.
+Unlike `LinearExecutor`, `DataflowExecutor` does steps 3-5 at runtime. By doing it we can know which operations are available at a specific point. However this executor still executes the operations one at a time. Just choose any operation that is ready then execute, wait for it to finish then repeat. So there may be no advantage compared to `LinearExecutor` but `DataflowExecutor` is the parent class of `ParallelExecutor`. And `DataflowExecutor` can be used for profiling executions for the heterogeneous scheduler.

## Parallel Executor (experimental)

-Just like `DataflowExecutor`, `ParallelExecutor` does step 3-5 at runtime. One big difference is that it creates a `ThreadPool` for each backend for parallel execution(`ThreadPool` is supposed to have multiple threads, however for now, it can have only one thread). As we know that there may be multiple operations ready to execute, and those can be executed in different backends at the same time which could lead some performance gain.
+Just like `DataflowExecutor`, `ParallelExecutor` does steps 3-5 at runtime. One big difference is that it creates a `ThreadPool` for each backend for parallel execution (`ThreadPool` is supposed to have multiple threads, however for now, it can have only one thread). Multiple operations ready to execute can be executed in different backends at the same time, which could lead to some performance gain.
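
A toy sketch of that per-backend dispatch idea (one queue per backend, matching the note that each `ThreadPool` currently has a single thread); all names invented:

```cpp
// Toy sketch: ready operations go to the pool of the backend they are
// assigned to, so different backends can run kernels at the same time.
#include <functional>
#include <map>
#include <string>

struct ThreadPool
{
  // currently effectively one worker thread per backend (see above)
  void enqueue(std::function<void()> job) { job(); } // stub: run inline
};

struct ParallelDispatcher
{
  std::map<std::string, ThreadPool> pools; // one pool per backend id

  void dispatch(const std::string &backend_id, std::function<void()> kernel)
  {
    pools[backend_id].enqueue(std::move(kernel));
  }
};
```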