fix(docs): Update docs for scheduler (#6013)
* add control plane and data plane

* add note about controller

* create subsection for each micro service

* add note about sync process

* Add more description about core 2 services.

* Grammar, structure, wording edits

---------

Co-authored-by: Paul Bridi <[email protected]>
sakoush and paulb-seldon authored Oct 29, 2024
1 parent 8f83be0 commit 294b5f8
Showing 1 changed file with 42 additions and 16 deletions: docs-gb/architecture/README.md
# Architecture

Seldon Core 2 uses a microservice architecture in which each service has limited and well-defined responsibilities; together, these services orchestrate scalable and fault-tolerant ML serving and management. The components communicate internally over gRPC and can be scaled independently. Seldon Core 2 services fall into two categories:

* **Control Plane** services are responsible for managing the operations and configurations of your ML models and workflows. This includes functionality to instantiate new inference servers, load models, update new versions of models, configure model experiments and pipelines, and expose endpoints that may receive inference requests. The main control plane component is the **Scheduler**, which is responsible for loading and unloading resources (models, pipelines, experiments) onto the respective components.

* **Data Plane** services are responsible for managing the flow of data between components or models. Core 2 supports REST and gRPC payloads that follow the Open Inference Protocol (OIP). The main data plane service is **Envoy**, which acts as a single ingress for all data plane traffic and routes requests to the relevant servers internally (e.g. Seldon MLServer or NVIDIA Triton pods).
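As a concrete illustration of an OIP payload, the sketch below (Python, standard library only) builds a minimal v2 REST inference request. The model name `iris` and the tensor name, shape, and values are illustrative, not taken from Core 2 itself:

```python
import json

# Minimal Open Inference Protocol (v2) request body: a list of named input
# tensors whose datatype and shape must match the model's signature.
payload = {
    "inputs": [
        {
            "name": "predict",        # illustrative tensor name
            "datatype": "FP32",
            "shape": [1, 4],
            "data": [[5.1, 3.5, 1.4, 0.2]],
        }
    ]
}

# OIP defines the REST route as POST /v2/models/<model-name>/infer
endpoint = "/v2/models/iris/infer"
body = json.dumps(payload)
```

The same payload shape applies over gRPC, where the tensors map onto the protocol's `ModelInferRequest` message.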

{% hint style="info" %}
**Note**: Because the Core 2 architecture separates control plane and data plane responsibilities, data plane inference can still be served while control plane services (e.g. the Scheduler) are down. This separation makes the system more resilient to failures: an outage of control plane services does not impact the ability of the system to respond to end-user traffic. Core 2 can be provisioned to be **highly available** on the data plane path.
{% endhint %}


The current set of services used in Seldon Core 2 is shown below. Following the diagram, each control plane and data plane service is described in turn.

![architecture](../images/architecture.png)

## Control Plane

### Scheduler
This service manages the loading and unloading of Models, Pipelines, and Experiments on the relevant microservices. It is also responsible for matching Models with available Servers in a way that optimises infrastructure use. In the current design there can only be _one_ instance of the Scheduler, as its internal state is persisted on disk.

When the Scheduler (re)starts, a synchronisation flow coordinates the startup process and attempts to wait for expected Model Servers to connect before proceeding with control plane operations. This is important so that ongoing data plane operations are not interrupted, but it delays any control plane operations (including resource status updates) until the process has finished. This synchronisation process has a timeout, which defaults to 10 minutes. It can be changed by setting the seldon-core-v2-components helm value `scheduler.schedulerReadyTimeoutSeconds`.
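For example, a hedged sketch of a helm values override raising this timeout to 20 minutes (the chart and value name are taken from the text above; verify against your installed chart before use):

```yaml
# values override for the seldon-core-v2-components chart (illustrative).
# Raises the Scheduler's startup synchronisation timeout from the default
# of 600 seconds (10 minutes) to 20 minutes.
scheduler:
  schedulerReadyTimeoutSeconds: "1200"
```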

### Agent
This service manages the loading and unloading of models on a server, as well as access to the server over REST/gRPC. It acts as a reverse proxy connecting end users with the actual Model Servers. In this way the system collects stats and metrics about data plane inference, which helps with observability and scaling.

### Controller
We also provide a Kubernetes Operator to enable usage on Kubernetes. This is implemented in the Controller Manager microservice, which reconciles Kubernetes CRDs with the Scheduler. Currently Core 2 supports _one_ instance of the Controller.

## Kafka
{% hint style="info" %}
**Note**: All services besides the Controller are Kubernetes agnostic and can run locally, e.g. on Docker Compose.
{% endhint %}

## Data Plane

### Pipeline Gateway
This service handles REST/gRPC calls to Pipelines. It translates synchronous requests into Kafka operations: it produces a message on the relevant input topic for a Pipeline and consumes from the output topic to return inference results to users.
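For illustration, the hedged sketch below builds such a pipeline request in Python (standard library only). The mesh address, pipeline name, and tensor details are hypothetical; the `Seldon-Model: <name>.pipeline` header shown is how pipeline traffic is distinguished from single-model traffic at the ingress, but check the inference docs for your deployment:

```python
import json
from urllib import request

MESH = "http://seldon-mesh.seldon.svc:80"   # hypothetical in-cluster ingress address
PIPELINE = "tfsimple-pipeline"              # hypothetical pipeline name

# OIP request body; tensor names/shapes depend on the pipeline's first model.
body = json.dumps({
    "inputs": [
        {"name": "INPUT0", "datatype": "INT32", "shape": [1, 4], "data": [[1, 2, 3, 4]]}
    ]
}).encode()

req = request.Request(
    f"{MESH}/v2/models/{PIPELINE}/infer",
    data=body,
    headers={
        "Content-Type": "application/json",
        # Route to the pipeline rather than a single model of the same name.
        "Seldon-Model": f"{PIPELINE}.pipeline",
    },
)
# request.urlopen(req)  # only works inside a cluster with the pipeline deployed
```

Behind this synchronous call, the Pipeline Gateway performs the produce/consume round trip on Kafka described above.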

### Model Gateway
This service handles the flow of data between Kafka and the model servers: it consumes inference requests from a model's input topic, forwards them to the server hosting that model, and produces the responses back to the model's output topic.

### Dataflow Engine
This service handles the flow of data between components in a pipeline, using Kafka Streams. It enables Core 2 to chain and join Models together to provide complex Pipelines.

### Envoy
This service manages the proxying of requests to the correct servers including load balancing.

## Dataflow Architecture and Pipelines

To support the movement towards data-centric machine learning, Seldon Core 2 follows a dataflow paradigm. By taking a decentralized route that focuses on the flow of data, users gain more flexibility and insight as they build and manage complex AI applications in production. This contrasts with more centralized orchestration approaches in which data is secondary.

![dataflow](../images/dataflow.png)

### Kafka
Kafka is used as the backbone for Pipelines, allowing decentralized, synchronous, and asynchronous usage. This enables Models to be connected together into arbitrary directed acyclic graphs (DAGs). Models can be reused in different Pipelines. The flow of data between models is handled by the Dataflow Engine using [Kafka Streams](https://docs.confluent.io/platform/current/streams/concepts.html).

![kafka](../images/kafka.png)
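For instance, a two-step DAG could be declared as a Pipeline resource along these lines (a hedged sketch; the model and pipeline names are hypothetical, and the schema should be verified against the Pipeline CRD reference):

```yaml
# Illustrative Pipeline chaining two models via their Kafka topics.
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  steps:
    - name: model-a            # receives the pipeline's input
    - name: model-b
      inputs:
        - model-a              # model-b consumes model-a's output
  output:
    steps:
      - model-b                # the pipeline's output is model-b's output
```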

By focusing on the data, users can join various flows together using stream-joining concepts, as shown below.

![joins](../images/joins.png)
