Migrate to new Task and controller architecture
Issue: #136

Before the initial release we are consolidating the controller and CRD
architectures.

This commit also polishes the CLI.
jmintb authored and Jessie Chatham Spencer committed Aug 2, 2023
1 parent 0b7a4cd commit 5c1ae0c
Showing 71 changed files with 4,889 additions and 4,407 deletions.
18 changes: 18 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
[workspace]
- members = ["controller", "service", "cli", "web", "lib"]
+ members = ["controller", "service", "cli", "web", "lib" ]
resolver = "2"

48 changes: 48 additions & 0 deletions ame/docs/datasets.md
@@ -0,0 +1,48 @@
# Data Sets

AME has a built-in notion of data sets, allowing users to think in terms of data sets and not just raw tasks.

Here is an example of what a simple data set configuration looks like:

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```
In its simplest form, a data set is just a [task](todo) which produces data, along with a storage mechanism. This provides a number of benefits over using tasks directly.
Data can be produced once and used many times; for example, if a number of tasks are scheduled, AME can prepare the data set once and share it across all of the dependent tasks.

### Configuring a data set

A simple data set configuration is quick to set up and can then be progressively enhanced as your needs expand. Here we will walk through the process of first setting up a simple data set
and then go through the more advanced options.

The minimum requirements for a data set are a `path` pointing to where data should be saved and a `Task` which will produce data at that path, as shown in the mnist example above.
Let's start with that here:

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```

So far so good: we have a path `./data` and reference a `Task` that produces our data.


### Interacting with data sets

To see the status of live data sets, use the AME CLI. Currently it is only possible to see data sets that are in use, meaning those referenced by a running task.

```bash
ame ds list
```

17 changes: 17 additions & 0 deletions ame/docs/index.md
@@ -0,0 +1,17 @@
# Welcome to MkDocs

For full documentation visit [mkdocs.org](https://www.mkdocs.org).

## Commands

* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.

## Project layout

mkdocs.yml # The configuration file.
docs/
index.md # The documentation homepage.
... # Other markdown pages, images and other files.
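
The layout above is driven by a configuration file at the project root. As a sketch only, a minimal `mkdocs.yml` matching this layout could look like the following (the `site_name` value is an assumption):

```yaml
# mkdocs.yml -- hypothetical minimal configuration for the layout above
site_name: AME  # assumed site name, not taken from this repository
nav:
  - Home: index.md
```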
172 changes: 172 additions & 0 deletions ame/docs/model_validation.md
@@ -0,0 +1,172 @@
# Guides

## From zero to live model

**This guide is focused on using AME.** If you are looking for a deployment guide, go [here](todo).



This guide will walk through going from zero to having a model served through the [V2 inference protocol](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).
It will be split into multiple sub-steps, which can be consumed in isolation if you are just looking for a smaller guide on a specific step.

Almost any Python project should be usable, but if you want to follow along with the exact same project as the guide, clone [this]() repo.

### Setup the CLI

Before we can initialise an AME project, we need to install the AME [CLI](todo) and connect it to an AME instance.

TODO describe installation

### Initialising AME in your project

The first step will be creating an `ame.yaml` file in the project directory.

This is easiest to do with the AME [CLI]() by running `ame project init`. The [CLI]() will ask for a project name and then produce a file
that looks like this:

```yaml
projectName: sklearn_logistic_regression
```
### The first training

Not very exciting, but it is a start. Next we want to set up our model to be run by AME. The most important thing here is the `Task` that will train the model, so
let's start with that.

Here we need to consider a few things: what command is used to train the model, how dependencies are managed in our project, what Python version we need, and
how many resources the model training requires.

If you are using the [repo]() for this guide, you will want a task configured as below.
```yaml
projectid: sklearn_logistic_regression
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G
      cpu: 4
      storage: 30G
      nvidia.com/gpu: 1
```
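
Since the `!poetry` executor is used, the project is assumed to contain a `pyproject.toml` managed by Poetry. As a sketch only, under the assumption that the project depends on scikit-learn, NumPy, and MLflow (as the training code later in this guide suggests), it might look like:

```toml
# pyproject.toml -- hypothetical sketch, not taken from the guide's repo
[tool.poetry]
name = "sklearn_logistic_regression"
version = "0.1.0"
description = "Training project for the AME guide"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
scikit-learn = "*"
numpy = "*"
mlflow = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```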
## Your first Task

[`Tasks`](TODO) are an important building block for AME. This guide will walk you through the basics of constructing and running a [`Task`](todo).

We assume that the AME [CLI](todo) is set up and connected to an AME instance. If not, see this [guide](todo).

Before we can run a task we must have a project set up. To initialise a project, follow the commands shown below, replacing `myproject` with the
path to your project.

```sh
cd myproject
ame init
```

Now you should have an AME file, `ame.yaml`, inside your project:
```yaml
name: myproject
```

Not very exciting yet. Next we want to add a Task to this file so we can run it.
Update your file to match the changes shown below.

```yaml
name: myproject
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 2G
      cpu: 2
      storage: 10G
```

Here we add a list of tasks for our project, containing a single `Task` called `training`. Let's look at the anatomy of `training`.

First we set the name, `name: training`, which is standard YAML. Next we set the [executor](todo). This syntax might seem a bit confusing
if you have not used this YAML feature before: `!poetry` adds a tag to the executor indicating the executor type. In this case we are
using the poetry executor. It requires two fields to be set: the Python version and the command to run. These tell AME how to execute the [`Task`](todo).

Finally we set the required resources: 2G of memory, 2 CPU threads, and 10G of storage.

To run the task we can use the CLI:
```sh
ame task run
```



## Validating models before deployment

To ensure that new model versions perform well before they are exposed, AME supports model validation. This is done by providing AME with a `Task` which
will succeed if the model passes validation and fail if it does not.

Example from [ame-demo](https://github.com/TeaInSpace/ame-demo):

```yaml
projectid: sklearn_logistic_regression
models:
  - name: logreg
    type: mlflow
    validationTask: # The validation task is set here.
      taskRef: mlflow_validation
    training:
      task:
        taskRef: training
    deployment:
      auto_train: true
      deploy: true
      enable_tls: false
tasks:
  - name: training
    projectid: sklearn_logistic_regression
    templateRef: shared-templates.logistic_reg_template
    taskType: Mlflow
  - name: mlflow_validation
    projectid: sklearn_logistic_regression
    runcommand: python validate.py
```

This approach allows a lot of flexibility in how models are validated, at the cost of writing the validation yourself. In the future AME will also provide builtin options for common validation configurations; see the [roadmap](todo).

### Using MLflow metrics

Here we will walk through how to validate a model based on metrics recorded in MLflow, using the [ame-demo](https://github.com/TeaInSpace/ame-demo) repository as an example. The model is a simple logistic regression; the training code looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import mlflow
import mlflow.sklearn
import os

if __name__ == "__main__":
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print("Score: %s" % score)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(lr, "model", registered_model_name="logreg")
    print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
```

Notice how the score is logged as a metric. We can use that in our validation.

AME exposes the necessary environment variables to running tasks, so we can access the MLflow instance during validation just by using the MLflow library.

```python
TODO
```
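
As an illustration only, a hypothetical `validate.py` could be structured as below. It assumes the training run registered the model as `logreg` and logged a `score` metric, as in the training code above; the threshold value is an assumption, and this is not AME's official example:

```python
# Hypothetical validate.py sketch. Assumes AME injects MLFLOW_TRACKING_URI
# into the task's environment, and that training registered the model
# "logreg" with a logged "score" metric.


def passes_validation(score: float, threshold: float = 0.8) -> bool:
    """The model passes validation when its logged score meets the threshold."""
    return score >= threshold


def fetch_latest_score(model_name: str = "logreg") -> float:
    """Look up the "score" metric logged by the latest version of the model."""
    from mlflow.tracking import MlflowClient

    client = MlflowClient()  # reads MLFLOW_TRACKING_URI from the environment
    latest = client.get_latest_versions(model_name)[0]
    run = client.get_run(latest.run_id)
    return run.data.metrics["score"]


# In the validation Task the script would exit non-zero on failure, e.g.:
#   sys.exit(0 if passes_validation(fetch_latest_score()) else 1)
```

Exiting non-zero is what marks the validation `Task` as failed, matching the success/failure contract described earlier in this section.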
30 changes: 30 additions & 0 deletions ame/docs/models.html
@@ -0,0 +1,30 @@
<h1>Models</h1>
<p>Models are one of AME's higher level constructs; see what that means <a href="">here</a>. If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.</p>
<h3>Model training</h3>
<p>Model training is configured using a <a href="./task.html">Task</a>.</p>
<p>AME can be deployed with an MLflow instance, which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.</p>
<pre lang="yaml" style="background-color:#2b303b;"><code><span style="color:#65737e;"># main project ame.yml
</span><span style="color:#bf616a;">project</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">xgboost_project
</span><span style="color:#bf616a;">models</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">product_recommendor
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">training</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">task</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">taskRef</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#bf616a;">tasks</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">fromTemplate</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">shared_templates.xgboost_resources
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">executor</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#b48ead;">!poetry
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">pythonVersion</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">3.11
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">command</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">python train.py
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">resources</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">memory</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">10G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">cpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">4
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">storage</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">30G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">nvidia.com/gpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">1
</span></code></pre>
<h3>Model deployment</h3>
<h4>Model validation</h4>
<h4>Model monitoring</h4>
<h3>Batch inference</h3>
42 changes: 42 additions & 0 deletions ame/docs/models.md
@@ -0,0 +1,42 @@
# Models

Models are one of AME's higher level constructs; see what that means [here](). If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.

### Model training

Model training is configured using a [Task](tasks.md).

AME can be deployed with an MLflow instance, which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.


```yaml
# main project ame.yml
project: xgboost_project
models:
  - name: product_recommendor
    training:
      task:
        taskRef: train_my_model
tasks:
  - name: train_my_model
    fromTemplate: shared_templates.xgboost_resources
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G
      cpu: 4
      storage: 30G
      nvidia.com/gpu: 1
```
### Model deployment
#### Model validation
#### Model monitoring
### Batch inference
