Migrate to new Task and controller architecture
Issue: #136

Before the initial release we are consolidating the controller and CRD
architectures.

This commit also polishes the CLI.
jmintb authored and Jessie Chatham Spencer committed Aug 2, 2023
1 parent 0b7a4cd commit 5c1ae0c
Showing 71 changed files with 4,889 additions and 4,407 deletions.
18 changes: 18 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
[workspace]
- members = ["controller", "service", "cli", "web", "lib"]
+ members = ["controller", "service", "cli", "web", "lib" ]
resolver = "2"

48 changes: 48 additions & 0 deletions ame/docs/datasets.md
@@ -0,0 +1,48 @@
# Data Sets

AME has a built-in notion of data sets, allowing users to think in terms of data sets and not just raw tasks.

Here is an example of what a simple data set configuration looks like:

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```
In its simplest form, a data set is just a [task](todo) which produces data, along with a storage mechanism. This provides a number of benefits over using tasks directly.
Data can be produced once and used many times; for example, if a number of tasks are scheduled, AME can prepare the data set once and share it across all of the dependent tasks.

### Configuring a data set

A simple data set configuration is quick to set up and can then be progressively enhanced as your needs expand. Here we will walk through the process of first setting up a simple data set
and then go through the more advanced options.

The minimum requirements for a data set are a `path` pointing to where data should be saved and a `Task` which will produce data at that path, as shown in the mnist example above.
Let's start with that here:

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```

So far so good: we have a path `./data` and reference a `Task` that produces our data.


### Interacting with data sets

To see the status of live data sets, use the AME CLI. Currently it is only possible to see data sets that are in use, meaning those referenced by a running task.

```bash
ame ds list
```

17 changes: 17 additions & 0 deletions ame/docs/index.md
@@ -0,0 +1,17 @@
# Welcome to MkDocs

For full documentation visit [mkdocs.org](https://www.mkdocs.org).

## Commands

* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.

## Project layout

mkdocs.yml # The configuration file.
docs/
index.md # The documentation homepage.
... # Other markdown pages, images and other files.
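
The layout above is driven by a configuration file at the project root. As a sketch only, a minimal `mkdocs.yml` matching this layout could look like the following (the `site_name` value is an assumption):

```yaml
# mkdocs.yml -- hypothetical minimal configuration for the layout above
site_name: AME  # assumed site name, not taken from this repository
nav:
  - Home: index.md
```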
172 changes: 172 additions & 0 deletions ame/docs/model_validation.md
@@ -0,0 +1,172 @@
# Guides

## From zero to live model

**This guide is focused on using AME.** If you are looking for a deployment guide, go [here](todo).



This guide will walk through going from zero to having a model served through the [V2 inference protocol](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).
It will be split into multiple sub-steps, which can be consumed in isolation if you are just looking for a smaller guide on a specific step.

Almost any Python project should be usable, but if you want to follow along with the exact same project as the guide, clone [this]() repo.

### Setup the CLI

Before we can initialise an AME project, we need to install the AME [CLI](todo) and connect it to an AME instance.

TODO describe installation

### Initialising AME in your project

The first step will be creating an `ame.yaml` file in the project directory.

This is easiest to do with the AME [CLI]() by running `ame project init`. The [CLI]() will ask for a project name and then produce a file
that looks like this:

```yaml
projectName: sklearn_logistic_regression
```
### The first training

Not very exciting, but it is a start. Next we want to set up our model to be run by AME. The most important thing here is the `Task` that will train the model, so
let's start with that.

Here we need to consider a few things: what command is used to train the model, how dependencies are managed in our project, what Python version we need, and
how many resources the model training requires.

If you are using the [repo]() for this guide, you will want a task configured as below.
```yaml
projectid: sklearn_logistic_regression
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G
      cpu: 4
      storage: 30G
      nvidia.com/gpu: 1
```
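
Since the `!poetry` executor is used, the project is assumed to contain a `pyproject.toml` managed by Poetry. As a sketch only, under the assumption that the project depends on scikit-learn, NumPy, and MLflow (as the training code later in this guide suggests), it might look like:

```toml
# pyproject.toml -- hypothetical sketch, not taken from the guide's repo
[tool.poetry]
name = "sklearn_logistic_regression"
version = "0.1.0"
description = "Training project for the AME guide"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
scikit-learn = "*"
numpy = "*"
mlflow = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```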
## Your first Task

[`Tasks`](TODO) are an important building block for AME. This guide will walk you through the basics of constructing and running a [`Task`](todo).

We assume that the AME [CLI](todo) is set up and connected to an AME instance. If not, see this [guide](todo).

Before we can run a task we must have a project set up. To initialise a project, follow the commands shown below, replacing `myproject` with the
path to your project.

```sh
cd myproject
ame init
```

Now you should have an AME file, `ame.yaml`, inside your project:
```yaml
name: myproject
```

Not very exciting yet. Next we want to add a Task to this file so we can run it.
Update your file to match the changes shown below.

```yaml
name: myproject
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 2G
      cpu: 2
      storage: 10G
```

Here we add a list of tasks for our project, containing a single `Task` called `training`. Let's look at the anatomy of `training`.

First we set the name, `name: training`, which is standard YAML. Next we set the [executor](todo). This syntax might seem a bit confusing
if you have not used this YAML feature before: `!poetry` adds a tag to the executor indicating the executor type. In this case we are
using the poetry executor. It requires two fields to be set: the Python version and the command to run. These tell AME how to execute the [`Task`](todo).

Finally we set the required resources: 2G of memory, 2 CPU threads, and 10G of storage.

To run the task we can use the CLI:
```sh
ame task run
```



## Validating models before deployment

To ensure that new model versions perform well before they are exposed, AME supports model validation. This is done by providing AME with a `Task` which
will succeed if the model passes validation and fail if it does not.

Example from [ame-demo](https://github.com/TeaInSpace/ame-demo):

```yaml
projectid: sklearn_logistic_regression
models:
  - name: logreg
    type: mlflow
    validationTask: # The validation task is set here.
      taskRef: mlflow_validation
    training:
      task:
        taskRef: training
    deployment:
      auto_train: true
      deploy: true
      enable_tls: false
tasks:
  - name: training
    projectid: sklearn_logistic_regression
    templateRef: shared-templates.logistic_reg_template
    taskType: Mlflow
  - name: mlflow_validation
    projectid: sklearn_logistic_regression
    runcommand: python validate.py
```

This approach allows a lot of flexibility in how models are validated, at the cost of writing the validation yourself. In the future AME will also provide builtin options for common validation configurations; see the [roadmap](todo).

### Using MLflow metrics

Here we will walk through how to validate a model based on metrics recorded in MLflow, using the [ame-demo](https://github.com/TeaInSpace/ame-demo) repository as an example. The model is a simple logistic regression; the training code looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import mlflow
import mlflow.sklearn
import os

if __name__ == "__main__":
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print("Score: %s" % score)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(lr, "model", registered_model_name="logreg")
    print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
```

Notice how the score is logged as a metric. We can use that in our validation.

AME exposes the necessary environment variables to running tasks, so we can access the MLflow instance during validation just by using the MLflow library.

```python
TODO
```
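
As an illustration only, a hypothetical `validate.py` could be structured as below. It assumes the training run registered the model as `logreg` and logged a `score` metric, as in the training code above; the threshold value is an assumption, and this is not AME's official example:

```python
# Hypothetical validate.py sketch. Assumes AME injects MLFLOW_TRACKING_URI
# into the task's environment, and that training registered the model
# "logreg" with a logged "score" metric.


def passes_validation(score: float, threshold: float = 0.8) -> bool:
    """The model passes validation when its logged score meets the threshold."""
    return score >= threshold


def fetch_latest_score(model_name: str = "logreg") -> float:
    """Look up the "score" metric logged by the latest version of the model."""
    from mlflow.tracking import MlflowClient

    client = MlflowClient()  # reads MLFLOW_TRACKING_URI from the environment
    latest = client.get_latest_versions(model_name)[0]
    run = client.get_run(latest.run_id)
    return run.data.metrics["score"]


# In the validation Task the script would exit non-zero on failure, e.g.:
#   sys.exit(0 if passes_validation(fetch_latest_score()) else 1)
```

Exiting non-zero is what marks the validation `Task` as failed, matching the success/failure contract described earlier in this section.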
30 changes: 30 additions & 0 deletions ame/docs/models.html
@@ -0,0 +1,30 @@
<h1>Models</h1>
<p>Models are one of AME's higher level constructs; see what that means <a href="">here</a>. If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.</p>
<h3>Model training</h3>
<p>Model training is configured using a <a href="./task.html">Task</a>.</p>
<p>AME can be deployed with an MLflow instance, which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.</p>
<pre lang="yaml" style="background-color:#2b303b;"><code><span style="color:#65737e;"># main project ame.yml
</span><span style="color:#bf616a;">project</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">xgboost_project
</span><span style="color:#bf616a;">models</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">product_recommendor
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">training</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">task</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">taskRef</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#bf616a;">tasks</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">fromTemplate</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">shared_templates.xgboost_resources
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">executor</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#b48ead;">!poetry
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">pythonVersion</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">3.11
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">command</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">python train.py
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">resources</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">memory</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">10G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">cpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">4
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">storage</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">30G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">nvidia.com/gpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">1
</span></code></pre>
<h3>Model deployment</h3>
<h4>Model validation</h4>
<h4>Model monitoring</h4>
<h3>Batch inference</h3>
42 changes: 42 additions & 0 deletions ame/docs/models.md
@@ -0,0 +1,42 @@
# Models

Models are one of AME's higher level constructs; see what that means [here](). If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.

### Model training

Model training is configured using a [Task](tasks.md).

AME can be deployed with an MLflow instance, which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.


```yaml
# main project ame.yml
project: xgboost_project
models:
  - name: product_recommendor
    training:
      task:
        taskRef: train_my_model
tasks:
  - name: train_my_model
    fromTemplate: shared_templates.xgboost_resources
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G
      cpu: 4
      storage: 30G
      nvidia.com/gpu: 1
```
### Model deployment
#### Model validation
#### Model monitoring
### Batch inference
