Merge pull request #475 from threefoldtech/development_ai_ml

added ai-ml guide
threefoldtech · Apr 15, 2024 · bdc95fa
2 parents b9a97b0 + fcc751f
commit bdc95fa

Showing 4 changed files with 130 additions and 2 deletions.
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -253,7 +253,8 @@
       - [Redis](documentation/system_administrators/advanced/grid3_redis.md)
       - [IPFS](documentation/system_administrators/advanced/ipfs/ipfs_toc.md)
         - [IPFS on a Full VM](documentation/system_administrators/advanced/ipfs/ipfs_fullvm.md)
-        - [IPFS on a Micro VM](documentation/system_administrators/advanced/ipfs/ipfs_microvm.md) 
+        - [IPFS on a Micro VM](documentation/system_administrators/advanced/ipfs/ipfs_microvm.md)
+      - [AI & ML Workloads](documentation/system_administrators/advanced/ai_ml_workloads.md)
   - [ThreeFold Token](documentation/threefold_token/threefold_token.md)
     - [TFT Bridges](documentation/threefold_token/tft_bridges/tft_bridges.md)
       - [TFChain-Stellar Bridge](documentation/threefold_token/tft_bridges/tfchain_stellar_bridge.md)

diff --git a/src/documentation/system_administrators/advanced/advanced.md b/src/documentation/system_administrators/advanced/advanced.md
@@ -12,3 +12,4 @@ In this section, we delve into sophisticated topics and powerful functionalities
 - [IPFS](./ipfs/ipfs_toc.md)
   - [IPFS on a Full VM](./ipfs/ipfs_fullvm.md)
   - [IPFS on a Micro VM](./ipfs/ipfs_microvm.md)
+- [AI & ML Workloads](./ai_ml_workloads.md)
diff --git a/src/documentation/system_administrators/advanced/ai_ml_workloads.md b/src/documentation/system_administrators/advanced/ai_ml_workloads.md
@@ -0,0 +1,125 @@
+<h1> AI & ML Workloads </h1>
+
+<h2> Table of Contents </h2>
+
+- [Introduction](#introduction)
+- [Prerequisites](#prerequisites)
+- [Prepare the System](#prepare-the-system)
+- [Install the GPU Driver](#install-the-gpu-driver)
+- [Set a Python Virtual Environment](#set-a-python-virtual-environment)
+- [Install PyTorch and Test CUDA](#install-pytorch-and-test-cuda)
+- [Set and Access Jupyter Notebook](#set-and-access-jupyter-notebook)
+- [Run AI/ML Workloads](#run-aiml-workloads)
+
+***
+
+## Introduction
+
+We present a basic method to deploy artificial intelligence (AI) and machine learning (ML) workloads on the TFGrid. For this, we make use of dedicated nodes and GPU support.
+
+In the first part, we show the steps to install the Nvidia driver for a GPU card on an Ubuntu 22.04 full VM running on the TFGrid.
+
+In the second part, we show how to use PyTorch to run AI/ML tasks.
+
+## Prerequisites
+
+You need to reserve a [dedicated GPU node](../../dashboard/deploy/dedicated_machines.md) on the ThreeFold Grid.
+
+## Prepare the System
+
+- Update the system
+    ```
+    dpkg --add-architecture i386
+    apt-get update
+    apt-get dist-upgrade
+    reboot
+    ```
+- Check the GPU info
+    ```
+    lspci | grep VGA
+    lshw -c video
+    ```
+
+## Install the GPU Driver
+
+- Install the recommended Nvidia driver
+  - Check which driver is recommended
+      ```
+      apt install ubuntu-drivers-common
+      ubuntu-drivers devices
+      ```
+  - Install the recommended driver (e.g. version 535)
+      ```
+      apt install nvidia-driver-535
+      ```
+  - Reboot and reconnect to the VM
+- Check the GPU status
+    ```
+    nvidia-smi
+    ```
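As a sketch alongside this step (not part of the committed guide), the same status check can be scripted from Python. It assumes only that `nvidia-smi` is on the `PATH` once the driver is installed. The function name `query_gpus` is hypothetical, and the query fields shown are one possible choice.

```python
import shutil
import subprocess


def query_gpus():
    """Return one 'name, driver_version' string per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        # Driver tools are not installed (or not on PATH).
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # nvidia-smi exists but cannot talk to the driver (e.g. a reboot is still pending).
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_gpus()
    print(gpus if gpus else "No Nvidia GPU reported by nvidia-smi")
```

An empty list here most likely means the driver is missing or the VM has not been rebooted since the installation.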
+
+Now that the GPU node is set up, let's set up PyTorch to run AI/ML workloads.
+
+## Set a Python Virtual Environment
+
+Before installing Python packages with pip, you should create a virtual environment.
+
+- Install the prerequisites
+  ```
+  apt update
+  apt install python3-pip python3-dev
+  pip3 install --upgrade pip
+  pip3 install virtualenv
+  ```
+- Create a virtual environment
+  ```
+  mkdir ~/python_project
+  cd ~/python_project
+  virtualenv python_project_env
+  source python_project_env/bin/activate
+  ```
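To confirm that the environment is really active before installing packages, a small check like the following may help. This is a sketch, not part of the committed guide, and the helper name `in_virtualenv` is hypothetical. It relies on the interpreter's `sys.prefix` differing from `sys.base_prefix` inside a virtual environment.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the environment
    # while sys.base_prefix keeps the path of the system installation.
    # Older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Running this with the `python3` found after `source python_project_env/bin/activate` should report that the environment is active.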
+
+## Install PyTorch and Test CUDA
+
+Once you've created and activated a Python virtual environment, you can install Python packages.
+
+- Install PyTorch and upgrade NumPy
+    ```
+    pip3 install torch
+    pip3 install numpy --upgrade
+    ```
+
+Before going further, you can check that CUDA is working properly on your machine.
+
+- Check that CUDA is available to PyTorch by running the following lines in a Python interpreter:
+    ```
+    import torch
+    torch.cuda.is_available()
+    torch.cuda.device_count() # the output should be 1
+    torch.cuda.current_device() # the output should be 0
+    torch.cuda.device(0)
+    torch.cuda.get_device_name(0)
+    ```
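The interactive checks above can also be gathered into one script. The following is a minimal sketch, not part of the committed guide: the function name `cuda_report` and the shape of the returned dictionary are assumptions made for illustration, while the `torch.cuda` calls are the same ones used in the listing.

```python
def cuda_report():
    """Summarize whether PyTorch is installed and which CUDA devices it sees."""
    report = {"torch_installed": False, "cuda_available": False, "devices": []}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Collect the name of every visible GPU, indexed from 0.
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    print(cuda_report())
```

On a node with a single working GPU, `devices` should hold one entry; an empty list with `torch_installed` set to `True` suggests a driver problem rather than a PyTorch one.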
+
+## Set and Access Jupyter Notebook
+
+You can run Jupyter Notebook on the remote VM and access it from your local browser.
+
+- Install Jupyter Notebook 
+    ```
+    pip3 install notebook
+    ```
+- Run Jupyter Notebook in no-browser mode and take note of the URL and the token
+  ```
+  jupyter notebook --no-browser --port=8080 --ip=0.0.0.0
+  ```
+- On your local machine, paste the given URL into a browser, replacing `127.0.0.1` with the VM's WireGuard IP (here, `10.20.4.2`) and setting the correct token.
+  ```
+  http://10.20.4.2:8080/tree?token=<insert_token>
+  ```
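The address substitution described in the last step can be sketched with Python's standard library. This is not part of the committed guide: the helper name `to_wireguard_url` and the token `abc123` are placeholders, and the real token is whatever Jupyter prints at startup.

```python
from urllib.parse import urlsplit, urlunsplit


def to_wireguard_url(jupyter_url, wireguard_ip):
    """Swap the host of the URL printed by Jupyter for the VM's WireGuard IP."""
    parts = urlsplit(jupyter_url)
    host = wireguard_ip
    if parts.port is not None:
        # Keep the port Jupyter was started on (8080 in this guide).
        host = f"{wireguard_ip}:{parts.port}"
    return urlunsplit(parts._replace(netloc=host))


if __name__ == "__main__":
    # "abc123" is a placeholder token, not a real one.
    print(to_wireguard_url("http://127.0.0.1:8080/tree?token=abc123", "10.20.4.2"))
```

The path and the token query string are left untouched; only the host part changes.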
+
+## Run AI/ML Workloads
+
+After following the steps above, you should now be able to run Python code that uses your GPU node for AI and ML workloads.
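As a first workload to try, the sketch below (not part of the committed guide) multiplies two random matrices on the GPU and falls back to the CPU when no CUDA device is visible. The function name `gpu_matmul` and the default size are arbitrary choices for illustration.

```python
def gpu_matmul(size=1024):
    """Multiply two random square matrices, on the GPU when one is available."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return None
    # Pick the GPU if PyTorch can see one, otherwise stay on the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    product = a @ b
    return {"device": device.type, "shape": tuple(product.shape)}


if __name__ == "__main__":
    print(gpu_matmul())
```

If the reported device is `cpu` on a GPU node, it is worth rerunning `nvidia-smi` and the CUDA checks before moving on to larger workloads.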
+
+Feel free to explore different ways to use this feature. For example, the [HuggingFace course](https://huggingface.co/learn/nlp-course/chapter1/1) on natural language processing is a good introduction to machine learning.
diff --git a/src/documentation/system_administrators/system_administrators.md b/src/documentation/system_administrators/system_administrators.md
@@ -81,4 +81,5 @@ For complementary information on ThreeFold grid and its cloud component, refer t
   - [Redis](./advanced/grid3_redis.md)
   - [IPFS](./advanced/ipfs/ipfs_toc.md)
     - [IPFS on a Full VM](./advanced/ipfs/ipfs_fullvm.md)
-    - [IPFS on a Micro VM](./advanced/ipfs/ipfs_microvm.md) 
+    - [IPFS on a Micro VM](./advanced/ipfs/ipfs_microvm.md) 
+  - [AI & ML Workloads](./advanced/ai_ml_workloads.md)