This repository contains the code and documentation for developing and deploying machine learning models while adhering to engineering best practices.
- Navigate to the project directory:

  ```
  cd <base>/ml-engineering
  ```

- Create and activate the conda environment:

  ```
  conda env create --file deploy/conda/linux_py312.yml
  conda activate mle
  ```

- Manage dependencies:
  - Install additional dependencies using conda or pip as needed.
  - Update the environment file:

    ```
    conda env export --name mle > deploy/conda/linux_py312.yml
    ```

- Deactivate the environment:

  ```
  conda deactivate
  ```

- Remove the environment (if necessary):

  ```
  conda remove --name mle --all
  ```
- Reference code: `<base>/ml-engineering/reference/nonstandardcode`
- Working notebooks: `<base>/ml-engineering/notebooks/working`
Scripts are derived from the working notebooks in `<base>/ml-engineering/notebooks/working`.

Ensure the directory containing `housing_value` is on `PYTHONPATH`:

```
conda env config vars set PYTHONPATH=$(pwd)/src
conda deactivate
conda activate mle
echo $PYTHONPATH
```
- Argument Parsing: Uses `argparse` for command-line arguments.
- Configuration Management: Implements `configparser` with `setup.cfg`.
- Logging: Incorporates `logging` for execution tracking and debugging.
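As a rough illustration of how `argparse` and `logging` typically fit together in a script like these (a minimal sketch, not the repository's actual code; the flag names are assumptions):

```python
import argparse
import logging


def parse_args(argv=None):
    # Hypothetical flags for illustration; the real scripts define their own.
    parser = argparse.ArgumentParser(description="Example CLI sketch.")
    parser.add_argument("--output-dir", default="data", help="where to write outputs")
    parser.add_argument("--log-level", default="INFO", help="logging level name")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Configure logging once, at the level requested on the command line.
    logging.basicConfig(level=getattr(logging, args.log_level.upper()))
    logging.getLogger(__name__).info("writing outputs to %s", args.output_dir)
    return args
```

Passing `argv` explicitly (instead of reading `sys.argv` implicitly) keeps the entry point easy to test.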
Install the required tools:

```
sudo apt install black isort flake8
```

| Tool | Description | Usage |
| --- | --- | --- |
| Black | Code formatter | `black <script.py>` |
| isort | Import sorter | `isort <script.py>` |
| flake8 | Linter | `flake8 <script.py>` |

Note: Configurations are specified in `setup.cfg` and `.vscode/settings.json` (for VS Code users).
Maintain source code quality:

```
chmod +x shell/src_quality.sh
./shell/src_quality.sh
```
View the available options for each script using the `--help` flag:

```
python src/housing_value/ingest_data.py --help
python src/housing_value/train.py --help
python src/housing_value/score.py --help
```
Install pytest:

```
sudo apt install python3-pytest
```

Note: Configurations are specified in `setup.cfg`.

Maintain test code quality:

```
chmod +x shell/tests_quality.sh
./shell/tests_quality.sh
```

Run tests:

```
pytest
pytest <test_directory>/<test.py>
```
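A pytest test is simply a function whose name starts with `test_` containing plain `assert` statements. A minimal sketch (the helper below is hypothetical, not part of this repository's actual test suite):

```python
# Hypothetical helper and test, for illustration only.
def split_sizes(n_rows, split_size=0.2):
    """Return (train, test) row counts for a given test-split fraction."""
    n_test = int(n_rows * split_size)
    return n_rows - n_test, n_test


def test_default_split_size():
    # pytest discovers this function automatically and reports the assert.
    assert split_sizes(100) == (80, 20)
```

Running `pytest` from the project root collects and executes every such function under the configured test paths.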
Sphinx is used to generate the documentation.
- Install the package:
  - Option 1: Editable mode (depends on `pyproject.toml`); produces an `egg-info` folder.

    ```
    pip install -e .
    ```

  - Option 2: Build and install; produces an `egg-info` folder as well as a `dist` folder containing the `tar.gz` and `whl` files.

    ```
    python3 -m pip install --upgrade build
    python3 -m build
    pip install dist/housing_value-0.0.0-py3-none-any.whl
    ```
- Install Sphinx and the packages needed for building the documentation:

  ```
  sudo apt install python3-sphinx
  pip install sphinx sphinx-rtd-theme matplotlib
  pip install sphinxcontrib-napoleon
  ```

- Navigate to the docs directory:

  ```
  cd docs
  ```
- Check configuration files:
  - Make sure to create the Makefile.
- Generate the Sphinx project:

  ```
  sphinx-quickstart
  ```

- Update configuration files:
  - Modify `source/conf.py` and `source/index.rst` as needed.
  - Reference files are available in the `reference` directory.
- Generate API documentation:

  ```
  sphinx-apidoc -o ./source ../src/housing_value
  ```

- Update configuration files:
  - Modify `source/housing_value.rst` and `source/index.rst` as needed.
  - Reference files are available in the `reference` directory.
- Build the HTML documentation:

  ```
  make clean
  make html
  ```

- Return to the project root:

  ```
  cd ..
  ```

Note: The documentation file hierarchy in the `source` directory is: `index.rst` > `modules.rst` > `housing_value.rst`.
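The `source/conf.py` settings this setup implies can be sketched roughly as follows (a minimal illustration, not the repository's actual file; the extension list and theme are assumptions based on the packages installed above):

```python
# docs/source/conf.py - minimal sketch, not the repository's actual file.
project = "housing_value"

extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings in src/housing_value
    "sphinx.ext.napoleon",  # parse Google/NumPy-style docstrings
]

html_theme = "sphinx_rtd_theme"  # theme installed via sphinx-rtd-theme above
```

With `autodoc` enabled, the `.rst` files generated by `sphinx-apidoc` pull documentation straight from the package's docstrings at build time.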
Note: The file hierarchy for MLflow is structured as follows: `MLproject` > `app.py`.

- Maintain code quality:

  ```
  chmod +x shell/app_quality.sh
  ./shell/app_quality.sh
  ```

- Tracking UI: Launch the MLflow tracking server:

  ```
  mlflow server --backend-store-uri mlruns/ --default-artifact-root mlruns/ --host 127.0.0.1 --port 5000
  ```
- Run Experiment: Execute an experiment to generate a model artifact:

  ```
  mlflow run . -P <parameters>
  ```

  The optional parameter `split_size` defaults to `0.2`.
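The `MLproject` file that wires this together might look roughly like the sketch below (the entry-point command and parameter layout are assumptions; only `split_size` and its `0.2` default are stated in this README):

```yaml
# Sketch of an MLproject file, not the repository's actual contents.
name: mle

entry_points:
  main:
    parameters:
      split_size: {type: float, default: 0.2}
    command: "python app.py --split-size {split_size}"
```

`mlflow run . -P split_size=0.3` would then override the default when launching the experiment.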
- Python Version Management: Install `pyenv` to manage Python versions and ensure reproducibility, which facilitates selecting a specific Python version for the project.

  ```
  chmod +x shell/pyenv.sh
  ./shell/pyenv.sh
  ```

- Activate Conda Environment: Activate the conda environment created during the experiment execution.
- Dependency Installation: Install the required dependency in the activated environment:

  ```
  pip install virtualenv
  ```

- API Endpoint Generation: Create an API endpoint to serve the model:

  ```
  mlflow models serve -m mlruns/<experiment_id>/<run_id>/artifacts/model/ -h 127.0.0.1 -p 1234
  ```
- Testing API Endpoint: Test the API endpoint from another terminal using either of the following formats.
  - Datasplit Format:

    ```
    curl -X POST -H "Content-Type: application/json" --data '{"dataframe_split": {"columns": ["longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms", "population", "households", "median_income", "ocean_proximity"], "data": [[-118.39, 34.12, 29.0, 6447.0, 1012.0, 2184.0, 960.0, 8.2816, "<1H OCEAN"]]}}' http://127.0.0.1:1234/invocations
    ```

  - Inputs/Instances Format:

    ```
    curl -X POST -H "Content-Type: application/json" --data '{"inputs": [{"longitude": -118.39, "latitude": 34.12, "housing_median_age": 29.0, "total_rooms": 6447.0, "total_bedrooms": 1012.0, "population": 2184.0, "households": 960.0, "median_income": 8.2816, "ocean_proximity": "<1H OCEAN"}]}' http://127.0.0.1:1234/invocations
    ```
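If you prefer building these request bodies programmatically, the two JSON shapes can be sketched in Python as follows (the helper names are illustrative; the column names and sample row come from the curl examples above):

```python
import json

# Feature columns expected by the model, as used in the curl examples.
COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "ocean_proximity",
]


def dataframe_split_payload(rows):
    """Build the 'dataframe_split' request body for /invocations."""
    return json.dumps({"dataframe_split": {"columns": COLUMNS, "data": rows}})


def inputs_payload(records):
    """Build the 'inputs' request body for /invocations."""
    return json.dumps({"inputs": records})


row = [-118.39, 34.12, 29.0, 6447.0, 1012.0, 2184.0, 960.0, 8.2816, "<1H OCEAN"]
body = dataframe_split_payload([row])
# POST `body` to http://127.0.0.1:1234/invocations with Content-Type: application/json
```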
To facilitate deployment, Docker images are created by aggregating necessary artifacts and configurations.
- Artifact Aggregation:
  - Copy the model artifacts (`MLmodel` and `model.pkl`) from `mlruns/<experiment_id>/<run_id>/artifacts/model` to `<base>/ml-engineering/deploy/docker/mlruns`. Ensure unnecessary metadata is cleaned from the `MLmodel` file.
  - Transfer the `requirements.txt` file from `mlruns/<experiment_id>/<run_id>/artifacts/model` to `<base>/ml-engineering/deploy/docker`.
  - Move the wheel file (`housing_value-0.0.0-py3-none-any.whl`) from the `dist` directory to `<base>/ml-engineering/deploy/docker`.
  - Copy `setup.cfg` from the project root to `<base>/ml-engineering/deploy/docker`, ensuring it contains only the data required for inference.
- Script and Configuration Creation:
  - Develop a `run.sh` script to execute the `mlflow models serve` command.
  - Create a `.dockerignore` file to exclude files from being copied into the image's WORKDIR.
  - Construct a Dockerfile that packages all components into a Docker image for efficient deployment and scalability.
- Image Development:

  ```
  cd deploy/docker
  ```

- Build With Root User:

  ```
  docker build . -t <dockerhub_username>/mle:rootuser -f Dockerfile.rootuser
  ```

- Build Without Root User for Security: Enhance security by building an image that does not run as the root user.

  ```
  docker build . -t <dockerhub_username>/mle:nonrootuser -f Dockerfile.nonrootuser
  ```

- Use BuildKit for Multistage Builds: Optimize image size and build time using Docker BuildKit for multistage builds.

  ```
  DOCKER_BUILDKIT=1 docker build . -t <dockerhub_username>/mle:multistage -f Dockerfile.multistage
  ```
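Pulling the aggregation steps together, a multistage Dockerfile for this layout might look roughly like the sketch below (the base image, paths, and stage layout are assumptions, not the repository's actual `Dockerfile.multistage`; the file names come from the artifact-aggregation steps above):

```dockerfile
# Sketch only - builder stage installs dependencies, final stage stays slim.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt housing_value-0.0.0-py3-none-any.whl ./
# Install the model's dependencies and the packaged project into a staging prefix.
RUN pip install --prefix=/install -r requirements.txt housing_value-0.0.0-py3-none-any.whl

FROM python:3.12-slim
WORKDIR /app
# Copy only the installed packages, keeping the final image small.
COPY --from=builder /install /usr/local
COPY mlruns/ ./mlruns/
COPY run.sh setup.cfg ./
EXPOSE 5000
CMD ["bash", "run.sh"]
```

The multistage split is what BuildKit optimizes: build-time tooling stays in the `builder` stage and never reaches the shipped image.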
This section provides detailed instructions for containerizing the application with Docker and testing its endpoints.

- Start the Container: Start a Docker container named `rootuser`, mapping port 8080 on the host to port 5000 in the container:

  ```
  docker run -dit -p 8080:5000 --name rootuser <dockerhub_username>/mle:rootuser
  ```

- Test the Endpoint: Verify that the application is running correctly by sending a POST request with curl:

  ```
  curl -X POST -H "Content-Type: application/json" --data '{"dataframe_split": {"columns": ["longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms", "population", "households", "median_income", "ocean_proximity"], "data": [[-118.39, 34.12, 29.0, 6447.0, 1012.0, 2184.0, 960.0, 8.2816, "<1H OCEAN"]]}}' http://127.0.0.1:8080/invocations
  ```
- Push Images to Docker Hub: Log in to Docker Hub, then push the images:

  ```
  docker login -u <dockerhub_username>
  docker push <dockerhub_username>/mle:rootuser
  docker push <dockerhub_username>/mle:nonrootuser
  docker push <dockerhub_username>/mle:multistage
  ```
- List Images and Containers: View all Docker images and containers on the system.
  - Images:

    ```
    docker image ls
    ```

  - Containers:

    ```
    docker ps --all
    ```

- View Logs: Access the logs of a running container:

  ```
  docker logs <container_name>
  ```

- Delete Containers and Images: Remove a specific container or image using these commands:
  - Containers:

    ```
    docker rm -f <container_name>
    ```

  - Images:

    ```
    docker rmi <image_name>
    ```
To test your application in a new environment:

- Pull the Image from Docker Hub:

  ```
  docker pull <dockerhub_username>/mle:rootuser
  ```

- Start the Container Again:

  ```
  docker run -dit -p 8080:5000 --name rootuser <dockerhub_username>/mle:rootuser
  ```

- Re-test the Endpoint: Use the same curl command as before to verify functionality.