A comprehensive cookiecutter template for machine learning projects with DVC (Data Version Control) integration.
This template provides a structured foundation for machine learning projects with integrated data version control, reproducible pipelines, and development best practices. It's designed to help you start new ML projects quickly with all the necessary tooling in place.
- Organized project structure following ML best practices
- DVC integration for data version control and pipeline orchestration
- Reproducible workflows with parameterized ML pipelines
- Development tools including:
  - Pre-commit hooks (Black, isort, flake8)
  - Testing framework
  - GitHub Actions workflow for ML reports
- Customizable to suit your specific needs
- Python (version 3.10 or higher). For setting up the environment and installing dependencies.
- Cookiecutter. For setting up the project structure.
- Git. For versioning your code.
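A quick way to confirm the prerequisites above is a small check script. The following is a minimal sketch, not part of the template: the function name `missing_prerequisites` and its parameters are hypothetical, and it only checks the running Python version and whether the named commands are on your `PATH`.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 10), tools=("git",)):
    """Return a list of human-readable problems; empty if everything is in place.

    Hypothetical helper for illustration; min_python is a (major, minor) tuple.
    """
    problems = []
    # Compare only (major, minor) of the interpreter running this script.
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        found = sys.version.split()[0]
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when a command is not found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `ccds` command only becomes available after the install step, so you may want to call `missing_prerequisites(tools=("git", "ccds"))` once the package is installed.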
To create a new project, run the following commands:
# install the cookiecutter-data-science package, which provides the ccds command
pip install cookiecutter-data-science
# create a new project using the template
ccds gh:eriknovak/cookiecutter-ml-dvc
You'll be prompted for inputs to customize your project:
- project_name: Name of your project
- project_description: Brief description of your project
- version: Initial version (default: 0.1.0)
- python_version: Python version (default: 3.10)
- author_name: Your name
- author_email: Your email
- ... and more configurable options
Afterwards, follow the README within the created project for further instructions.
After creating the project, initialize a new Git repository and commit the initial project structure:
cd <project_name>
git init
git add .
git commit -m "Initial commit"
You can then push the repository to your remote Git server. After that, you can start developing your project.
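If you create projects often, the Git steps above can be scripted. This is a rough sketch under a few assumptions: the function names are hypothetical, the remote is named `origin` (a common convention, not something the template requires), and the remote URL is whatever your Git server gives you.

```python
import subprocess


def git_setup_commands(remote_url=None):
    """Build the Git commands for the initial commit as argument lists.

    Hypothetical helper for illustration. If remote_url is given, commands to
    register the remote and push the current branch are appended.
    """
    commands = [
        ["git", "init"],
        ["git", "add", "."],
        ["git", "commit", "-m", "Initial commit"],
    ]
    if remote_url:
        # Assumption: the remote is called "origin". Pushing HEAD sends the
        # current branch, whatever its name, and -u sets it as upstream.
        commands.append(["git", "remote", "add", "origin", remote_url])
        commands.append(["git", "push", "-u", "origin", "HEAD"])
    return commands


def run_git_setup(project_dir, remote_url=None):
    """Run each command inside project_dir, stopping at the first failure."""
    for command in git_setup_commands(remote_url):
        subprocess.run(command, cwd=project_dir, check=True)
```

Keeping the command list separate from the runner lets you print and review the commands before anything touches the repository.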
Inspired by the Cookiecutter Data Science project structure.