Commit 5fff2e9 (SoloSynth1, Jun 20, 2024; 2 parents: 108eb17 + bfec0a7)
Merge branch 'main' into 155-use-template-engine-to-render-json-response-to-reports-in-markdownquarto-markdown-format

Showing 272 changed files with 26,101 additions and 30,475 deletions.

README.md: 200 changes (170 additions, 30 deletions)

# FixML
![CI status check](https://github.com/UBC-MDS/fixml/actions/workflows/ci.yml/badge.svg)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)

<p align="center">
<img src="./img/logo.png?raw=true" width="175" height="175">
</p>

A tool that provides context-aware, checklist-based evaluations of Machine
Learning project code bases.

## Motivation

Testing code in Machine Learning projects mostly revolves around ensuring the
findings are reproducible, and achieving this currently requires a lot of
manual effort. Such projects usually rest on assumptions that are hard to
quantify with traditional software engineering metrics such as code coverage.
One example is testing a model's performance: a drop in performance does not
raise any error, yet we still expect the result to be reproducible by others.
Testing such code therefore requires us to gauge how effective the tests are,
not only quantitatively but also qualitatively.

A common way to handle this today is to draw on the knowledge of domain
experts. Research and guidelines exist on how to incorporate such knowledge
through the use of checklists. However, the checklist items must be validated
manually, which scales poorly and gives developers a slow feedback loop,
neither of which fits today's fast-paced, competitive landscape of ML
development.

This tool aims to bridge the gap between these two approaches by bringing
Large Language Models (LLMs) into the loop, given their recent advances in
areas such as natural language understanding and code-related tasks. LLMs have
shown, to some degree, the ability to analyze code and produce context-aware
suggestions. This tool simplifies that workflow by providing a command line
tool as well as a high-level API, so that developers and researchers alike can
quickly check whether their tests cover the common areas required for
reproducibility.

Given LLMs' tendency to produce plausible but factually incorrect information,
we have carried out extensive analyses to ensure the responses align with
ground truths and human expectations both accurately and consistently. These
analyses also allow us to continuously refine our prompts and workflows.

## Installation

This tool is on PyPI. To install, please run:

```bash
$ pip install fixml
```

## Usage

### CLI tool

Once installed, the tool offers the Command Line Interface (CLI) command
`fixml`. With this command you can evaluate your project's code base, generate
test function specifications, and perform various related tasks.

Run `fixml --help` for more details.

> [!IMPORTANT]
> By default, this tool uses OpenAI's `gpt-3.5-turbo` for evaluation. To run any
> command that requires calls to an LLM (i.e. `fixml evaluate`, `fixml generate`),
> the environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
> `export` to set the variable in your current session, or create a `.env` file
> with the line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
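
For example, assuming a POSIX-compatible shell, either of the following
commands makes the key available (the value shown is a placeholder):

```bash
# Option 1: set the variable for the current shell session only
$ export OPENAI_API_KEY="your-api-key"

# Option 2: save the key into a .env file in the working directory
$ echo "OPENAI_API_KEY=your-api-key" > .env
```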

> [!TIP]
> Currently, only calls to OpenAI endpoints are supported. This tool is still
> under active development, and integrations with other service providers and
> locally hosted LLMs are planned.

#### Test Evaluator

The test evaluator command is used to evaluate the tests of your repository. It
generates an evaluation report and provides various options for customization,
such as specifying a checklist file, output format, and verbosity.

Example calls:
```bash
# Evaluate repo, and output the evaluations as a JSON file in the working directory
$ fixml evaluate /path/to/your/repo

# Perform the above verbosely, and use the JSON file to export an HTML report
$ fixml evaluate /path/to/your/repo -e ./eval_report.html -v

# Perform the above, but use a custom checklist, and overwrite the existing report
$ fixml evaluate /path/to/your/repo -e ./eval_report.html -v -o -c checklist/checklist.csv

# Perform the above, and use gpt-4o as the evaluation model
$ fixml evaluate /path/to/your/repo -e ./eval_report.html -v -o -c checklist/checklist.csv -m gpt-4o
```

#### Test Spec Generator

The test spec generator command is used to generate a test specification from a
checklist. It allows for the inclusion of an optional checklist file to guide
the test specification generation process.

Example calls:
```bash
# Generate test function specifications and write them into a .py file
$ fixml generate test.py

# Perform the above, but use a custom checklist
$ fixml generate test.py -c checklist/checklist.csv
```

### Package

Alternatively, you can use the package to import all components necessary for running the evaluation/generation workflows listed above.

The workflows used in the package have been designed to be fully modular. You
can easily switch between different prompts, models, and checklists, and you
can write your own custom classes to extend the capability of this library.

Consult the API documentation on Readthedocs for more information and example calls.
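
For illustration only, a minimal sketch of such a workflow is shown below. The
import path, class, and method names used here (`TestEvaluator`, `evaluate`,
`export`) are assumptions made for this example and may not match the actual
API; refer to the API documentation for the real entry points.

```python
# Hypothetical sketch -- names below are assumed for illustration and may
# differ from the actual fixml API.
from fixml import TestEvaluator  # assumed import path

# Configure the workflow with a model and an optional custom checklist.
evaluator = TestEvaluator(
    model="gpt-3.5-turbo",
    checklist_path="checklist/checklist.csv",
)

# Run the checklist-based evaluation against a local repository.
report = evaluator.evaluate("/path/to/your/repo")

# Export the evaluation results as an HTML report.
report.export("eval_report.html", overwrite=True)
```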

## Development Build

If you are interested in contributing to the development of this tool, or you
would like to try the cutting-edge version, you can install it from source via
conda.

To do this, ensure you have Miniconda/Anaconda installed on your system. You
can download Miniconda from
[their official website](https://docs.anaconda.com/miniconda/).


1. Clone this repository from GitHub:
```bash
echo "OPENAI_API_KEY=..." > .env
$ git clone [email protected]:UBC-MDS/fixml.git
```

2. Create a conda environment:

```bash
$ conda env create -f environment.yaml
```

3. Activate the newly created conda environment (default name `fixml`):

```bash
$ conda activate fixml
```

4. Use `poetry`, which comes preinstalled in the conda environment, to install the package locally:

```bash
$ poetry install
```

5. You should now be able to run `fixml`. Try:
```bash
fixml --help
```

## Running the Tests

Navigate to the project root directory and use the following commands in the
terminal to run the test suite:

```bash
# skip integration tests
$ pytest -m "not integration"

# run ALL tests, which requires OPENAI_API_KEY to be set
$ pytest
```


## Contributing

Interested in contributing? Check out the contributing guidelines. Please note
that this project is released with a Code of Conduct. By contributing to this
project, you agree to abide by its terms.

## License

`fixml` was created by John Shiu, Orix Au Yeung, Tony Shum, and Yingzi Jin as a
deliverable of our capstone project in the UBC-MDS program, in collaboration
with Dr. Tiffany Timbers and Dr. Simon Goring. The software code is licensed
under the terms of the MIT license. Reports and instructional materials are
licensed under the terms of the CC-BY 4.0 license.

## Citation

If you use fixml in your work, please cite:

```
@misc{mds2024fixml,
  author       = {John Shiu and Orix Au Yeung and Tony Shum and Yingzi Jin},
  title        = {fixml: A Comprehensive Tool for Test Evaluation and Specification Generation},
  howpublished = {\url{https://github.com/UBC-MDS/fixml}},
  year         = {2024}
}
```

## Acknowledgements
We'd like to thank everyone who has contributed to the development of
the `fixml` package. This is a new project aimed at enhancing the robustness
and reproducibility of applied machine learning software. It is meant to be a
research tool and is currently hosted on GitHub as an open source project. We
welcome data scientists, machine learning engineers, educators, practitioners,
and hobbyists alike to read, revise, and support it. Your contributions and
feedback are invaluable in making this package a reliable resource for the
community.