move detailed documentations to RTD; simplify README.md
SoloSynth1 committed Jun 26, 2024
1 parent 3ce4add commit e73926d
Showing 7 changed files with 218 additions and 173 deletions.
39 changes: 5 additions & 34 deletions CONTRIBUTING.md
@@ -40,47 +40,18 @@ If you are proposing a feature:

## Get Started!

If you are interested in helping with the development of this tool, or you would
like to get the cutting-edge version, you can install it via conda.
1. Follow [our guide](https://fixml.readthedocs.io/en/latest/install_devel_build.html)
to install the development build of FixML on your system.

To do this, ensure you have Miniconda/Anaconda installed on your system. You can
download Miniconda from [their official website](https://docs.anaconda.com/miniconda/).


1. Clone this repository from GitHub:
```bash
git clone [email protected]:UBC-MDS/fixml.git
```

2. Create a conda environment:

```bash
conda env create -f environment.yml
```

3. Activate the newly created conda environment (default name `fixml`):

```bash
conda activate fixml
```

4. Use `poetry`, which is preinstalled in the conda environment, to install the package locally:

```bash
poetry install
```

5. Use `git` (or similar) to create a branch for local development and make your changes:
2. Use `git` (or similar) to create a branch for local development and make your changes:

```console
git checkout -b name-of-your-bugfix-or-feature
```

6. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.
3. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.

7. Commit your changes and open a pull request.
4. Commit your changes and open a pull request.

## Pull Request Guidelines

171 changes: 33 additions & 138 deletions README.md
@@ -16,177 +16,72 @@
A tool for providing context-aware evaluations of Machine Learning project code
bases using a checklist-based approach.

## Motivation

Testing code in a Machine Learning project mostly revolves around ensuring that
the findings are reproducible, and achieving this currently requires a lot of
manual effort. This is because such projects usually carry assumptions that are
hard to quantify with traditional software engineering metrics such as code
coverage. One example is testing a model's performance: the test will not raise
any errors, yet we still expect the result to be reproducible by others. Testing
such code therefore requires us to gauge how effective the tests are not only
quantitatively but also qualitatively.

A common way to handle this today is to draw on the expertise of domain experts.
Research and guidelines exist on how to incorporate such knowledge through the
use of checklists. However, this requires manually validating the checklist
items, which usually results in poor scalability and a slow feedback loop for
developers, both of which are incompatible with today's fast-paced, competitive
landscape in ML development.

This tool aims to bridge the gap between these two approaches by adding Large
Language Models (LLMs) into the loop, given their recent advances in multiple
areas, including natural language understanding and code-related tasks. LLMs
have shown, to some degree, the ability to analyze code and produce
context-aware suggestions. This tool simplifies such a workflow by providing a
command line tool as well as a high-level API, so that developers and
researchers alike can quickly validate whether their tests satisfy the areas
commonly required for reproducibility.

Given LLMs' tendency to provide plausible but factually incorrect information,
extensive analyses have been carried out to ensure that the responses align with
ground truths and human expectations, both accurately and consistently. Based on
these analyses, we are also able to continuously refine our prompts and
workflows.

## Documentation

## Installation
- Guides and API documentation: [https://fixml.readthedocs.org](https://fixml.readthedocs.org)
- Reports and proposals: [https://ubc-mds.github.io/fixml](https://ubc-mds.github.io/fixml)

This tool is on PyPI. To install, please run:
## Installation

```bash
pip install fixml
```

## Usage
# For unix-like systems e.g. Linux, macOS
export OPENAI_API_KEY={your-openai-api-key}

### CLI tool
# For windows systems
set OPENAI_API_KEY={your-openai-api-key}
```

Once installed, the tool offers a Command Line Interface (CLI) command `fixml`.
By using this command you will be able to evaluate your project code bases,
generate test function specifications, and perform various relevant tasks.
For a more detailed installation guide,
visit [the related page on ReadtheDocs](https://fixml.readthedocs.io/en/latest/installation.html).

Run `fixml --help` for more details.
## Usage

> [!IMPORTANT]
> By default, this tool uses OpenAI's `gpt-3.5-turbo` for evaluation. To run any
> command that requires calls to the LLM (i.e. `fixml evaluate`, `fixml generate`),
> an environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
> `export` to set the variable in your current session, or create a `.env` file
> with the line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
### CLI tool

> [!TIP]
> Currently, only calls to OpenAI endpoints are supported. This tool is still
> under active development, and integrations with other service providers and
> locally hosted LLMs are planned.
FixML offers a CLI command that provides a quick and easy way to evaluate
existing tests and generate new ones.

#### Test Evaluator

The test evaluator command is used to evaluate the tests of your repository. It
generates an evaluation report and provides various options for customization,
such as specifying a checklist file, output format, and verbosity.
Here is an example command to evaluate a local repo:

Example calls:
```bash
# Evaluate the repo, and output the evaluations as a JSON file in the working directory
fixml evaluate /path/to/your/repo

# Perform the above verbosely, and use the JSON file to export an HTML report
fixml evaluate /path/to/your/repo -e ./eval_report.html -v

# Perform the above, but use a custom checklist and overwrite any existing report
fixml evaluate /path/to/your/repo -e ./eval_report.html -v -o -c checklist/checklist.csv

# Perform the above, and use gpt-4o as the evaluation model
fixml evaluate /path/to/your/repo -e ./eval_report.html -v -o -c checklist/checklist.csv -m gpt-4o
fixml evaluate /path/to/your/repo \
--export_report_to=./eval_report.html --verbose
```

#### Test Spec Generator

The test spec generator command is used to generate a test specification from a
checklist. It allows for the inclusion of an optional checklist file to guide
the test specification generation process.

Example calls:
Here is an example command to generate test specifications from a checklist:
```bash
# Generate test function specifications and write them into a .py file
fixml generate test.py

# Perform the above, but use a custom checklist
fixml generate test.py -c checklist/checklist.csv
```

### Package
> [!TIP]
> Run command `fixml {evaluate|generate} --help` for more information and all
> available options.
>
> You can also refer
> to [our Quickstart guide](https://fixml.readthedocs.io/en/latest/quickstart.html)
> for a more detailed walkthrough of how to use the CLI tool.
Alternatively, you can use the package to import all components necessary for running the evaluation/generation workflows listed above.
### Package

The workflows used in the package have been designed to be fully modular. You
can easily switch between different prompts, models and checklists. You can
also write your own custom classes to extend the capabilities of this library.
Alternatively, you can use the package to import all components necessary for
running the evaluation/generation workflows listed above.

Consult the [API documentation on Readthedocs](https://fixml.readthedocs.io/en/latest/)
Consult [our documentation on using the API](https://fixml.readthedocs.io/en/latest/using-the-api.html)
for more information and example calls.
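
As a rough illustration of what an import-and-run workflow could look like, here
is a minimal sketch. The names used below (`Checklist`, `TestEvaluator`,
`from_csv`, `evaluate`, `export_report`, and the `model` argument) are
assumptions made for this example, not FixML's confirmed API; consult the API
documentation linked above for the actual classes and calls.

```python
# Hypothetical sketch only: the fixml class and method names used here are
# illustrative assumptions, not the package's documented API.
from fixml import Checklist, TestEvaluator  # assumed import paths

# Load a custom checklist; the bundled default checklist could be used instead.
checklist = Checklist.from_csv("checklist/checklist.csv")

# Evaluate an existing repository's tests against the checklist via an LLM backend.
evaluator = TestEvaluator(model="gpt-3.5-turbo", checklist=checklist)
result = evaluator.evaluate("/path/to/your/repo")

# Export the evaluation report, mirroring the CLI's HTML export option.
result.export_report("./eval_report.html")
```

The sketch mirrors the CLI flow shown earlier: load a checklist, evaluate a
repository, then export the report.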

## Development Build

If you are interested in helping with the development of this tool, or you would
like to get the cutting-edge version, you can install it via conda.

To do this, ensure you have Miniconda/Anaconda installed on your system. You can
download Miniconda from [their official website](https://docs.anaconda.com/miniconda/).
Please refer to [the related page in our documentation](https://fixml.readthedocs.io/en/latest/install_devel_build.html).

## Rendering Documentation

1. Clone this repository from GitHub:
```bash
git clone [email protected]:UBC-MDS/fixml.git
```

2. Create a conda environment:

```bash
conda env create -f environment.yml
```

3. Activate the newly created conda environment (default name `fixml`):

```bash
conda activate fixml
```

4. Use `poetry`, which is preinstalled in the conda environment, to install the package locally:

```bash
poetry install
```

5. You should now be able to run `fixml`; try:
```bash
fixml --help
```

## Rendering API Documentation

Make sure you have installed the dev dependencies listed in `pyproject.toml`.

```bash
cd docs/

python -m sphinx -T -b html -D language=en . _build
```

## Running the Tests

Navigate to the project root directory and use the following command in the
terminal to run the test suite:

```bash
# skip integration tests
pytest -m "not integeration"

# run ALL tests, which requires OPENAI_API_KEY to be set
pytest
```
Please refer to [the related page in our documentation](https://fixml.readthedocs.io/en/latest/render.html).

## Contributing

2 changes: 2 additions & 0 deletions docs/index.md
@@ -6,6 +6,7 @@
:hidden:
:caption: Getting Started
motivation.md
installation.md
quickstart.md
using_api.md
@@ -17,6 +18,7 @@ reliability.md
:hidden:
:caption: Development
install_devel_build.md
contributing.md
conduct.md
render.md
43 changes: 43 additions & 0 deletions docs/install_devel_build.md
@@ -0,0 +1,43 @@
# Install Development Build

If you are interested in helping with the development of this tool, or you would
like to get the cutting-edge version, you can install it via conda.

To do this, ensure you have Miniconda/Anaconda installed on your system. You can
download Miniconda from [their official website](https://docs.anaconda.com/miniconda/).


1. Clone this repository from GitHub:
```bash
git clone [email protected]:UBC-MDS/fixml.git
```

2. Create a conda environment:

```bash
conda env create -f environment.yml
```

3. Activate the newly created conda environment (default name `fixml`):

```bash
conda activate fixml
```

4. Use `poetry`, which is preinstalled in the conda environment, to install the package locally:

```bash
poetry install
```

5. Done! You should now be able to run unit tests to confirm the build works
without problems:
```bash
# skip integration tests
pytest -m "not integeration"
# run ALL tests, which requires OPENAI_API_KEY to be set
pytest
```
40 changes: 40 additions & 0 deletions docs/motivation.md
@@ -0,0 +1,40 @@
# Motivation

## Why another tool for testing tests? Aren't code coverage tools enough?

Testing code in a Machine Learning project mostly revolves around ensuring that
the findings are reproducible, and achieving this currently requires a lot of
manual effort. This is because such projects usually carry assumptions that are
hard to quantify with traditional software engineering metrics such as code
coverage. One example is testing a model's performance: the test will not raise
any errors, yet we still expect the result to be reproducible by others. Testing
such code therefore requires us to gauge how effective the tests are not only
quantitatively but also qualitatively.

## OK, but we can evaluate the tests by looking into them ourselves...

Yes, a common way to handle this today is to draw on the expertise of domain
experts. Research and guidelines exist on how to incorporate such knowledge
through the use of checklists. However, this requires manually validating the
checklist items, which usually results in poor scalability and a slow feedback
loop for developers, both of which are incompatible with today's fast-paced,
competitive landscape in ML development.

## So what does this tool offer?

This tool aims to bridge the gap between these two approaches by adding Large
Language Models (LLMs) into the loop, given their recent advances in multiple
areas, including natural language understanding and code-related tasks. LLMs
have shown, to some degree, the ability to analyze code and produce
context-aware suggestions. This tool simplifies such a workflow by providing a
command line tool as well as a high-level API, so that developers and
researchers alike can quickly validate whether their tests satisfy the areas
commonly required for reproducibility.

## LLMs are known for occasional hallucinations. How is this mitigated?

Given LLMs' tendency to provide plausible but factually incorrect information,
extensive analyses have been carried out to ensure that the responses align with
ground truths and human expectations, both accurately and consistently. Based on
these analyses, we are also able to continuously refine our prompts and
workflows.