Add credits to UBC and University of Wisconsin in README #193

Merged
6 changes: 4 additions & 2 deletions Makefile
@@ -19,11 +19,12 @@ data/processed/ground_truth.csv : analysis/preprocess_batch_run_result.py data/b

# Build 'report/docs/index.html' by rendering the Jupyter notebooks using Quarto.
report/docs/index.html : data/processed/ground_truth.csv
quarto render
quarto render --cache-refresh
awk '{gsub(/proposal/,"final_report"); print}' ./report/docs/index.html > tmp && mv tmp ./report/docs/index.html

.PHONY : publish
publish : data/processed/ground_truth.csv
quarto publish gh-pages
quarto publish gh-pages ./report

# The 'clean' target is used to clean up generated files and directories.
.PHONY : clean
@@ -35,4 +36,5 @@ clean :
rm -rf data/batch_run/batch_run_4o
rm -rf data/processed/ground_truth.csv
rm -rf data/processed/score_*csv
rm -rf report/.quarto/
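The `awk` step in the render target above rewrites every occurrence of `proposal` to `final_report` in the generated HTML. As a minimal stand-alone sketch of that substitution (the file name and link text here are hypothetical, not part of the repository):

```shell
# Sketch of the Makefile's awk rewrite: replace every occurrence of
# "proposal" with "final_report" in a rendered HTML file.
printf '<a href="proposal.html">View the proposal</a>\n' > index.html
awk '{gsub(/proposal/,"final_report"); print}' index.html > tmp && mv tmp index.html
cat index.html   # -> <a href="final_report.html">View the final_report</a>
```

The `tmp`-and-`mv` dance is needed because `awk` cannot safely write to the file it is reading.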

16 changes: 11 additions & 5 deletions README.md
@@ -63,15 +63,15 @@ Run `fixml --help` for more details.

> [!IMPORTANT]
> By default, this tool uses OpenAI's `gpt-3.5-turbo` for evaluation. To run any
command that requires calls to LLM (i.e. `fixml evaluate`, `fixml generate`),
an environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
> command that requires calls to LLM (i.e. `fixml evaluate`, `fixml generate`),
> an environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
`export` to set the variable in your current session, or create a `.env` file
with a line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
> with a line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
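The two ways of supplying the key described in the note above can be sketched as shell commands (the key value is a placeholder, not a real credential):

```shell
# Option 1: set the variable for the current shell session only.
export OPENAI_API_KEY="sk-your-api-key"   # placeholder value

# Option 2: persist it in a .env file in the working directory.
echo 'OPENAI_API_KEY=sk-your-api-key' > .env
cat .env   # -> OPENAI_API_KEY=sk-your-api-key
```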

> [!TIP]
> Currently, only calls to OpenAI endpoints are supported. This tool is still in
ongoing development and integrations with other service providers and locally
hosted LLMs are planned.
> ongoing development and integrations with other service providers and locally
> hosted LLMs are planned.

#### Test Evaluator

@@ -184,6 +184,7 @@ deliverable product during our capstone project of the UBC-MDS program in
collaboration with Dr. Tiffany Timbers and Dr. Simon Goring. It is licensed
under the terms of the MIT license for software code. Reports and instructional
materials are licensed under the terms of the CC-BY 4.0 license.

## Citation

If you use fixml in your work, please cite:
@@ -206,3 +207,8 @@ welcome it to be read, revised, and supported by data scientists, machine
learning engineers, educators, practitioners, and hobbyists alike. Your
contributions and feedback are invaluable in making this package a reliable
resource for the community.

Special thanks to the University of British Columbia (UBC) and the University of
Wisconsin-Madison for their support and resources. We extend our gratitude to
Dr. Tiffany Timbers and Dr. Simon Goring for their guidance and expertise, which
have been instrumental in the development of this project.
6 changes: 4 additions & 2 deletions _quarto.yml
@@ -1,10 +1,12 @@
project:
type: website
render:
- "report/*qmd"
output-dir: report/docs
- report/final_report.qmd
- report/proposal.qmd
output-dir: "report/docs/"

website:
title: "FixML - Checklists and LLM prompts for efficient and effective test creation in data analysis"
sidebar:
style: "docked"
logo: "img/logo.png"
3 changes: 1 addition & 2 deletions report/final_report.qmd
@@ -1,12 +1,11 @@
---
title: "Final Report - Checklists and LLM prompts for efficient and effective test creation in data analysis"
format:
html:
code-fold: true
bibliography: references.bib
---

# Final Report - Checklists and LLM prompts for efficient and effective test creation in data analysis

by John Shiu, Orix Au Yeung, Tony Shum, Yingzi Jin

## Executive Summary
5 changes: 2 additions & 3 deletions report/proposal.qmd
@@ -1,13 +1,12 @@
---
title: "Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis"
format:
html:
code-fold: true
bibliography: references.bib
jupyter: python3
---

# Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis

by John Shiu, Orix Au Yeung, Tony Shum, Yingzi Jin

## Executive Summary
@@ -36,7 +35,7 @@ We propose to develop testing suites diagnostic tools based on Large Language Mo

Our solution offers an end-to-end application for evaluating and enhancing the robustness of users' ML systems.

![Main components and workflow of the proposed system. The checklist would be written in [YAML](https://yaml.org/) to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.](../img/proposed_system_overview.png)
![Main components and workflow of the proposed system. The checklist would be written in [YAML](https://yaml.org/) to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.](../img/proposed_system_overview.png){.lightbox}

One big challenge in utilizing LLMs to reliably and consistently evaluate ML systems is their tendency to generate illogical and/or factually wrong information known as hallucination [@zhang2023sirens].
