diff --git a/Makefile b/Makefile
index a3ddf02..3603ab6 100644
--- a/Makefile
+++ b/Makefile
@@ -19,11 +19,12 @@ data/processed/ground_truth.csv : analysis/preprocess_batch_run_result.py data/b
 
 # Build 'report/docs/index.html' by rendering the Jupyter notebooks using Quarto.
 report/docs/index.html : data/processed/ground_truth.csv
-	quarto render
+	quarto render --cache-refresh
+	awk '{gsub(/proposal/,"final_report"); print}' ./report/docs/index.html > tmp && mv tmp ./report/docs/index.html
 
 .PHONY : publish
 publish : data/processed/ground_truth.csv
-	quarto publish gh-pages
+	quarto publish gh-pages ./report
 
 # The 'clean' target is used to clean up generated files and directories.
 .PHONY : clean
@@ -35,4 +36,5 @@ clean :
 	rm -rf data/batch_run/batch_run_4o
 	rm -rf data/processed/ground_truth.csv
 	rm -rf data/processed/score_*csv
+	rm -rf report/.quarto/
 
diff --git a/README.md b/README.md
index a84b493..5c8d465 100644
--- a/README.md
+++ b/README.md
@@ -68,15 +68,15 @@ Run `fixml --help` for more details.
 
 > [!IMPORTANT]
 > By default, this tool uses OpenAI's `gpt3.5-turbo` for evaluation. To run any
-command that requires calls to LLM (i.e. `fixml evaluate`, `fixml generate`),
-an environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
+> command that requires calls to LLM (i.e. `fixml evaluate`, `fixml generate`),
+> an environment variable `OPENAI_API_KEY` needs to be set. To do so, either use
 `export` to set the variable in your current session, or create a `.env` file
-with a line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
+> with a line `OPENAI_API_KEY={your-api-key}` saved in your working directory.
 
 > [!TIP]
 > Currently, only calls to OpenAI endpoints are supported. This tool is still in
-ongoing development and integrations with other service providers and locally
-hosted LLMs are planned.
+> ongoing development and integrations with other service providers and locally
+> hosted LLMs are planned.
 
 #### Test Evaluator
 
@@ -199,6 +199,7 @@ deliverable product during our capstone project of the UBC-MDS program in
 collaboration with Dr. Tiffany Timbers and Dr. Simon Goring. It is licensed
 under the terms of the MIT license for software code. Reports and instructional
 materials are licensed under the terms of the CC-BY 4.0 license.
+
 ## Citation
 
 If you use fixml in your work, please cite:
@@ -221,3 +222,8 @@ welcome it to be read, revised, and supported by data scientists, machine
 learning engineers, educators, practitioners, and hobbyists alike. Your
 contributions and feedback are invaluable in making this package a reliable
 resource for the community.
+
+Special thanks to the University of British Columbia (UBC) and the University of
+Wisconsin-Madison for their support and resources. We extend our gratitude to
+Dr. Tiffany Timbers and Dr. Simon Goring for their guidance and expertise, which
+have been instrumental in the development of this project.
diff --git a/_quarto.yml b/_quarto.yml
index 27e5e3a..e53f449 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -1,10 +1,12 @@
 project:
   type: website
   render:
-    - "report/*qmd"
-  output-dir: report/docs
+    - report/final_report.qmd
+    - report/proposal.qmd
+  output-dir: "report/docs/"
 
 website:
+  title: "FixML - Checklists and LLM prompts for efficient and effective test creation in data analysis"
   sidebar:
     style: "docked"
     logo: "img/logo.png"
diff --git a/report/final_report.qmd b/report/final_report.qmd
index 7b0188d..acc5609 100644
--- a/report/final_report.qmd
+++ b/report/final_report.qmd
@@ -1,12 +1,11 @@
 ---
+title: "Final Report - Checklists and LLM prompts for efficient and effective test creation in data analysis"
 format:
   html:
     code-fold: true
 bibliography: references.bib
 ---
 
-# Final Report - Checklists and LLM prompts for efficient and effective test creation in data analysis
-
 by John Shiu, Orix Au Yeung, Tony Shum, Yingzi Jin
 
 ## Executive Summary
diff --git a/report/proposal.qmd b/report/proposal.qmd
index 606dca1..75ccbac 100644
--- a/report/proposal.qmd
+++ b/report/proposal.qmd
@@ -1,4 +1,5 @@
 ---
+title: "Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis"
 format:
   html:
     code-fold: true
@@ -6,8 +7,6 @@ bibliography: references.bib
 jupyter: python3
 ---
 
-# Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis
-
 by John Shiu, Orix Au Yeung, Tony Shum, Yingzi Jin
 
 ## Executive Summary
@@ -36,7 +35,7 @@ We propose to develop testing suites diagnostic tools based on Large Language Mo
 
 Our solution offers an end-to-end application for evaluating and enhancing the robustness of users' ML systems.
 
-![Main components and workflow of the proposed system. The checklist would be written in [YAML](https://yaml.org/) to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.](../img/proposed_system_overview.png)
+![Main components and workflow of the proposed system. The checklist would be written in [YAML](https://yaml.org/) to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.](../img/proposed_system_overview.png){.lightbox}
 
 One big challenge in utilizing LLMs to reliably and consistently evaluate ML systems is their tendency to generate illogical and/or factually wrong information known as hallucination [@zhang2023sirens].
 