Commit

Merge branch 'cont-revise-report' of github.com:UBC-MDS/test-creation into cont-revise-report
John Shiu committed Jun 21, 2024
2 parents ba97433 + 6f6f81c commit cfc1d17
Showing 34 changed files with 1,524 additions and 9,060 deletions.
2,241 changes: 0 additions & 2,241 deletions docs/02_finding-report.html

This file was deleted.

1,036 changes: 0 additions & 1,036 deletions docs/04_plots-for-presentations.html

This file was deleted.

881 changes: 763 additions & 118 deletions docs/final_report.html

Large diffs are not rendered by default.

File renamed without changes
File renamed without changes
Binary file added docs/img/proposed_system_overview.png
File renamed without changes
File renamed without changes
134 changes: 69 additions & 65 deletions docs/proposal.html

Large diffs are not rendered by default.

89 changes: 34 additions & 55 deletions docs/search.json

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions report/final_report/_quarto.yml
@@ -1,19 +1,19 @@
project:
type: website
render:
- "*qmd"
output-dir: docs

website:
sidebar:
style: "docked"
logo: "logo.png"
logo: "img/logo.png"
search: true
contents:
- section: "Final Report"
contents:
- final_report.qmd
- section: "Proposal"
contents:
- proposal.ipynb
- text: "Capstone Final Report"
href: final_report.qmd
- text: "Capstone Proposal"
href: proposal.qmd

format:
html:
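The scraped hunk above has lost its indentation and its +/- markers, so the resulting file is hard to read off directly. A minimal sketch of how the `website` block plausibly reads after this commit, assuming standard Quarto sidebar nesting (the keys and values come from the hunk; the indentation is an assumption, not taken from the commit):

```yaml
# Sketch only: indentation is assumed; values are copied from the hunk above.
website:
  sidebar:
    style: "docked"
    logo: "img/logo.png"      # previously "logo.png"
    search: true
    contents:
      # Flat text/href entries replace the former "Final Report" and "Proposal" sections.
      - text: "Capstone Final Report"
        href: final_report.qmd
      - text: "Capstone Proposal"
        href: proposal.qmd    # previously proposal.ipynb
```

This flat layout is consistent with the rendered sidebar in `proposal.html` further down, where the two collapsible sections are replaced by two plain links.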
606 changes: 0 additions & 606 deletions report/final_report/docs/01_preprocess.html

This file was deleted.

2,241 changes: 0 additions & 2,241 deletions report/final_report/docs/02_finding-report.html

This file was deleted.

931 changes: 0 additions & 931 deletions report/final_report/docs/02_plots-for-final-report.html

This file was deleted.

1,036 changes: 0 additions & 1,036 deletions report/final_report/docs/04_plots-for-presentations.html

This file was deleted.

315 changes: 167 additions & 148 deletions report/final_report/docs/final_report.html

Large diffs are not rendered by default.

File renamed without changes
4 changes: 2 additions & 2 deletions report/final_report/docs/index.html
@@ -1,7 +1,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Redirect to 02_plots-for-final-report.html</title>
<meta http-equiv="refresh" content="0;URL='02_plots-for-final-report.html'" />
<title>Redirect to final_report.html</title>
<meta http-equiv="refresh" content="0;URL='final_report.html'" />
</head>
<body>
</body>
79 changes: 22 additions & 57 deletions report/final_report/docs/proposal.html
@@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">


<title>Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis</title>
<title>proposal</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
@@ -92,7 +92,7 @@
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar,#quarto-sidebar-glass" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
<i class="bi bi-layout-text-sidebar-reverse"></i>
</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./proposal.html">Proposal</a></li><li class="breadcrumb-item"><a href="./proposal.html">Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis</a></li></ol></nav>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./proposal.html">Capstone Proposal</a></li></ol></nav>
<a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar,#quarto-sidebar-glass" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
</a>
<button type="button" class="btn quarto-search-button" aria-label="" onclick="window.quartoOpenSearch();">
@@ -107,7 +107,7 @@
<nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal sidebar-navigation docked overflow-auto">
<div class="pt-lg-2 mt-2 text-left sidebar-header">
<a href="./index.html" class="sidebar-logo-link">
<img src="./logo.png" alt="" class="sidebar-logo py-0 d-lg-inline d-none">
<img src="./img/logo.png" alt="" class="sidebar-logo py-0 d-lg-inline d-none">
</a>
</div>
<div class="mt-2 flex-shrink-0 align-items-center">
@@ -117,40 +117,18 @@
</div>
<div class="sidebar-menu-container">
<ul class="list-unstyled mt-1">
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" aria-expanded="true">
<span class="menu-text">Final Report</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./final_report.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">DSCI591 Capstone Final Report</span></a>
<span class="menu-text">Capstone Final Report</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" aria-expanded="true">
<span class="menu-text">Proposal</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-2" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./proposal.html" class="sidebar-item-text sidebar-link active">
<span class="menu-text">Proposal Report - Checklists and LLM prompts for efficient and effective test creation in data analysis</span></a>
<span class="menu-text">Capstone Proposal</span></a>
</div>
</li>
</ul>
</li>
</ul>
</div>
</nav>
@@ -221,32 +199,17 @@ <h3 class="anchored" data-anchor-id="our-objectives">Our Objectives</h3>
<section id="our-product" class="level2">
<h2 class="anchored" data-anchor-id="our-product">Our Product</h2>
<p>Our solution offers an end-to-end application for evaluating and enhancing the robustness of users’ ML systems.</p>
<table class="table">
<thead>
<tr class="header">
<th>```ngsvnkff ../../img/proposed_system_overview.png</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>name: overview-diagram</td>
</tr>
</tbody>
</table>
<p>Main components and workflow of the proposed system. The checklist would be written in <a href="https://yaml.org/">YAML</a> to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.</p>
<pre><code>
One big challenge in utilizing LLMs to reliably and consistently evaluate ML systems is their tendency to generate illogical and/or factually wrong information known as hallucination [@zhang2023sirens].

To combat this, the proposed system will incorporate a checklist ([Fig. 1](overview-diagram)) which would be curated manually and incorporate best practices in software testing and identified areas to be tested inside ML pipeline from human experts and past research.

This checklist will be our basis in evaluating the effectiveness and completeness of existing tests in a given codebase. Relevant information will be injected into a prompt template, which the LLMs would then be prompted to follow the checklist **exactly** during the evaluation.

Here is an example of how the proposed checklist would be structured:


```{toggle}
```yaml
%YAML 1.2
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="img/proposed_system_overview.png" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">Main components and workflow of the proposed system. The checklist would be written in <a href="https://yaml.org/">YAML</a> to maximize readability for both humans and machines. We hope this will encourage researchers/users to read, understand and modify the checklist items, while keeping the checklist closely integrated with other components in our system.</figcaption>
</figure>
</div>
<p>One major challenge in using LLMs to evaluate ML systems reliably and consistently is their tendency to generate illogical or factually wrong information, known as hallucination <span class="citation" data-cites="zhang2023sirens">(<a href="#ref-zhang2023sirens" role="doc-biblioref">Zhang et al. 2023</a>)</span>.</p>
<p>To combat this, the proposed system will incorporate a checklist (<a href="overview-diagram">Fig. 1</a>), curated manually, that draws on best practices in software testing and on areas of the ML pipeline that human experts and past research have identified as needing tests.</p>
<p>This checklist will be our basis for evaluating the effectiveness and completeness of existing tests in a given codebase. Relevant information will be injected into a prompt template, and the LLMs will then be prompted to follow the checklist <strong>exactly</strong> during the evaluation.</p>
<p>Here is an example of how the proposed checklist would be structured:</p>
<pre class="{yaml}"><code>%YAML 1.2
---
Title: Checklist for Tests in Machine Learning Projects
Description: &gt;
@@ -457,8 +420,7 @@ <h2 class="anchored" data-anchor-id="delivery-timeline">Delivery Timeline</h2>
</table>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<pre class="{bibliography}"><code></code></pre>




@@ -483,6 +445,9 @@ <h2 class="anchored" data-anchor-id="references">References</h2>
<div id="ref-wattanakriengkrai2022github" class="csl-entry" role="listitem">
Wattanakriengkrai, Supatsara, Bodin Chinthanet, Hideaki Hata, Raula Gaikovina Kula, Christoph Treude, Jin Guo, and Kenichi Matsumoto. 2022. <span>“GitHub Repositories with Links to Academic Papers: Public Access, Traceability, and Evolution.”</span> <em>Journal of Systems and Software</em> 183: 111117.
</div>
<div id="ref-zhang2023sirens" class="csl-entry" role="listitem">
Zhang, Yue, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, et al. 2023. <span>“Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models.”</span> <a href="https://arxiv.org/abs/2309.01219">https://arxiv.org/abs/2309.01219</a>.
</div>
</div></section></div></main> <!-- /main -->
<script id="quarto-html-after-body" type="application/javascript">
window.document.addEventListener("DOMContentLoaded", function (event) {
117 changes: 41 additions & 76 deletions report/final_report/docs/search.json

Large diffs are not rendered by default.
