
Meeting Minutes for Week 5 #99

Closed
18 of 20 tasks
JohnShiuMK opened this issue May 24, 2024 · 4 comments
Labels
admin meeting related

Comments

@JohnShiuMK
Collaborator

JohnShiuMK commented May 24, 2024

Sprint Planning - 2024-05-27 Week 5

Checklist

System

System Evaluation: Consistency

System Evaluation: Accuracy

591 Requirement

@tonyshumlh
Collaborator

Mentor Meeting 2024/05/30 - Week 5

  • Have humans evaluate the repos to establish a ground truth
  • Provide tools for users to assess how much they should trust the tool (consistency, accuracy)
  • Perform multiple runs per repo and draw a box plot of the completeness score for each checklist item to show per-item consistency
  • Plot a histogram of a consistency measure (e.g. standard deviation, SD) against the number of repos in each SD bin to illustrate consistency for troubleshooting
  • Build a Consistency table with columns repo, run, checklist item 1, ..., N to investigate 1) whether the scores for a checklist item vary strongly across repos; 2) whether the scores for a repo vary strongly across checklist items
  • For 1), the checklist item probably needs improvement; for 2), the repo probably has a structural issue
  • Add a conclusion in the Final Presentation on whether we/users should trust our results, based on the above methods
  • Based on the Consistency table, drill down into a specific repo, or a specific checklist item across all repos, for investigation and comparison against the human ground truth
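The consistency checks above can be sketched with the standard library alone. This is a minimal sketch, not the team's implementation: it assumes a hypothetical row layout `(repo, run, [score per checklist item])` with completeness scores in [0, 1], uses the population standard deviation across runs as the consistency measure, and reads the "variation" in 1) and 2) as that run-to-run SD averaged per checklist item or per repo. All helper names are illustrative.

```python
from collections import defaultdict
from math import floor
from statistics import fmean, pstdev

# Hypothetical Consistency-table row: (repo, run, [score of item 1, ..., item N]).
Row = tuple[str, int, list[float]]


def sd_per_repo_item(rows: list[Row]) -> dict[tuple[str, int], float]:
    """Population SD of each checklist item's score across runs, per repo."""
    runs = defaultdict(list)
    for repo, _run, scores in rows:
        for item, score in enumerate(scores, start=1):
            runs[(repo, item)].append(score)
    return {key: pstdev(values) for key, values in runs.items()}


def mean_sd_by(sds: dict[tuple[str, int], float], axis: int) -> dict:
    """Average the SDs per repo (axis=0) or per checklist item (axis=1).

    A high value for an item hints that the item needs work (case 1);
    a high value for a repo hints at a structural issue in the repo (case 2).
    """
    groups = defaultdict(list)
    for key, sd in sds.items():
        groups[key[axis]].append(sd)
    return {k: fmean(v) for k, v in groups.items()}


def sd_histogram(sds, bin_width: float = 0.1) -> dict[int, int]:
    """Count (repo, item) pairs per SD bin; bin i covers [i*w, (i+1)*w).

    With a width like 0.1, values lying exactly on a bin edge can fall into
    the lower bin because of float rounding.
    """
    counts = defaultdict(int)
    for sd in sds:
        counts[floor(sd / bin_width)] += 1
    return dict(sorted(counts.items()))
```

The per-run scores collected for each item are also what a per-item box plot would draw (for example with matplotlib's `boxplot`), while `sd_histogram` gives the counts behind the SD histogram.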

@tonyshumlh
Collaborator

tonyshumlh commented May 30, 2024

Partner Meeting 2024/05/30 - Week 5

Checklist

  • There is a Python API for the Quarto CLI that lets us call Quarto from Python (doc: Quarto_dev); we can try it

Evaluation Report

  • Content Revision:
    • If none of the files satisfy the Checklist Item, just show "None of the test functions fulfill ..." in the Observations and skip the content in Function References;
    • If the Checklist Item is partially/fully satisfied, keep only the relevant (one/all) test files, functions and line numbers in Function References and in the corresponding Observations content.
    • Observations: (Satisfied/Partially Satisfied/Not Satisfied)
    • Move the hyperlink onto the Functions and remove the Line Numbers key-value pair
    • Remove the n_file_tested column from the DataFrame and show it as a subheader
  • Consider whether we can provide learning examples to ChatGPT for the standard case, edge cases and error handling
  • Future Development: Add the project code base to the tool so it can point out which part of the code is relevant to a Not Satisfied Checklist Item
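The revised per-item report layout can be sketched as a small renderer. This assumes hypothetical `ItemResult` and `FunctionRef` containers and Markdown output; the tool's real data model and wording may differ. It shows the two rules agreed above: a Not Satisfied item gets a one-line observation and no Function References, and the hyperlink sits on the function name, with n_file_tested moved to a subheader.

```python
from dataclasses import dataclass, field

NOT_SATISFIED = "Not Satisfied"


@dataclass
class FunctionRef:
    file: str
    function: str
    url: str  # hyperlink goes on the function name; no separate line-number field


@dataclass
class ItemResult:
    item: str
    status: str  # "Satisfied", "Partially Satisfied" or "Not Satisfied"
    observation: str = ""
    references: list[FunctionRef] = field(default_factory=list)  # relevant tests only


def render_item(result: ItemResult) -> str:
    """Render one checklist item as a Markdown block."""
    lines = [f"### {result.item}", f"**Observations:** {result.status}"]
    if result.status == NOT_SATISFIED:
        # Nothing satisfies the item: say so and skip Function References.
        lines.append("None of the test functions fulfil this checklist item.")
        return "\n".join(lines)
    lines.append(result.observation)
    lines.append("**Function References:**")
    for ref in result.references:
        lines.append(f"- [{ref.function}]({ref.url}) in `{ref.file}`")
    return "\n".join(lines)


def render_report(results: list[ItemResult], n_files_tested: int) -> str:
    """Show n_files_tested as a subheader instead of a DataFrame column."""
    blocks = [f"## {n_files_tested} files are tested"]
    blocks += [render_item(r) for r in results]
    return "\n\n".join(blocks)
```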

Test Spec Generator

  • Prompt engineering for well-formatted docstrings in the test spec generator output
  • (Refer to John's notes)
  • Build an optional function that either lets users extract all docstrings from their ML system code and feed them into the LLM to generate relevant reference test cases, OR lets users feed the docstring of a single ML system function into the LLM to generate relevant reference test cases
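For the optional docstring-extraction function, Python's built-in `ast` module can read docstrings without importing, and therefore without executing, the user's code. A minimal sketch; the function names and the `path::name` key format are illustrative assumptions, and how the result is fed into the LLM is left open.

```python
import ast
from pathlib import Path


def extract_docstrings(source: str) -> dict[str, str]:
    """Map each function/method name in `source` to its docstring, if any.

    Keyed by bare name, so same-named methods in different classes would
    overwrite each other in this simple sketch.
    """
    tree = ast.parse(source)
    docs = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                docs[node.name] = doc
    return docs


def extract_repo_docstrings(repo_dir: str) -> dict[str, str]:
    """Collect docstrings from every .py file under repo_dir, keyed 'path::name'."""
    docs = {}
    for path in Path(repo_dir).rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        for name, doc in extract_docstrings(source).items():
            docs[f"{path}::{name}"] = doc
    return docs
```

Parsing rather than importing also means a repo with missing dependencies can still be scanned, as long as its files are valid Python syntax.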

System Evaluation

  • Change the F-test from the overall completeness score to the per-checklist-item completeness score
  • Contributing: Add an Acknowledgement section (refer to Tiffany's py-pkg GitHub repo)
  • (Good to have) Think about how Tiffany can perform mutation testing in the future
  • Have a separate presentation with Dr Rohan Alexander on the morning of June 24/25
  • Parallelise multiple API calls to speed up the tool
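The minutes do not specify which F-test is used. Assuming it is a one-way ANOVA with repos as groups and repeated runs as observations, the per-checklist-item version computes one F statistic per item instead of a single one on the overall score. A stdlib sketch with hypothetical names:

```python
import math
from statistics import fmean


def f_statistic(groups: list[list[float]]) -> float:
    """One-way ANOVA F: mean square between groups / mean square within groups.

    Needs at least two groups and more observations than groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # Identical runs within every repo: F is infinite, or undefined (0/0).
        return math.inf if ss_between > 0 else math.nan
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def per_item_f(scores: dict[str, list[list[float]]]) -> dict[str, float]:
    """scores[item][r] holds the run-by-run scores of repo r for that item."""
    return {item: f_statistic(groups) for item, groups in scores.items()}
```

If p-values are needed, `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value.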

@JohnShiuMK
Collaborator Author

JohnShiuMK commented May 31, 2024

Partner Meeting Minutes - May 30, 2024

Attendees: John, Orix, Simon (Mentor), Tiffany (Partner), Tony, Yingzi

Checklist for Leader Persona

  • Consider trying the Python API for the Quarto CLI for checklist visualization
  • Consider including test examples for the standard case, edge cases and error handling as context for GPT (for the evaluator and the test spec generator)
  • Consider including a repo example with its ground truth as context for GPT

System for Researcher Persona

  • To revise the Report output format:
    • For a checklist item, if none of the files satisfy it, show "Not Satisfied" / "None of the test functions fulfilled" in the Observations and skip the content in Function References
      • (future dev) add a hyperlink to the part of the code that is relevant to the missing test (requires taking the project code base as context)
    • If a checklist item is partially / fully satisfied, show "Partially Satisfied" / "Satisfied", and keep only the relevant test files and functions under the Observations and Function References.
    • Move the hyperlink directly onto the Functions and remove the Line Numbers
  • To revise the Terminal Report display format:
    • Remove the n_file_tested column from the DataFrame
    • Show it as a subheader instead, e.g. "N files are tested"
  • To improve prompts for better docstring generation in the test spec generator:
    • include test examples for the standard case, edge cases and error handling as context for GPT (defined in the checklist)
    • provide examples of good-quality test files together with their ground truth (defined in the checklist)
    • define the NumPy docstring format and provide an example/skeleton
    • add an optional argument that supplies all the docstrings of the functions in the project repo
      • writing function docstrings is more the user's responsibility

System Evaluation for Ourselves (System Developer Persona)

  • (Consistency) Also show the F-test on the consistency of the per-item scores
  • (Accuracy) So that future users can contribute, add an Acknowledgement section (refer to Tiffany's py-pkg GitHub repo)
  • Consider parallelising multiple API calls to speed up the tool
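Because each LLM request spends most of its time waiting on the network, a thread pool is one straightforward way to parallelise the calls. In this sketch `call_api` is a hypothetical parameter standing in for whatever client call the tool makes, and `max_workers` would need tuning to the provider's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def evaluate_in_parallel(
    call_api: Callable[[str], str],
    prompts: Sequence[str],
    max_workers: int = 8,
) -> list[str]:
    """Send independent API calls concurrently; results keep the input order.

    Threads fit here because the work is I/O-bound (waiting on HTTP responses).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_api, prompts))
```

`pool.map` yields results in the order of `prompts`, so each response still lines up with its checklist item; if one call raises, the exception is re-raised when that result is collected.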

Others

  • Will have a separate presentation with Dr Rohan Alexander on the morning of June 24/25

@JohnShiuMK
Collaborator Author

move to #127
