quantify consistency improvement #93
Conversation
… of TestEvaluator (the code is not cleaned)
@JohnShiuMK I'll add my 2 cents here.
Wait, I thought the F is already defined in the way you specified?
@tonyshumlh I have updated the demo to a 2-tailed test; it's ready to review and merge, thanks.
Hold on, the file/code structure is actually ugly. Let me tidy it up a little bit first, sorry.
In this demo of the F-score comparison, I'm comparing the code of week 3 (before refactoring, i.e. the old code base) vs. week 4 (after refactoring). Therefore, I have to keep the old code base (archive/analyze.py) and adjust the ConsistencyEvaluator (archive/llm_eval/consistency_eval.py) so that it also works with the old code.

We may delete them in the future once we have a comparison between newer versions, but for now I think it's better to keep a record of the above comparison in case someone asks for it. In order not to disturb the latest code base, I put everything related to the demo and the old code base under

What do you think? Do you have any better ways to proceed whenever we encounter a situation like this?
…MDS/test-creation into 76-quantify-consistency-improvement
I have added a note here to avoid confusion about the

I think we can merge for now.
The 2-tailed F-test p-value calculation looks good to me.
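For reference, here is a minimal sketch of how a 2-tailed F-test p-value for equality of variances can be computed when comparing the spread of F-scores between the old and new code bases. The function and variable names below are illustrative and are not the repository's ConsistencyEvaluator API; the sample values are made-up placeholders.

```python
# Hypothetical sketch: 2-tailed F-test for Var(old) == Var(new).
# Names and data are illustrative, not taken from this repository.
import numpy as np
from scipy import stats


def f_test_two_tailed(sample_a, sample_b):
    """Return the F statistic and 2-tailed p-value for equal variances."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    f_stat = a.var(ddof=1) / b.var(ddof=1)   # ratio of sample variances
    dfn, dfd = len(a) - 1, len(b) - 1        # numerator / denominator df
    cdf = stats.f.cdf(f_stat, dfn, dfd)
    p_value = 2 * min(cdf, 1 - cdf)          # double the smaller tail
    return f_stat, p_value


# Example: F-scores from the week 3 (old) and week 4 (new) code bases
old_scores = [0.62, 0.58, 0.71, 0.55, 0.66]
new_scores = [0.64, 0.65, 0.63, 0.66, 0.64]
f_stat, p = f_test_two_tailed(old_scores, new_scores)
print(f"F = {f_stat:.3f}, 2-tailed p = {p:.4f}")
```

Doubling the smaller tail of the F distribution gives the 2-tailed p-value, so the test is sensitive to the consistency (variance) being either better or worse after refactoring.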
close #76