Evaluating binomial & non-binomial metrics simultaneously with Experiment class & how to categorise metrics by type #106
Here's an example (that could be put in an example notebook or as a test case for the Experiment class):
Guardrail metrics are specified by providing a NIM (non-inferiority margin). As for the success metrics, they can be one-sided or two-sided, as specified by the preferred direction.
The way you’d implement the deterioration metrics is to take the same data as you’d use for your main results (as Pelle provided), but flip the preferred direction, not use any NIMs, and use a separate alpha. For example, suppose you have one success metric that should improve and one guardrail metric (with a NIM) for which an increase is a good change. You would use the data as in Pelle’s example, set the preferred direction to increase, and set the NIM for the guardrail metric. Next, you would make a similar call, but set the preferred direction to decrease and not set a NIM. That tests whether either of the two metrics has moved significantly in the wrong direction. In the paper we also use a different alpha for this test, so it draws on a separate budget. You can also include a sample ratio mismatch test here via the chi-squared test.
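To make the two-call pattern above concrete, here is a minimal pure-Python sketch of the statistics behind it, not the library's API — all metric names, numbers, and the helper functions are made up for illustration. Each metric gets a one-sided z-test in its preferred direction (with the NIM for the guardrail), and then the same data is tested again with the direction flipped, no NIM, and a separate (stricter) alpha:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def one_sided_p(mean_c, var_c, n_c, mean_t, var_t, n_t, preferred_direction, nim=0.0):
    """One-sided two-sample z-test p-value.
    'increase': H0 is delta <= -nim (small p => improvement / non-inferiority shown).
    'decrease': H0 is delta >= +nim."""
    se = sqrt(var_c / n_c + var_t / n_t)
    delta = mean_t - mean_c
    if preferred_direction == "increase":
        return 1.0 - norm_cdf((delta + nim) / se)
    return norm_cdf((delta - nim) / se)

def bin_var(p):
    """Variance of a 0/1 (binomial) metric."""
    return p * (1.0 - p)

n = 10_000
alpha_main, alpha_deterioration = 0.05, 0.001  # separate budgets, as described above

# Success metric (e.g. conversion): preferred direction increase, no NIM.
p_success = one_sided_p(0.10, bin_var(0.10), n, 0.11, bin_var(0.11), n, "increase")

# Guardrail metric (e.g. retention): increase preferred, NIM of 1 percentage point.
p_guardrail = one_sided_p(0.40, bin_var(0.40), n, 0.402, bin_var(0.402), n,
                          "increase", nim=0.01)

# Deterioration pass: same data, direction flipped, no NIM, stricter alpha.
p_det_success = one_sided_p(0.10, bin_var(0.10), n, 0.11, bin_var(0.11), n, "decrease")
p_det_guardrail = one_sided_p(0.40, bin_var(0.40), n, 0.402, bin_var(0.402), n, "decrease")
```

With these made-up numbers both metrics clear their main tests at `alpha_main`, and neither deterioration p-value comes close to `alpha_deterioration`, which is exactly the pattern the reply describes.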
I can't seem to find much info on how exactly to evaluate both binomial and non-binomial metrics at the same time within a dataframe that is passed to the Experiment class.
It seems that, even with the method column specified, multiple_difference treats every metric as binomial. You would obviously need different inputs to perform a t-test, so how would I add and specify these columns, and how would I indicate them in Experiment?
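For what it's worth, one common way to mix binomial and non-binomial metrics in a single long-format frame is to carry a sum-of-squares column alongside the numerator: for a 0/1 metric the sum of squares equals the numerator itself, while for a continuous metric it is the sum of the squared observations, and the mean and variance fall out of the same three aggregate columns either way. This is only my understanding, and the column names below are illustrative, not the library's:

```python
def mean_and_variance(numerator, numerator_sum_squares, denominator):
    """Recover sample mean and variance from the aggregate columns.
    For a binomial metric the observations are 0/1, so sum of squares == sum."""
    mean = numerator / denominator
    variance = numerator_sum_squares / denominator - mean ** 2
    return mean, variance

# One row per (metric, group); metric names and values are made up.
rows = [
    {"metric": "conversion",     "num": 1000,  "num_sq": 1000,  "den": 10000},  # binomial
    {"metric": "minutes_played", "num": 25000, "num_sq": 80000, "den": 10000},  # continuous
]

stats = {r["metric"]: mean_and_variance(r["num"], r["num_sq"], r["den"]) for r in rows}
# conversion:     mean 0.1, variance 0.09 (= p * (1 - p))
# minutes_played: mean 2.5, variance 1.75
```

If a z-test consumes these aggregates, both metric types go through the same code path, which would make a separate t-test input format unnecessary — but I'd appreciate confirmation on how Experiment expects this to be encoded.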
Likewise, there's a really good paper you posted on your risk-aware product decision framework using multiple metrics, and I've seen success metrics mentioned in the repository and Q&A; however, I couldn't find any documentation that explains how to specify success, deterioration, and guardrail metrics. I did see a method on sample ratio, which is a form of quality metric, so I suspect this has been considered, but it's difficult to see how to implement the entire approach.
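On the sample-ratio point: the chi-squared sample ratio mismatch check is easy to sketch independently of the library. A self-contained version, assuming a goodness-of-fit test with one degree of freedom against the planned split (the function name and counts are hypothetical):

```python
from math import erfc, sqrt

def srm_p_value(n_control, n_treatment, expected_ratio=0.5):
    """Chi-squared goodness-of-fit test (1 d.o.f.) for sample ratio mismatch."""
    total = n_control + n_treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1.0 - expected_ratio)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For one degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))

p_balanced = srm_p_value(5000, 5100)  # small imbalance: p well above any alarm threshold
p_mismatch = srm_p_value(5000, 5500)  # 10% excess in treatment: p far below 0.001
```

A tiny p-value here means the observed split is very unlikely under the planned ratio, i.e. the assignment mechanism itself is suspect and the metric results shouldn't be trusted.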
Do let me know if you need any further information. Thanks for your time!