Filter out devices that are not run #6277
@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel. A member of the Team first needs to authorize it.
Thanks for the fix! Case 1 update: oh, actually case 1 has the same issue as case 2, where lots of entries are displayed as "0->number", but without being highlighted in green, so it's not obvious. Apple jobs aren't run on the new commit, so all entries with Device Apple should be hidden from the view.
There are some issues to look into further.
@@ -27,12 +27,22 @@ export const IS_INCREASING_METRIC_VALUE_GOOD: { [k: string]: boolean } = {
  flops_utilization: true,
  "compilation_time(s)": false,
  speedup: true,
  "avg_inference_latency(ms)": false,
I forgot that we need to update this to highlight the metrics correctly. A better solution is probably needed as part of pytorch/executorch#8239.
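A change can only be colored as good or bad once we know which direction is good for that metric. Below is a minimal sketch, assuming the dashboard looks up IS_INCREASING_METRIC_VALUE_GOOD per metric name; classifyChange and the Change type are hypothetical names, not the dashboard's actual code. A metric missing from the map is left unhighlighted rather than guessed.

```typescript
// Hypothetical excerpt mirroring the diff above; the real map may differ.
const IS_INCREASING_METRIC_VALUE_GOOD: { [k: string]: boolean } = {
  flops_utilization: true,
  "compilation_time(s)": false,
  speedup: true,
  "avg_inference_latency(ms)": false,
};

type Change = "better" | "worse" | "unchanged" | "unknown";

// Classify a base -> new change for one metric (hypothetical helper).
function classifyChange(metric: string, base: number, next: number): Change {
  const increasingIsGood: boolean | undefined =
    IS_INCREASING_METRIC_VALUE_GOOD[metric];
  if (increasingIsGood === undefined) {
    // Unknown direction: do not highlight rather than guess.
    return "unknown";
  }
  if (next === base) return "unchanged";
  const increased = next > base;
  // Good when the metric moved in the direction the map marks as good.
  return increased === increasingIsGood ? "better" : "worse";
}
```

For example, a latency drop from 12 ms to 10 ms classifies as "better", while a speedup drop classifies as "worse".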
@guangy10 The new version is ready. Please take a look.
Case 4: There is a case where there is no overlap between the two commits at all, i.e. one runs all Android jobs and the other runs all iOS jobs. In this case, I opt to just show the results from the right commit, but without any highlighting.
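The Case 4 fallback can be sketched roughly as follows. This is a minimal illustration, assuming each entry is keyed by a device string; Row, ComparisonView, and buildComparisonView are hypothetical names, and showing only the overlapping devices when there is an overlap is an assumption made for the sketch.

```typescript
// Hypothetical row shape for one benchmark entry on the dashboard.
interface Row {
  device: string;
  metric: string;
  value: number;
}

interface ComparisonView {
  rows: Row[];
  highlight: boolean; // whether base -> new coloring should be applied
}

// Decide what to show when comparing a left (base) and right (new) commit.
function buildComparisonView(left: Row[], right: Row[]): ComparisonView {
  const leftDevices = new Set(left.map((r) => r.device));
  const overlapping = right.filter((r) => leftDevices.has(r.device));
  if (overlapping.length === 0) {
    // No device ran on both commits: fall back to the right commit's
    // results as-is, with nothing highlighted.
    return { rows: right, highlight: false };
  }
  // Otherwise compare only devices present on both commits.
  return { rows: overlapping, highlight: true };
}
```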
Should we show nothing (an empty board) instead, since there is no overlap? Discovered a new issue: if you toggle "Platform", the irrelevant entries come back.
qq: If a benchmark job passes after a retry, would the new result be uploaded and merged into the DB? I know the step
Actually I'm thinking of this UX.
This seems to be the UI improvement that @yangw-dev can help with? If you decide to handle it separately
No worries, a retry would just upload the result as usual because the upload is a step in the workflow.
I feel that showing an empty page is just as confusing. And yes, we can default to showing all entries from the latest (right) commit on the landing page.
Agreed, let's do this in a separate PR.
If we are not doing this in this PR, can you file a GitHub issue for it?
#6293 captures the gist of it. So, here are the cases that have been fixed:
Oh, I realize that I had logic to show only one number if the values from the base and the new commit are the same. I have removed that, so now whenever there is only one number, it means that only the new commit has benchmark results. The highlight issue has been fixed, I hope. The tweaking on the dashboard is difficult to get right because I need to deduce the information there instead of just showing a failed benchmark run. I created an issue for that: #6294
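The display rule described above (a single number means only the new commit has results) could be written like this. A sketch only: formatCell is a hypothetical helper, undefined stands for "no result on the base commit", and the "->" separator just mirrors the "0->number" notation used in this thread.

```typescript
// Format one dashboard cell (hypothetical helper).
// `base` is undefined when the base commit has no result for this entry.
function formatCell(base: number | undefined, next: number): string {
  if (base === undefined) {
    // Only the new commit has a result: show a single number.
    return `${next}`;
  }
  // Both commits have results: always show base -> new, even when equal,
  // so a single number is never ambiguous.
  return `${base} -> ${next}`;
}
```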
In this new revision there are lots of entries marked as "0->number" even though the jobs ran successfully on both commits.
If we merge the PR like this, the UX won’t be better, as many entries are still misleading. We should probably narrow this PR to fix only what can be fixed independently of saving failed runs to the DB. Then we should think about how to split the issue: what you can help with and what Yang can take over.
To sum up, here is what has been fixed by this PR:
This ensures that if anything is highlighted on the dashboard, it is legitimate. What has not yet been fixed is that CI failures won't show up on the dashboard, as there is currently no way to differentiate between having no data because of a CI failure vs. having no data because the benchmark was not run. This will be covered next by #6294.
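One way to enforce that highlights are legitimate, sketched under assumptions: only color a cell when the device is valid for the new commit and both sides have non-zero data. canHighlight and its parameters are hypothetical, not the PR's code; since 0 can mean either a failed or a skipped run, the sketch simply refuses to highlight it, which is the limitation tracked in #6294.

```typescript
// Hypothetical guard deciding whether a base -> new cell may be colored.
// A 0 or missing value cannot be classified as "failed" vs. "not run"
// from the benchmark records alone, so such cells are never highlighted,
// and neither are devices the new commit did not run.
function canHighlight(
  device: string,
  base: number | undefined,
  next: number | undefined,
  validDevices: Set<string>,
): boolean {
  const hasData = (v: number | undefined): boolean =>
    v !== undefined && v !== 0;
  return validDevices.has(device) && hasData(base) && hasData(next);
}
```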
Yeah, it sounds like a concrete fix that we can land without #6294 and #6293. Just to clarify: when we compare devices, does the comparison include both the device name and the OS version, and must both match exactly?
Can we loosen the device match to cover only the major OS version instead? That is, "iPhone 15 Plus iOS 17.4" on the left commit would be considered a match for "iPhone 15 Plus iOS 17.2" on the right commit.
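The relaxed matching asked for above, where only the major OS version has to agree, might look like this. Minimal sketch, assuming the OS string has the form "<OS name> <major>.<minor>" (e.g. "iOS 17.4"); Device, majorOs, and sameDevice are hypothetical names, and the actual conditions in the PR are the ones behind the link in the reply.

```typescript
// Hypothetical device descriptor; the dashboard's real fields may differ.
interface Device {
  name: string; // e.g. "iPhone 15 Plus"
  os: string; // e.g. "iOS 17.4", assumed "<OS name> <major>[.<minor>...]"
}

// Keep only the major OS version: "iOS 17.4" -> "iOS 17".
function majorOs(os: string): string {
  return os.split(".")[0].trim();
}

// Two devices match when names are identical and major OS versions agree.
function sameDevice(a: Device, b: Device): boolean {
  return a.name === b.name && majorOs(a.os) === majorOs(b.os);
}
```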
Done. The matching conditions (https://github.com/pytorch/test-infra/pull/6277/files#diff-8b946f720239823df7504798036de4008e6028f8961d7a1d04de6969d9b784b7R112) are:
Good to go! 🚀
Fixes pytorch/executorch#7986 (for real this time)
The problem is that the records in the benchmark database alone don't have the information to differentiate between benchmarks that failed to run and benchmarks that were not run. Both show up as 0 on the dashboard. Note that we could join with the workflow_job table to get this information, but that is a rather slow and expensive route. So, I opted for a quicker approach: keep track of the list of valid devices on the dashboard side. A valid device is one that is run by the selected commit and has at least one non-zero record there.

Testing
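The valid-device approach described above can be sketched in a few lines. This assumes a flat record shape with commit, device, metric, and value fields (hypothetical, not the real benchmark schema); getValidDevices and filterToValidDevices are illustrative names. It avoids the workflow_job join entirely, at the cost of treating a device whose records are all 0 as not run.

```typescript
// Hypothetical shape of one benchmark record; real columns may differ.
interface BenchmarkRecord {
  commit: string;
  device: string;
  metric: string;
  value: number;
}

// A device is "valid" for a commit if that commit ran it and produced
// at least one non-zero record for it.
function getValidDevices(records: BenchmarkRecord[], commit: string): Set<string> {
  const valid = new Set<string>();
  for (const r of records) {
    if (r.commit === commit && r.value !== 0) {
      valid.add(r.device);
    }
  }
  return valid;
}

// Drop rows (from either commit) for devices that are not valid.
function filterToValidDevices(
  records: BenchmarkRecord[],
  valid: Set<string>,
): BenchmarkRecord[] {
  return records.filter((r) => valid.has(r.device));
}
```

Filtering both commits' rows with the set computed from the new commit would, for instance, hide the Apple entries from Case 1 when the new commit ran no Apple jobs.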
https://torchci-git-fork-huydhn-better-model-filter-fbopensource.vercel.app/benchmark/llms?repoName=pytorch%2Fexecutorch