
DevX: Track the benchmark infra health and usage #8247

Open
guangy10 opened this issue Feb 6, 2025 · 8 comments
Labels
enhancement Not as big of a feature, but technically not a bug; should be easy to fix · module: benchmark Issues related to the benchmark infrastructure · module: user experience Issues related to reducing friction for users · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments

@guangy10
Contributor

guangy10 commented Feb 6, 2025

Today I'm monitoring the infra health only via the HUD, by filtering jobs with "-perf": https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true

I'm wondering if there is a better way to monitor the health, with more detailed metrics. It could be something like https://hud.pytorch.org/metrics, where I could see the historical runs and success rates of the benchmark jobs (nightly vs. on-demand), frequently failing jobs, hotspot devices, etc.

cc: @kimishpatel @digantdesai

cc @huydhn @kirklandsign @shoumikhin @mergennachin @byjlw
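For illustration, the "nightly vs. on-demand success rate" metric could be computed from the run records that GitHub's "list workflow runs" endpoint already returns; a minimal sketch (the records below are toy stand-ins, only the `event` and `conclusion` field names mirror the real API):

```python
from collections import defaultdict

def pass_rates(runs):
    """Per-trigger success rate from workflow run records.

    Only the 'event' and 'conclusion' fields are used; they mirror
    the records returned by GitHub's "list workflow runs" endpoint.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for run in runs:
        totals[run["event"]] += 1
        if run["conclusion"] == "success":
            passed[run["event"]] += 1
    return {event: passed[event] / totals[event] for event in totals}

# Toy records standing in for real benchmark runs (hypothetical data):
runs = [
    {"event": "schedule", "conclusion": "success"},           # nightly
    {"event": "schedule", "conclusion": "failure"},           # nightly
    {"event": "workflow_dispatch", "conclusion": "success"},  # on-demand
]
print(pass_rates(runs))  # -> {'schedule': 0.5, 'workflow_dispatch': 1.0}
```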

@guangy10 guangy10 added the enhancement, module: benchmark, and module: user experience labels Feb 6, 2025
@github-project-automation github-project-automation bot moved this to To triage in ExecuTorch DevX Feb 6, 2025
@digantdesai digantdesai added the triaged label Feb 6, 2025
@mergennachin mergennachin moved this from To triage to Backlog in ExecuTorch DevX Feb 6, 2025
@guangy10 guangy10 moved this to Todo in ExecuTorch Benchmark Feb 6, 2025
@guangy10 guangy10 added this to the 0.6.0 milestone Feb 10, 2025
@huydhn
Contributor

huydhn commented Feb 11, 2025

This is something that we can build over time, after we have auto regression detection in place.
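For context, auto regression detection at its simplest could look like the toy sketch below (all metric names, numbers, and the threshold are hypothetical; higher values are assumed to be worse, as with latency):

```python
def detect_regressions(history, latest, threshold=0.10):
    """Flag metrics whose latest value is more than `threshold` worse
    than the mean of their history. Assumes higher = worse (e.g. ms
    of latency). A toy sketch, not the real detection pipeline."""
    flagged = {}
    for name, values in history.items():
        if name not in latest or not values:
            continue
        baseline = sum(values) / len(values)
        if latest[name] > baseline * (1 + threshold):
            flagged[name] = (baseline, latest[name])
    return flagged

history = {"llama_latency_ms": [100.0, 102.0, 98.0]}  # rolling baseline
latest = {"llama_latency_ms": 130.0}                  # new nightly number
print(detect_regressions(history, latest))  # -> {'llama_latency_ms': (100.0, 130.0)}
```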

@byjlw
Contributor

byjlw commented Feb 12, 2025

Good question. I'd turn this into a discussion item and clarify which questions you want the dashboard to answer; from there we can work out what the metrics should be and how and where they are presented.

@guangy10
Contributor Author

Good question. I'd turn this into a discussion item and clarify which questions you want the dashboard to answer; from there we can work out what the metrics should be and how and where they are presented.

@byjlw This task is not about what metrics to show OSS users on the benchmark dashboard; it's mainly about giving the infra admins/developers an easier way to monitor the infra health and usage. If you have thoughts on what to show on the dashboard, we can open a new GitHub Issue for it.

@guangy10
Contributor Author

Hi @yangw-dev, @huydhn and I discussed this yesterday. I'd like to minimize the effort on this task (size M -> S or XS) and focus on enabling auto regression detection and alerts with enough detail to debug. See #8239 for details.

For this specific task, I expect a better way to monitor the CI health than the HUD via this link: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true. The main problems with the HUD view are:

  1. there are many noisy commits that don't run the benchmark jobs, and I don't have a way to hide them from the HUD view.
  2. there is no way to track the overall health of the benchmark jobs over a given time window, e.g. the pass rate of scheduled runs of apple-perf over the past 30 days. IIRC your team already has a way to fetch such metrics.
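For what it's worth, the second metric can already be pulled from GitHub's public "list workflow runs" endpoint, which accepts `event` and `created` filters; a minimal sketch (the workflow file name in the example call is an assumption, and pagination is omitted for brevity):

```python
import datetime as dt
import json
import urllib.parse
import urllib.request

def fetch_scheduled_runs(owner, repo, workflow_file, days=30, token=None):
    """Scheduled runs of one workflow over the last `days` days, via
    GitHub's "list workflow runs" endpoint. The 'event' and 'created'
    query parameters are real API filters; pagination is omitted."""
    since = (dt.date.today() - dt.timedelta(days=days)).isoformat()
    params = urllib.parse.urlencode(
        {"event": "schedule", "created": f">={since}", "per_page": 100}
    )
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/runs?{params}")
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["workflow_runs"]

def pass_rate(runs):
    """Fraction of finished runs that succeeded; in-progress runs
    have conclusion == None and are skipped."""
    finished = [r for r in runs if r.get("conclusion") is not None]
    if not finished:
        return None
    return sum(r["conclusion"] == "success" for r in finished) / len(finished)

# e.g. pass_rate(fetch_scheduled_runs("pytorch", "executorch", "apple-perf.yml"))
```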

@yangw-dev
Contributor

Hi Guang, sounds good! I will take a look into this.


@guangy10
Contributor Author

Another issue, and a good example of why monitoring via the HUD is inefficient: while testing pytorch/test-infra#6277, @huydhn and I noticed some entries marked as "0 -> new number" on the dashboard even though the model + benchmark_config passed on both the base and the new commit. After digging in, it turned out they were actually running on different devices, e.g. one ran on an iPhone 15 Plus and the other didn't. Without looking into each individual job (on the HUD, for example), it's impossible to notice this discrepancy.

[Image: dashboard screenshot of the "0 -> new number" entries]
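A check for this kind of discrepancy could be automated by diffing the device column between the two commits; a toy sketch (the record shapes and field names are hypothetical, not the real dashboard schema):

```python
def device_mismatches(base_results, new_results):
    """Flag (model, config) entries that ran on different devices
    between a base and a new commit, so "0 -> new number" jumps
    caused by device pool drift can be caught automatically."""
    base = {(r["model"], r["config"]): r["device"] for r in base_results}
    new = {(r["model"], r["config"]): r["device"] for r in new_results}
    return {
        key: (base[key], new[key])
        for key in base.keys() & new.keys()
        if base[key] != new[key]
    }

# Hypothetical records illustrating the iPhone 15 Plus discrepancy:
base = [{"model": "llama", "config": "xnnpack", "device": "iPhone 15"}]
new = [{"model": "llama", "config": "xnnpack", "device": "iPhone 15 Plus"}]
print(device_mismatches(base, new))
# -> {('llama', 'xnnpack'): ('iPhone 15', 'iPhone 15 Plus')}
```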

@huydhn
Contributor

huydhn commented Feb 18, 2025

It's a good idea to build a feature that allows ExecuTorch to self-manage the device pool going forward. Dev Infra folks can do so today via the AWS console at https://us-west-2.console.aws.amazon.com/devicefarm/home?region=us-east-1#/mobile/projects/02a2cf0f-6d9b-45ee-ba1a-a086587469e6/settings, so we need to expose this functionality somehow. Let me create a tracking issue for this.
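For reference, the same project's device pools can also be listed programmatically via the `devicefarm:ListDevicePools` API; a minimal boto3 sketch (the account id is a placeholder, and boto3 is imported lazily so the ARN helper works without it installed):

```python
def project_arn(account_id: str, project_id: str) -> str:
    """Device Farm project ARNs follow this fixed shape; the
    service itself only runs in us-west-2."""
    return f"arn:aws:devicefarm:us-west-2:{account_id}:project:{project_id}"

def list_private_pools(account_id: str, project_id: str):
    """Names of a project's private device pools, via the real
    devicefarm:ListDevicePools API."""
    import boto3  # lazy import; assumes AWS credentials with Device Farm access
    client = boto3.client("devicefarm", region_name="us-west-2")
    resp = client.list_device_pools(
        arn=project_arn(account_id, project_id), type="PRIVATE"
    )
    return [pool["name"] for pool in resp["devicePools"]]

# e.g. list_private_pools("<aws-account-id>", "02a2cf0f-6d9b-45ee-ba1a-a086587469e6")
```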

@huydhn
Contributor

huydhn commented Feb 19, 2025

pytorch/test-infra#6301


5 participants