
DevX: Track the benchmark infra health and usage #8247

Open
guangy10 opened this issue Feb 6, 2025 · 8 comments
Labels
enhancement Not as big of a feature, but technically not a bug; should be easy to fix · module: benchmark Issues related to the benchmark infrastructure · module: user experience Issues related to reducing friction for users · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments

@guangy10
Contributor

guangy10 commented Feb 6, 2025

Today I'm monitoring the infra health only via the HUD, by filtering jobs with "-perf": https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true

I'm wondering if there is a better way to monitor the health, with more detailed metrics. It could be something like https://hud.pytorch.org/metrics, where I could see the historical runs and success rates of the benchmark jobs (nightly vs. on-demand), frequently failing jobs, hotspot devices, etc.

cc: @kimishpatel @digantdesai

cc @huydhn @kirklandsign @shoumikhin @mergennachin @byjlw
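For illustration, the "nightly vs. on-demand success rate" metric could be computed from the run records that GitHub's "list workflow runs" endpoint already returns; a minimal sketch (the records below are toy stand-ins, only the `event` and `conclusion` field names mirror the real API):

```python
from collections import defaultdict

def pass_rates(runs):
    """Per-trigger success rate from workflow run records.

    Only the 'event' and 'conclusion' fields are used; they mirror
    the records returned by GitHub's "list workflow runs" endpoint.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for run in runs:
        totals[run["event"]] += 1
        if run["conclusion"] == "success":
            passed[run["event"]] += 1
    return {event: passed[event] / totals[event] for event in totals}

# Toy records standing in for real benchmark runs (hypothetical data):
runs = [
    {"event": "schedule", "conclusion": "success"},           # nightly
    {"event": "schedule", "conclusion": "failure"},           # nightly
    {"event": "workflow_dispatch", "conclusion": "success"},  # on-demand
]
print(pass_rates(runs))  # -> {'schedule': 0.5, 'workflow_dispatch': 1.0}
```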

@guangy10 guangy10 added the enhancement, module: benchmark, and module: user experience labels Feb 6, 2025
@github-project-automation github-project-automation bot moved this to To triage in ExecuTorch DevX Feb 6, 2025
@digantdesai digantdesai added the triaged label Feb 6, 2025
@mergennachin mergennachin moved this from To triage to Backlog in ExecuTorch DevX Feb 6, 2025
@guangy10 guangy10 moved this to Todo in ExecuTorch Benchmark Feb 6, 2025
@guangy10 guangy10 added this to the 0.6.0 milestone Feb 10, 2025
@huydhn
Contributor

huydhn commented Feb 11, 2025

This is something that we can build over time, after we have auto regression detection in place.
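For context, auto regression detection at its simplest could look like the toy sketch below (all metric names, numbers, and the threshold are hypothetical; higher values are assumed to be worse, as with latency):

```python
def detect_regressions(history, latest, threshold=0.10):
    """Flag metrics whose latest value is more than `threshold` worse
    than the mean of their history. Assumes higher = worse (e.g. ms
    of latency). A toy sketch, not the real detection pipeline."""
    flagged = {}
    for name, values in history.items():
        if name not in latest or not values:
            continue
        baseline = sum(values) / len(values)
        if latest[name] > baseline * (1 + threshold):
            flagged[name] = (baseline, latest[name])
    return flagged

history = {"llama_latency_ms": [100.0, 102.0, 98.0]}  # rolling baseline
latest = {"llama_latency_ms": 130.0}                  # new nightly number
print(detect_regressions(history, latest))  # -> {'llama_latency_ms': (100.0, 130.0)}
```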

@byjlw
Contributor

byjlw commented Feb 12, 2025

Good question. I'd turn this into a discussion item and clarify which questions you want the dashboard to answer; from there we can work out what the metrics should be and how and where they are presented.

@guangy10
Contributor Author

Good question. I'd turn this into a discussion item and clarify which questions you want the dashboard to answer; from there we can work out what the metrics should be and how and where they are presented.

@byjlw This task is not about what metrics to show OSS users on the benchmark dashboard; it's mainly about giving the infra admins/developers an easier way to monitor the infra health and usage. If you have thoughts on what to show on the dashboard, we can open a new GitHub Issue for it.

@guangy10
Contributor Author

Hi @yangw-dev, @huydhn and I discussed this yesterday. I'd like to minimize the effort on this task (size M -> S or XS) and focus on enabling auto regression detection and alerts with enough detail to debug. See #8239 for details.

For this specific task, I expect a better way to monitor the CI health than the HUD via this link: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true. The main problems with the HUD view are:

  1. there are many noisy commits that don't run the benchmark jobs, and I don't have a way to hide them from the HUD view.
  2. there is no way to track the overall health of the benchmark jobs over a given time window, e.g. the pass rate of scheduled runs of apple-perf over the past 30 days. IIRC your team already has a way to fetch such metrics.
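For what it's worth, the second metric can already be pulled from GitHub's public "list workflow runs" endpoint, which accepts `event` and `created` filters; a minimal sketch (the workflow file name in the example call is an assumption, and pagination is omitted for brevity):

```python
import datetime as dt
import json
import urllib.parse
import urllib.request

def fetch_scheduled_runs(owner, repo, workflow_file, days=30, token=None):
    """Scheduled runs of one workflow over the last `days` days, via
    GitHub's "list workflow runs" endpoint. The 'event' and 'created'
    query parameters are real API filters; pagination is omitted."""
    since = (dt.date.today() - dt.timedelta(days=days)).isoformat()
    params = urllib.parse.urlencode(
        {"event": "schedule", "created": f">={since}", "per_page": 100}
    )
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/runs?{params}")
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["workflow_runs"]

def pass_rate(runs):
    """Fraction of finished runs that succeeded; in-progress runs
    have conclusion == None and are skipped."""
    finished = [r for r in runs if r.get("conclusion") is not None]
    if not finished:
        return None
    return sum(r["conclusion"] == "success" for r in finished) / len(finished)

# e.g. pass_rate(fetch_scheduled_runs("pytorch", "executorch", "apple-perf.yml"))
```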

@yangw-dev
Contributor

Hi Guang, sounds good! I will take a look into this.


@guangy10
Contributor Author

Another issue, and a good example of why monitoring via the HUD is inefficient: while testing pytorch/test-infra#6277, @huydhn and I noticed some entries marked as "0 -> new number" on the dashboard even though the model + benchmark_config passed on both the base and the new commit. After digging in, it turned out they were actually running on different devices, e.g. one ran on an iPhone 15 Plus and the other didn't. Without looking into each individual job (on the HUD, for example), it's impossible to notice this discrepancy.

[Image: dashboard screenshot of the "0 -> new number" entries]
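A check for this kind of discrepancy could be automated by diffing the device column between the two commits; a toy sketch (the record shapes and field names are hypothetical, not the real dashboard schema):

```python
def device_mismatches(base_results, new_results):
    """Flag (model, config) entries that ran on different devices
    between a base and a new commit, so "0 -> new number" jumps
    caused by device pool drift can be caught automatically."""
    base = {(r["model"], r["config"]): r["device"] for r in base_results}
    new = {(r["model"], r["config"]): r["device"] for r in new_results}
    return {
        key: (base[key], new[key])
        for key in base.keys() & new.keys()
        if base[key] != new[key]
    }

# Hypothetical records illustrating the iPhone 15 Plus discrepancy:
base = [{"model": "llama", "config": "xnnpack", "device": "iPhone 15"}]
new = [{"model": "llama", "config": "xnnpack", "device": "iPhone 15 Plus"}]
print(device_mismatches(base, new))
# -> {('llama', 'xnnpack'): ('iPhone 15', 'iPhone 15 Plus')}
```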

@huydhn
Contributor

huydhn commented Feb 18, 2025

It's a good idea to build a feature that allows ExecuTorch to self-manage the device pool going forward. Dev Infra folks can do so today via the AWS console at https://us-west-2.console.aws.amazon.com/devicefarm/home?region=us-east-1#/mobile/projects/02a2cf0f-6d9b-45ee-ba1a-a086587469e6/settings, so we need to expose this functionality somehow. Let me create a tracking issue for this.
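For reference, the same project's device pools can also be listed programmatically via the `devicefarm:ListDevicePools` API; a minimal boto3 sketch (the account id is a placeholder, and boto3 is imported lazily so the ARN helper works without it installed):

```python
def project_arn(account_id: str, project_id: str) -> str:
    """Device Farm project ARNs follow this fixed shape; the
    service itself only runs in us-west-2."""
    return f"arn:aws:devicefarm:us-west-2:{account_id}:project:{project_id}"

def list_private_pools(account_id: str, project_id: str):
    """Names of a project's private device pools, via the real
    devicefarm:ListDevicePools API."""
    import boto3  # lazy import; assumes AWS credentials with Device Farm access
    client = boto3.client("devicefarm", region_name="us-west-2")
    resp = client.list_device_pools(
        arn=project_arn(account_id, project_id), type="PRIVATE"
    )
    return [pool["name"] for pool in resp["devicePools"]]

# e.g. list_private_pools("<aws-account-id>", "02a2cf0f-6d9b-45ee-ba1a-a086587469e6")
```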

@huydhn
Contributor

huydhn commented Feb 19, 2025

pytorch/test-infra#6301


5 participants