[01-06] A Stakeholder-Needs-Driven Method for Building Open Source Benchmarks: Issue Assignment as an Example #323

Open

zhingoll opened this issue Jan 4, 2025 · 0 comments

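Presenter: Zhang Zhen (张震)

This talk covers:

- A method for designing benchmarks in open source research, driven by stakeholder needs
- A case demo implemented with this method: assigning a suitable resolver to each issue in a repository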

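A Stakeholder-Needs-Driven Method for Building Open Source Benchmarks

1. Background and Challenges

Benchmarks are an important bridge between theory and practice and are commonly used to evaluate the performance and value of a method.
Main challenges:
- Data staleness: a static benchmark is built from historical data and cannot reflect the current needs of the ecosystem [1][2].
- Stakeholder needs: different groups expect different things from an evaluation [3].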

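2. Proposed Framework

Driven by stakeholder needs, the framework designs a dynamic benchmark that offers practical guidance:

- Needs identification: identify the key stakeholders and their needs.
- Task translation and benchmark implementation: turn the needs into concrete tasks and design a standardized simulation environment.
- Dynamic dataset maintenance: keep the benchmark consistent with the latest state of the ecosystem.
- Evaluation function design: combine simulation experiments with real-world feedback to reflect a method's actual value.
- Result sharing and discussion: promote continuous improvement of the benchmark.
- Benchmarking as a Service (BaaS): deliver the benchmark as a tool-based service that supports real-time feedback and interaction.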
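To make the dynamic dataset maintenance step more concrete, here is a minimal sketch assuming a rolling time window. The `Sample` schema, the `refresh_window` helper, and the 90-day window are hypothetical illustrations, not part of the method described in the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    """One benchmark record (hypothetical schema)."""
    issue_id: int
    closed_at: datetime
    resolver: str


def refresh_window(samples, now, window_days=90):
    """Keep only samples closed within the last `window_days` days, newest first."""
    cutoff = now - timedelta(days=window_days)
    recent = [s for s in samples if cutoff <= s.closed_at <= now]
    return sorted(recent, key=lambda s: s.closed_at, reverse=True)


# Toy data: only issues closed inside the window stay in the benchmark.
samples = [
    Sample(1, datetime(2024, 6, 1), "alice"),
    Sample(2, datetime(2024, 11, 20), "bob"),
    Sample(3, datetime(2024, 12, 30), "carol"),
]
current = refresh_window(samples, now=datetime(2025, 1, 4), window_days=90)
print([s.issue_id for s in current])
```

Re-running such a refresh on a schedule is one simple way to stop a benchmark from drifting away from the current state of a repository; a real pipeline would likely also re-collect data from the hosting platform.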

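3. Advantages and Significance

By building continuous dynamic dataset updates and stakeholder needs into the benchmark, the benchmark both originates from real needs and feeds back into them.
It connects OpenDigger, OpenPerf, and HyperCRX.

4. Case Study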

Issue-assigner-BaaS
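As a rough illustration of what evaluating an issue assignment method can look like, the sketch below scores a label-frequency baseline with hit@k. It is a toy example under stated assumptions: the baseline, the `(labels, resolver)` data format, and all function names are hypothetical and are not taken from Issue-assigner-BaaS.

```python
from collections import Counter, defaultdict


def train_label_baseline(history):
    """Count how often each developer resolved issues carrying each label."""
    counts = defaultdict(Counter)
    for labels, resolver in history:
        for label in labels:
            counts[label][resolver] += 1
    return counts


def recommend(counts, labels, k=2):
    """Rank developers by summed per-label counts; ties are broken by name."""
    scores = Counter()
    for label in labels:
        scores.update(counts.get(label, {}))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [dev for dev, _ in ranked[:k]]


def hit_at_k(counts, test_set, k=2):
    """Fraction of test issues whose true resolver appears in the top-k list."""
    hits = sum(resolver in recommend(counts, labels, k) for labels, resolver in test_set)
    return hits / len(test_set)


# Hypothetical history and held-out issues: (labels, actual resolver).
history = [
    (("bug", "frontend"), "alice"),
    (("bug",), "bob"),
    (("docs",), "carol"),
    (("frontend",), "alice"),
]
test_set = [
    (("bug",), "bob"),
    (("docs", "frontend"), "carol"),
    (("perf",), "dave"),
]
counts = train_label_baseline(history)
print(hit_at_k(counts, test_set, k=2))
```

A metric like this covers only the simulation side of the evaluation; the framework above also calls for real feedback from maintainers, which an offline score cannot capture.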

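References

[1] GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions. ICSE 2024.
[2] AI and the Everything in the Whole Wide World Benchmark. NeurIPS 2021.
[3] Evaluatology: The Science and Engineering of Evaluation.

This talk aims to explore how to strike a balance between theoretical innovation and practical application.

Discussion, criticism, and corrections on the method, experimental design, and application scenarios are all welcome.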

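zhingoll changed the title ~~[01-06] From Graph Contrastive Learning Research to Open Source Benchmark Design: Work Sharing and Reflections~~ [01-06] A Stakeholder-Needs-Driven Method for Building Open Source Benchmarks: Issue Assignment as an Example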