feat: custom evaluator and metric name to support llm evaluation #433 #459

fecet · 2023-05-15T14:56:41Z

I have started a project to build a transparent, democratic, and reproducible framework for LLM evaluation: https://huggingface.co/spaces/SUSTech/llm-evaluate/. The goal is to let anyone use datasets and metrics hosted on Hugging Face to evaluate their own LLMs and share their results, datasets, metrics, and more.

Currently, evaluate does not support custom evaluators. To work around this limitation, I have integrated a custom subtask and evaluator into the existing evaluation module code; see https://huggingface.co/spaces/SUSTech/llm-evaluate/blob/main/utils.py.

However, I ran into a problem when trying to use the metric config name to define the task. As the evaluator's source code shows,

evaluate/src/evaluate/evaluator/base.py, line 480 in 0ca575d:

def prepare_metric(self, metric: Union[str, EvaluationModule]):

the evaluator.compute function takes only the metric itself, with no separate argument for the metric config name. Consequently, if I want to pass the metric config name, it seems I have no option but to override the entire evaluator.compute function.
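A minimal sketch of the idea, not the code in this PR: the metric-resolution step could take an optional config name and forward it to the loader. Everything here is an assumption for illustration. `StubModule` and `stub_load` are hypothetical stand-ins for `EvaluationModule` and `evaluate.load` so the snippet runs without the library, and the `config_name` parameter on `prepare_metric` is hypothetical as well.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class StubModule:
    """Hypothetical stand-in for evaluate.EvaluationModule."""
    path: str
    config_name: Optional[str] = None


def stub_load(path: str, config_name: Optional[str] = None) -> StubModule:
    """Hypothetical stand-in for evaluate.load; performs no network access."""
    return StubModule(path=path, config_name=config_name)


def prepare_metric(
    metric: Union[str, StubModule],
    config_name: Optional[str] = None,  # hypothetical extra argument
    loader: Callable[..., StubModule] = stub_load,
) -> StubModule:
    """Resolve a metric name (plus optional config name) to a loaded module."""
    if isinstance(metric, str):
        # A string is loaded by name, with the config name forwarded.
        return loader(metric, config_name=config_name)
    # An already loaded module is passed through unchanged.
    return metric


resolved = prepare_metric("glue", config_name="mrpc")
print(resolved.path, resolved.config_name)
```

With an argument like this threaded through compute, a caller could select a metric configuration without overriding the whole function.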

This approach would add unnecessary complexity and make my codebase harder to maintain as the evaluation module grows, so I have opened this PR to extend evaluate and remove this inconvenience.

It would be a great pleasure for me to provide any assistance that could contribute to the enhancement of this repository and ensure cleaner code implementation.

I appreciate your time and consideration of my request. Please let me know if there is any additional information or clarification I can provide. I am eagerly looking forward to your valuable feedback and guidance.

feat: custom evaluator and metric name to support llm evaluation hugg…

c8e8e62

…ingface#433
