feat: custom evaluator and metric name to support llm evaluation #433 #459
I have been developing a project that aims to build a transparent, democratic, and reproducible framework for LLM evaluation, which can be found here: https://huggingface.co/spaces/SUSTech/llm-evaluate/. The goal is to let anyone use datasets and metrics hosted on Hugging Face to evaluate their own LLMs and share their results, datasets, metrics, and more.
Currently, the implementation of `evaluate` does not support a custom evaluator. To overcome this limitation, I have integrated a custom subtask and evaluator within the existing code of the evaluation module: https://huggingface.co/spaces/SUSTech/llm-evaluate/blob/main/utils.py.
However, I encountered a challenge when attempting to use the metric config name to define the task. The relevant code is in the evaluator's source, `evaluate/src/evaluate/evaluator/base.py` (line 480 in 0ca575d), and the `evaluator.compute` function.
Working around this in my own code would add unnecessary complexity and make my codebase harder to maintain as the evaluation module grows. So I created this PR to enhance `evaluate` and eliminate this inconvenience.
I would be glad to provide any assistance that helps improve this repository and leads to a cleaner implementation.
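As a rough illustration of letting a metric's config name select the task, here is a minimal, self-contained sketch. It is not the code in this PR and does not use the `evaluate` API: the registry, the `register` and `resolve` helpers, and the `ignore_case` config name are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: (metric name, config name) -> evaluator function.
# None of these names come from the `evaluate` library.
EvaluatorFn = Callable[[List[str], List[str]], Dict[str, float]]
_REGISTRY: Dict[Tuple[str, Optional[str]], EvaluatorFn] = {}


def register(metric_name: str, config_name: Optional[str] = None):
    """Decorator that registers an evaluator for a metric/config pair."""
    def wrap(fn: EvaluatorFn) -> EvaluatorFn:
        _REGISTRY[(metric_name, config_name)] = fn
        return fn
    return wrap


def resolve(metric_name: str, config_name: Optional[str] = None) -> EvaluatorFn:
    """Pick the evaluator for a config, falling back to the metric default."""
    if (metric_name, config_name) in _REGISTRY:
        return _REGISTRY[(metric_name, config_name)]
    if (metric_name, None) in _REGISTRY:
        return _REGISTRY[(metric_name, None)]
    raise KeyError(f"no evaluator for {metric_name!r} / {config_name!r}")


@register("exact_match")
def exact_match(predictions: List[str], references: List[str]) -> Dict[str, float]:
    # Default subtask: case-sensitive string equality.
    hits = sum(p == r for p, r in zip(predictions, references))
    return {"exact_match": hits / len(references)}


@register("exact_match", "ignore_case")
def exact_match_ignore_case(predictions: List[str], references: List[str]) -> Dict[str, float]:
    # Subtask selected by the (hypothetical) config name "ignore_case".
    hits = sum(p.lower() == r.lower() for p, r in zip(predictions, references))
    return {"exact_match": hits / len(references)}
```

With a lookup like this, the caller only passes a metric name and a config name, and the subtask-specific logic stays out of the calling code.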
I appreciate your time and consideration of my request. Please let me know if there is any additional information or clarification I can provide. I am eagerly looking forward to your valuable feedback and guidance.