Renamed criterias in LLM-as-a-Judge metrics to criteria. #1545
I would use italics for CrossProviderInferenceEngine.
docs/docs/llm_as_judge.rst
Outdated
@@ -46,11 +46,11 @@ An LLM as a Judge metric consists of several essential components:
 1. The judge model, such as *Llama-3-8B-Instruct* or *gpt-3.5-turbo*, which evaluates the performance of other models.
 2. The platform responsible for executing the judge model, such as Huggingface, OpenAI API and IBM's deployment platforms such as WatsonX and RITS.
 A lot of these model and catalog combinations are already predefined in our catalog. The models are prefixed by metrics.llm_as_judge.direct followed by the platform and the model name.
-For instance, metrics.llm_as_judge.direct.rits.llama3_1_70b refers to llama3 70B model that uses RITS deployment service.
+For instance, *metrics.llm_as_judge.direct.rits.llama3_1_70b* refers to llama3 70B model that uses RITS deployment service.
I would change llama3 70B -> LLama3 70B
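For readers following the doc change above, the naming convention it describes (the `metrics.llm_as_judge.direct` prefix, then the platform, then the model name) can be sketched as a small helper. This is only an illustration: `judge_metric_name` and `PREFIX` are hypothetical names, not part of unitxt, and only the prefix and the example name are taken from the diff.

```python
# Hypothetical helper, NOT part of unitxt. It only illustrates the naming
# convention described in docs/docs/llm_as_judge.rst.
PREFIX = "metrics.llm_as_judge.direct"


def judge_metric_name(platform: str, model: str) -> str:
    """Join the prefix, platform and model name with dots."""
    for part in (platform, model):
        # Each component must be a single, non-empty dotted-name segment.
        if not part or "." in part:
            raise ValueError(f"invalid catalog name component: {part!r}")
    return ".".join((PREFIX, platform, model))


print(judge_metric_name("rits", "llama3_1_70b"))
# prints: metrics.llm_as_judge.direct.rits.llama3_1_70b
```

Whether a given platform and model combination actually exists in the catalog is a separate question; the helper only builds the string.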
src/unitxt/llm_as_judge.py
Outdated
@@ -8,43 +8,16 @@
 from .error_utils import UnitxtError
 from .inference import (
     InferenceEngine,
-    OptionSelectingByLogProbsInferenceEngine,
Although these imports are unused, they are here so that users can import anything from unitxt.llm_as_judge. Let's re-add them.
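The re-export pattern this comment asks for could look roughly like the sketch below. It is a self-contained illustration, not the actual unitxt code: the modules `demo_inference` and `demo_llm_as_judge` are stand-ins built in memory, and the engine class here is an empty placeholder. The `# noqa: F401` marker and the `__all__` list are one common way to tell linters that an apparently unused import is kept on purpose.

```python
import sys
import types

# Stand-in for the module where the class really lives (hypothetical name).
inference = types.ModuleType("demo_inference")


class OptionSelectingByLogProbsInferenceEngine:
    """Empty placeholder for the real engine class."""


inference.OptionSelectingByLogProbsInferenceEngine = (
    OptionSelectingByLogProbsInferenceEngine
)
sys.modules["demo_inference"] = inference

# Stand-in for the public module. The import is unused inside the module
# itself; it is kept only so that existing user imports keep working.
facade_source = '''
from demo_inference import OptionSelectingByLogProbsInferenceEngine  # noqa: F401

__all__ = ["OptionSelectingByLogProbsInferenceEngine"]
'''
facade = types.ModuleType("demo_llm_as_judge")
exec(facade_source, facade.__dict__)
sys.modules["demo_llm_as_judge"] = facade

# A user importing through the public module gets the very same class object.
from demo_llm_as_judge import OptionSelectingByLogProbsInferenceEngine as Engine

assert Engine is inference.OptionSelectingByLogProbsInferenceEngine
```

Listing the name in `__all__` also documents that the re-export is part of the module's public surface, so a later cleanup is less likely to delete it again.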
Since we already announced it, we need to merge it as soon as possible. Are there any blockers to doing it today? Smaller fixes can be done later. We also need to make a new release and move FM-eval to it, so that FM-eval users do not start using the old name.
* add text2sql templates
* add data managment utility for text2sql
* add basic template
* add sql execution accuracy metric
* add text2sql execution accuracy metric
* add text2sql task
* condition download in presence of a cache dir
* add init fille
* add processors
* add processors
* add basic template
* change id to int
* change notations in templates
* push to catalog
* add evidence, remove SL
* remove unued function, fix
* fix imports from unitxt.text2sql
* push to catalog
* fix cache location
* add example
* fix imports
* add func_timeout to test reqs
* fix typing
* change template name
* push to catalog
* add req
* add local model option
* fix databases download
* fix databases download
* add loader limit ot make example faster
* fix cache paths, avoid re-download
* add type schema
* remove inports from inits
* add text2sql to inits
* update card to use serializers
* add schema serializer
* add text2sql serializer to default template
* add schema to task
* adjust templates to using serializer
* adjust templates to using serializer
* fix processor
* remove target prefix from template
* add shuffle to bird
* add shuffle to bird
* edit template
* remove comment from init
* clear processors code
* add option with ticks
* add anls metric
* add template
* drop comment
* remove recursion limit
* add loader_limit to example
* fix recursion error
* move import to withing metric
* remove catalog files wo prepare
* fix typing
* change template im example
* moving text2sql implementaion to the main src dir
* fix imports
* fix imports
* fix imports
* fix imports
* import data_utils
* fix formatting
* refactor names
* add processors tests
* add more tests
* add tests
* refactor: allow more data sources
* allow db source input
* organize imports
* update example
* add db_type to task
* format
* add db_type to task
* add local db definition ability
* add EE tests
* add tests
* rename file
* rename file
* update sql metric
* rename file
* refactor types, serializers and metric
---------
Signed-off-by: Yotam-Perlitz <[email protected]>
* Add deduplicate operator
* Deduplicate MMLU
* Update Deduplicate example in documentation for clarity
* Deduplicate social iqa
---------
Signed-off-by: elronbandel <[email protected]>
Signed-off-by: elronbandel <[email protected]>
* Add mtrag benchmark
* Add multi_type_serializer for references and prediction fields in various JSON metrics
* Remove unused TempOperator class and delete obsolete multi_turn.json task file
---------
Signed-off-by: elronbandel <[email protected]>
Signed-off-by: elronbandel <[email protected]>
…ility Signed-off-by: elronbandel <[email protected]>
Signed-off-by: Martín Santillán Cooper <[email protected]>
Signed-off-by: Martín Santillán Cooper <[email protected]>
9a968fd to 8efc10b
No description provided.