
Add curator to handle inference for the model being evaluated #51

Open · wants to merge 21 commits into base: main
Conversation

RyanMarten (Member)

No description provided.

@RyanMarten (Member, Author)

OK, we are going to fix two things in curator to simplify this (we can do another PR later once these fixes are released):

  1. Allow passing list[messages] directly to llm() instead of requiring the caller to build a dataset first.
  2. Fix the rate-limit issue with Anthropic models so we don't need manual if statements setting rate limits.
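To make fix (1) concrete, here is a minimal sketch of the input shape involved. The list-of-message-lists format and the "messages" column name are illustrative assumptions, not curator's confirmed API:

```python
# Shape of the input that fix (1) would let llm() accept directly
# (hypothetical): a list of chat conversations, each a list of
# OpenAI-style message dicts, with no dataset wrapper needed.
messages = [
    [{"role": "user", "content": "Translate 'bonjour' to English."}],
    [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Name three prime numbers."},
    ],
]

# Today's workaround: wrap each conversation in a row dict so it can
# be turned into a dataset (the "messages" column name is illustrative).
rows = [{"messages": conv} for conv in messages]
```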

@RyanMarten (Member, Author)

Testing with

  python -m eval.eval \
        --model curator  \
        --tasks alpaca_eval \
        --model_name "gemini/gemini-1.5-flash" \
        --annotator_model "gpt-4o-mini-2024-07-18" \
        --apply_chat_template False \
        --model_args 'tokenized_requests=False' \
        --output_path logs

@RyanMarten (Member, Author) commented Jan 21, 2025

Testing with

 python -m eval.eval \
        --model curator  \
        --tasks alpaca_eval \
        --model_name "claude-3-5-haiku-20241022" \
        --annotator_model "gpt-4o-mini-2024-07-18" \
        --apply_chat_template False \
        --model_args 'tokenized_requests=False' \
        --debug \
        --output_path logs

Working!

@RyanMarten (Member, Author)

It would be better to pass backend_params (e.g. {"max_requests_per_minute": 2_000, "max_tokens_per_minute": 4_000_000}) via --model_args, but I couldn't figure out how to access these args in the class.

@jmercat have you done this before? I tried adding a

    @classmethod
    def create_from_arg_string(

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/ibm_watsonx_ai.py#L72-L73

But I ran into bugs, so I just hardcoded the rate limits for Gemini in the class itself.
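For reference, a self-contained sketch of the create_from_arg_string pattern that lm-evaluation-harness models use to consume --model_args. The CuratorLM class name, its constructor fields, and the exact parsing rules here are illustrative assumptions, not the harness's actual implementation:

```python
# Sketch of the lm-evaluation-harness create_from_arg_string pattern:
# --model_args arrives as a "k1=v1,k2=v2" string, gets parsed into
# kwargs, and the class is instantiated from them. All names below
# (CuratorLM, field names) are hypothetical.
class CuratorLM:
    def __init__(self, tokenized_requests=True,
                 max_requests_per_minute=None,
                 max_tokens_per_minute=None, **kwargs):
        self.tokenized_requests = tokenized_requests
        # Rate limits collected here could be forwarded as
        # curator backend_params instead of being hardcoded.
        self.backend_params = {
            "max_requests_per_minute": max_requests_per_minute,
            "max_tokens_per_minute": max_tokens_per_minute,
        }

    @classmethod
    def create_from_arg_string(cls, arg_string, additional_config=None):
        kwargs = {}
        for pair in filter(None, arg_string.split(",")):
            key, _, raw = pair.partition("=")
            # Coerce booleans and ints; leave everything else a string.
            if raw in ("True", "False"):
                val = raw == "True"
            else:
                try:
                    val = int(raw)
                except ValueError:
                    val = raw
            kwargs[key.strip()] = val
        kwargs.update(additional_config or {})
        return cls(**kwargs)
```

With this in place, `--model_args 'tokenized_requests=False,max_requests_per_minute=2000'` would reach the constructor as real kwargs rather than needing per-model hardcoding.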
