Improvements to Evals #851
base: main
@bborn How useful is the regex evaluator in your opinion? Do you have any thoughts on evals that calculate vector distance?
@andreibondarev I think regex or other string comparison is pretty important. You might want to ensure your model/agent is returning a number, a URL, an email, etc. Or you might want to ensure that the answer contains the expected output string (this doesn't exist in Langchain.rb yet), something like:
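A minimal sketch of what such string checks could look like. `ContainsEvaluator` and `PatternEvaluator` are hypothetical names used for illustration only; they are not part of Langchain.rb, and the `score` signature is an assumption:

```ruby
# Hypothetical evaluator (not a Langchain.rb class): scores 1 when the
# expected string appears anywhere in the answer, 0 otherwise.
class ContainsEvaluator
  def initialize(case_sensitive: false)
    @case_sensitive = case_sensitive
  end

  def score(answer:, expected:)
    haystack = @case_sensitive ? answer : answer.downcase
    needle = @case_sensitive ? expected : expected.downcase
    haystack.include?(needle) ? 1 : 0
  end
end

# Hypothetical evaluator (not a Langchain.rb class): scores 1 when the
# answer matches the given regular expression, 0 otherwise.
class PatternEvaluator
  def initialize(pattern)
    @pattern = pattern
  end

  def score(answer:)
    @pattern.match?(answer) ? 1 : 0
  end
end

# Usage: check that the answer mentions the expected string,
# and that another answer is a bare integer.
ContainsEvaluator.new.score(answer: "The capital is Paris.", expected: "paris")
PatternEvaluator.new(/\A\d+\z/).score(answer: "42")
```

Both checks are code graded, so they are cheap to run on every dataset item and give the same score for the same input.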
Vector (or Levenshtein, etc.) distance seems useful too. Not so much as an absolute score but as something you could look at over time (if our agent was getting a vector score of 0.75 for the last three months, and then we changed the prompt and now it's getting 0.45, we'd be concerned). I think the evaluators kind of break down into LLM Graded, LLM Labeled, and Code Graded.
LLM Graded: ask another LLM to score the dataset item based on some criteria
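To make the "look at it over time" idea concrete, here is a rough code-graded sketch using Levenshtein distance. The helper names (`levenshtein`, `similarity`, `regression?`) and the `max_drop` threshold of 0.1 are assumptions for illustration, not anything from the PR:

```ruby
# Levenshtein (edit) distance via dynamic programming, keeping one row at a time.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Normalizes the distance into a 0.0..1.0 similarity (1.0 means identical).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

# Flags a regression when the mean score drops by more than max_drop
# (hypothetical threshold) relative to the baseline runs.
def regression?(baseline_scores, current_scores, max_drop: 0.1)
  baseline = baseline_scores.sum / baseline_scores.size.to_f
  current = current_scores.sum / current_scores.size.to_f
  baseline - current > max_drop
end
```

With the numbers from the comment above, `regression?([0.75, 0.75, 0.75], [0.45])` would flag the prompt change, while a small wobble from 0.75 to 0.74 would not.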
Another thought: maybe you should be able to add an Eval to your Agent or LLM call like this:
By default this would store eval results in a CSV (could be anything, SQLite, whatever) in the same location as the dataset. Another idea would be the ability to log the completion results to the dataset before evaluating them (e.g. if you don't already have a dataset):
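A rough sketch of the CSV storage idea, using only Ruby's standard `csv` library. The helper names (`results_path_for`, `log_eval_result`), the `_eval_results.csv` suffix, and the column layout are all assumptions, not an existing Langchain.rb API:

```ruby
require "csv"
require "time"

# Hypothetical convention: results live next to the dataset,
# e.g. data/dataset.csv -> data/dataset_eval_results.csv
def results_path_for(dataset_path)
  dir = File.dirname(dataset_path)
  base = File.basename(dataset_path, ".*")
  File.join(dir, "#{base}_eval_results.csv")
end

# Appends one eval result row, writing the header only when the file is new.
# Returns the path of the results file.
def log_eval_result(dataset_path, input:, output:, score:)
  path = results_path_for(dataset_path)
  write_header = !File.exist?(path)

  CSV.open(path, "a") do |csv|
    csv << %w[timestamp input output score] if write_header
    csv << [Time.now.utc.iso8601, input, output, score]
  end
  path
end
```

Appending with a timestamp keeps every run in one file, which is what makes it possible to compare scores before and after a prompt change. Swapping the CSV for SQLite or another store would only change the body of `log_eval_result`.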
Just exploring some ideas: