Issues: The-AI-Alliance/trust-safety-evals
How should we integrate the unitxt catalog into our global evaluation catalog?
#41
opened Feb 14, 2025 by
deanwampler
Create an example using Arize Phoenix
reference stack
All tools for the reference stack.
#39
opened Jan 30, 2025 by
deanwampler
Incorporate Databricks "Domain Intelligence" benchmark
evaluators
Implementations of evaluations, including benchmarks and datasets
taxonomy
The definition and review tasks for the global taxonomy
#38
opened Jan 28, 2025 by
deanwampler
Investigate adding an LLM query option for the leaderboards
leaderboards
Leaderboards deployed to HF or other places
taxonomy
The definition and review tasks for the global taxonomy
UX
Research for and implementation of the user experience
Create an example using Llama Guard + the reference stack
evaluators
Implementations of evaluations, including benchmarks and datasets
Examples
Tickets for building user-facing examples.
reference stack
All tools for the reference stack.
Create an example using Granite Guardian + the reference stack
evaluators
Implementations of evaluations, including benchmarks and datasets
Examples
Tickets for building user-facing examples.
reference stack
All tools for the reference stack.
Evaluate LangFair as a tool for evaluators and ideas for the taxonomy
evaluators
Implementations of evaluations, including benchmarks and datasets
help wanted
Extra attention is needed
taxonomy
The definition and review tasks for the global taxonomy
Talk with MLCommons about incorporating their evaluators, benchmarks, etc.
collaborators
"Strategic" work with third-party collaborators
evaluators
Implementations of evaluations, including benchmarks and datasets
#31
opened Jan 4, 2025 by
deanwampler
Recruit design partners
leaderboards
Leaderboards deployed to HF or other places
UX
Research for and implementation of the user experience
Determine the inference mechanism to use
reference stack
All tools for the reference stack.
#28
opened Dec 12, 2024 by
deanwampler
Catalog design
help wanted
Extra attention is needed
taxonomy
The definition and review tasks for the global taxonomy
Leaderboard design
leaderboards
Leaderboards deployed to HF or other places
UX
Research for and implementation of the user experience
#26
opened Dec 12, 2024 by
deanwampler
Deploy evaluation stack in our Hugging Face space
evaluators
Implementations of evaluations, including benchmarks and datasets
leaderboards
Leaderboards deployed to HF or other places
reference stack
All tools for the reference stack.
#25
opened Dec 12, 2024 by
deanwampler
POC candidate: benchmark cards
evaluators
Implementations of evaluations, including benchmarks and datasets
help wanted
Extra attention is needed
#24
opened Dec 12, 2024 by
deanwampler
Unify the MLCommons V1.0 taxonomy with IBM AI Risk Atlas and our draft taxonomy
taxonomy
The definition and review tasks for the global taxonomy
Design and UX support
collaborators
"Strategic" work with third-party collaborators
leaderboards
Leaderboards deployed to HF or other places
UX
Research for and implementation of the user experience
#22
opened Dec 12, 2024 by
deanwampler
POC candidate: IBM Risk Atlas
leaderboards
Leaderboards deployed to HF or other places
reference stack
All tools for the reference stack.
taxonomy
The definition and review tasks for the global taxonomy
#21
opened Dec 12, 2024 by
deanwampler
Evaluate using Seismometer
reference stack
All tools for the reference stack.
#18
opened Dec 12, 2024 by
deanwampler
Add privacy to the taxonomy
taxonomy
The definition and review tasks for the global taxonomy
#17
opened Dec 12, 2024 by
deanwampler
Investigate Microsoft's Eureka platform
reference stack
All tools for the reference stack.
#16
opened Dec 10, 2024 by
deanwampler
Integrate AI Risk Atlas into the taxonomy
taxonomy
The definition and review tasks for the global taxonomy
Talk with MLCommons about collaborating on the execution stack
collaborators
"Strategic" work with third-party collaborators
reference stack
All tools for the reference stack.
#13
opened Dec 9, 2024 by
deanwampler