Issues: The-AI-Alliance/trust-safety-evals

Issues list

Investigate leveraging DataPerf
#40 opened Feb 5, 2025 by deanwampler
Create an example using Arize Phoenix
Labels: reference stack (All tools for the reference stack.)
#39 opened Jan 30, 2025 by deanwampler
Incorporate Databricks "Domain Intelligence" benchmark
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); taxonomy (The definition and review tasks for the global taxonomy)
#38 opened Jan 28, 2025 by deanwampler
Investigate adding an LLM query option for the leaderboards
Labels: leaderboards (Leaderboards deployed to HF or other places); taxonomy (The definition and review tasks for the global taxonomy); UX (Research for and implementation of the user experience)
#36 opened Jan 16, 2025 by deanwampler 2025-02-28
Create an example using Llama Guard + the reference stack
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); Examples (Tickets for building user-facing examples.); reference stack (All tools for the reference stack.)
#34 opened Jan 15, 2025 by deanwampler 2025-01-31
Create an example using Granite Guardian + the reference stack
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); Examples (Tickets for building user-facing examples.); reference stack (All tools for the reference stack.)
#33 opened Jan 15, 2025 by deanwampler 2025-01-31
Evaluate LangFair as a tool for evaluators and ideas for the taxonomy
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); help wanted (Extra attention is needed); taxonomy (The definition and review tasks for the global taxonomy)
#32 opened Jan 13, 2025 by deanwampler 2025-01-31
Talk with MLCommons about incorporating their evaluators, benchmarks, etc.
Labels: collaborators ("Strategic" work with third-party collaborators); evaluators (Implementations of evaluations, including benchmarks and datasets)
#31 opened Jan 4, 2025 by deanwampler
Recruit design partners
Labels: leaderboards (Leaderboards deployed to HF or other places); UX (Research for and implementation of the user experience)
#30 opened Dec 12, 2024 by deanwampler 2025-01-31
Organize project repo
Labels: administration (Project management, etc.)
#29 opened Dec 12, 2024 by deanwampler 2025-01-31
Determine the inference mechanism to use
Labels: reference stack (All tools for the reference stack.)
#28 opened Dec 12, 2024 by deanwampler
Catalog design
Labels: help wanted (Extra attention is needed); taxonomy (The definition and review tasks for the global taxonomy)
#27 opened Dec 12, 2024 by deanwampler 2025-01-31
Leaderboard design
Labels: leaderboards (Leaderboards deployed to HF or other places); UX (Research for and implementation of the user experience)
#26 opened Dec 12, 2024 by deanwampler
Deploy evaluation stack in our Hugging Face space
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); leaderboards (Leaderboards deployed to HF or other places); reference stack (All tools for the reference stack.)
#25 opened Dec 12, 2024 by deanwampler
POC candidate: benchmark cards
Labels: evaluators (Implementations of evaluations, including benchmarks and datasets); help wanted (Extra attention is needed)
#24 opened Dec 12, 2024 by deanwampler
Unify the MLCommons V1.0 taxonomy with IBM AI Risk Atlas and our draft taxonomy
Labels: taxonomy (The definition and review tasks for the global taxonomy)
#23 opened Dec 12, 2024 by deanwampler 2025-02-28
Design and UX support
Labels: collaborators ("Strategic" work with third-party collaborators); leaderboards (Leaderboards deployed to HF or other places); UX (Research for and implementation of the user experience)
#22 opened Dec 12, 2024 by deanwampler
POC candidate: IBM Risk Atlas
Labels: leaderboards (Leaderboards deployed to HF or other places); reference stack (All tools for the reference stack.); taxonomy (The definition and review tasks for the global taxonomy)
#21 opened Dec 12, 2024 by deanwampler
Evaluate using Seismometer
Labels: reference stack (All tools for the reference stack.)
#18 opened Dec 12, 2024 by deanwampler
Add privacy to the taxonomy
Labels: taxonomy (The definition and review tasks for the global taxonomy)
#17 opened Dec 12, 2024 by deanwampler
Investigate Microsoft's Eureka platform
Labels: reference stack (All tools for the reference stack.)
#16 opened Dec 10, 2024 by deanwampler
Integrate AI Risk Atlas into the taxonomy
Labels: taxonomy (The definition and review tasks for the global taxonomy)
#15 opened Dec 9, 2024 by deanwampler 2025-01-31
Talk with MLCommons about collaborating on the execution stack
Labels: collaborators ("Strategic" work with third-party collaborators); reference stack (All tools for the reference stack.)
#13 opened Dec 9, 2024 by deanwampler