Integrate ChemTEB #1585

Closed
Muennighoff opened this issue Dec 13, 2024 · 7 comments · Fixed by #1708
Labels
new benchmark Issues related to adding a new benchmark

Comments

@Muennighoff
Contributor

https://arxiv.org/abs/2412.00532v1

@Muennighoff
Contributor Author

Since it is a fork (https://github.com/basf/chemteb), it should be relatively easy to integrate; cc @HSILA in case you are interested in opening a PR :)

@HSILA
Contributor

HSILA commented Dec 24, 2024

Thank you for recognizing our work on ChemTEB, and apologies for the delayed response. I can complete the tasks' metadata and open a pull request. ChemTEB currently has over 35 tasks; is it okay to integrate all of them? The performance in bitext mining tasks is around zero. I think I should exclude them so they don't affect the models' average scores. What do you think?

Also, a quick question: in PairClassification tasks, we can have a task with multiple subsets (for example, in LegalBenchPC). Is it possible to do so for classification tasks? (I want to merge some of them.)

@Muennighoff
Contributor Author

Thanks for getting back! That would be amazing! I think all of them are fine as long as the near-zero Bitext Mining performance is due to models being bad and not the task being unsolvable/random. (cc @KennethEnevoldsen in case of thoughts)

Is it possible to do so for classification tasks?

Sounds possible to me but not sure about the details 🤔

@HSILA
Contributor

HSILA commented Dec 24, 2024

Thank you for your encouraging words. Regarding the Bitext Mining tasks (and some PairClassification tasks), the performance around zero is likely because they involve matching chemical compound names, descriptions, or formulas with their corresponding SMILES codes. These are highly domain-specific challenges that general-purpose embedding models don’t seem to be trained to handle. While they are not entirely random, they appear unsolvable by generic models.
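For readers unfamiliar with this kind of matching task, here is a minimal sketch of how such a task can be scored: each source item (say, a compound name) is embedded, and it counts as a hit only if its nearest target embedding (say, a SMILES string) is the paired one. The vectors below are hand-made toy values rather than real model output, the function names are hypothetical, and the score is a simplified top-1 accuracy, not necessarily the metric the benchmark itself reports.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top1_accuracy(source_vecs, target_vecs):
    """Fraction of source items whose most similar target has the same index."""
    hits = 0
    for i, src in enumerate(source_vecs):
        best = max(range(len(target_vecs)), key=lambda j: cosine(src, target_vecs[j]))
        hits += int(best == i)
    return hits / len(source_vecs)


# Toy embeddings (assumed values, for illustration only).
name_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A model that places each pair close together gets every match right.
aligned_targets = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.8]]

# A model whose target embeddings are unrelated to the pairing gets none.
misaligned_targets = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

print(top1_accuracy(name_vecs, aligned_targets))
print(top1_accuracy(name_vecs, misaligned_targets))
```

With many candidates, a model that has no notion of SMILES ends up close to the second case, which would be consistent with the near-zero scores described above.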

@Muennighoff
Contributor Author

I see; I think these are fine to have then! Probably of high interest for people training chemistry-specific embedding models!

@isaac-chung isaac-chung added the new benchmark Issues related to adding a new benchmark label Dec 24, 2024
@KennethEnevoldsen
Contributor

Tasks that you want to solve but that current models are unable to solve seem like an ideal candidate for a new benchmark. You should feel more than free to add it.

@isaac-chung
Collaborator

Thanks @HSILA !
