Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor model-specific configs and move data curation scripts #60

Merged
merged 74 commits into from
Feb 5, 2025

Conversation

SumanthRH
Copy link
Collaborator

@SumanthRH SumanthRH commented Feb 2, 2025

What does this PR do?

Another refactor PR on top of #23 now focused on model-specific configurations and data generation.

  • Model-specific system prompts, user templates etc are best left to be in the a YAML file.
  • TaskHandler should be model agnostic, since we want to have a consistent evaluation logic for all tasks
  • Data curation scripts for different Sky-T1 models should live outside the skythought_evals package. These are mostly scripts focused on a particular data curation task like filtering, rewriting etc. My proposal is to place common scripts in scripts/ . A guide for obtaining the final training data + training commands for different Sky-T1 models should be placed in recipes/ .

scripts can be organized better, but for now I've placed all the data curation scripts in this folder.

TODO:

  • Move Sky-T1 specific instructions to recipes/
  • Improve READMEs -skythought_evals/README.md and README.md
  • Verify with E2E tests for Sky-T1

Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
@lynnliu030 lynnliu030 self-requested a review February 2, 2025 02:25
x
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
x
Signed-off-by: SumanthRH <[email protected]>
@SumanthRH SumanthRH marked this pull request as ready for review February 4, 2025 01:24
x
Signed-off-by: SumanthRH <[email protected]>
@SumanthRH SumanthRH requested a review from caoshiyi February 4, 2025 01:44
@SumanthRH
Copy link
Collaborator Author

This PR is now ready for review.

Since i've moved the commands for data curation, it would be good for a dummy test run on small data for these. I don't have the intermediate data for some of the commands (like convert_data.py etc) so it would be good to test this out @caoshiyi @lynnliu030 @tyler-griggs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this look good? @tyler-griggs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic. Thanks!

@SumanthRH SumanthRH requested a review from caoshiyi February 5, 2025 07:43
x
Signed-off-by: SumanthRH <[email protected]>
Copy link
Member

@lynnliu030 lynnliu030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!

@SumanthRH SumanthRH merged commit cb45c81 into NovaSky-AI:main Feb 5, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants