Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add exit code & tox fix #217

Merged
merged 1 commit into from
Oct 17, 2024

Conversation

RobotSail
Copy link
Member

Currently, the training library does not exit when an error is encountered
within the training loop (invoked through torchrun). This commit updates
that functionality so we correctly return an exit code of 1 on child failure.

Additionally, this commit also adds the make fix command which
automatically fixes all trivial issues picked up on by ruff

Signed-off-by: Oleg S [email protected]

Resolves #216

@mergify mergify bot added CI/CD Affects CI/CD configuration testing Relates to testing ci-failure labels Sep 19, 2024
Copy link
Contributor

mergify bot commented Sep 26, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @RobotSail please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@JamesKunstle
Copy link
Contributor

@RobotSail Is this still useful after #229?

@RobotSail
Copy link
Member Author

@JamesKunstle They are similar but this is specifically when an error occurs within the child process. The other PR seems to only be covering the case when a KeyboardInterrupt is issued while streaming the subprocess call

@mergify mergify bot added the one-approval label Oct 16, 2024
@mergify mergify bot removed the needs-rebase label Oct 16, 2024
@RobotSail RobotSail force-pushed the propogate-failure branch 2 times, most recently from 9d6a3e2 to 6c2d97a Compare October 16, 2024 18:13
@mergify mergify bot added ci-failure and removed ci-failure labels Oct 16, 2024
src/instructlab/training/main_ds.py Outdated Show resolved Hide resolved
src/instructlab/training/main_ds.py Show resolved Hide resolved
Currently, the training library does not exit when an error is encountered
within the training loop (invoked through torchrun). This commit updates
that functionality so we correctly return an exit code of 1 on child failure.

Additionally, this commit also adds the `make fix` command which
automatically fixes all trivial issues picked up on by ruff

Signed-off-by: Oleg S <[email protected]>
@mergify mergify bot removed the ci-failure label Oct 16, 2024
Copy link
Member

@danmcp danmcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: There might be some changes necessary in the CLI after this change to handle the exception and exit with an appropriate message.

@mergify mergify bot removed the one-approval label Oct 17, 2024
@mergify mergify bot merged commit c0ee1aa into instructlab:main Oct 17, 2024
13 checks passed
@Maxusmusti
Copy link
Contributor

@Mergifyio backport release-v0.5

Copy link
Contributor

mergify bot commented Oct 18, 2024

backport release-v0.5

✅ Backports have been created

mergify bot added a commit that referenced this pull request Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Affects CI/CD configuration testing Relates to testing
Projects
None yet
5 participants