chore: add exit code & tox fix #217
This pull request has merge conflicts that must be resolved before it can be
@RobotSail Is this still useful after #229?
@JamesKunstle They are similar but this is specifically when an error occurs within the child process. The other PR seems to only be covering the case when a
Currently, the training library does not exit when an error is encountered within the training loop (invoked through torchrun). This commit updates that behavior so that we correctly return an exit code of 1 on child failure.
Additionally, this commit adds the `make fix` command, which automatically fixes all trivial issues picked up by ruff.
Signed-off-by: Oleg S <[email protected]>
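To illustrate the behavior described in the commit message, here is a minimal sketch of a parent process that exits with code 1 when its child fails. It is not the code from this PR: `run_child` and `launch_training` are hypothetical names, and a plain `subprocess.run` call stands in for the actual torchrun invocation.

```python
import subprocess
import sys


def run_child(command: list[str]) -> int:
    """Run a child process to completion and return its exit code.

    Hypothetical helper: a stand-in for however the library launches torchrun.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode


def launch_training(command: list[str]) -> None:
    """Exit the parent with code 1 if the child fails; return normally otherwise."""
    returncode = run_child(command)
    if returncode != 0:
        # Report the child's own code, but always exit the parent with 1.
        print(f"child process failed with exit code {returncode}", file=sys.stderr)
        sys.exit(1)
```

With this shape, a caller such as a shell script or CI job sees a nonzero status whenever the child process errors out, instead of a silent success.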
Note: There might be some changes necessary in the CLI after this change to handle the exception and exit with an appropriate message.
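One way the CLI side could handle this is sketched below, purely as an assumption about what such handling might look like. `TrainingFailedError` and `main` are hypothetical names, not part of the library or the CLI.

```python
import sys
from typing import Callable


class TrainingFailedError(RuntimeError):
    """Hypothetical exception raised when the training child process fails."""


def main(train_fn: Callable[[], None]) -> int:
    """Run training and translate a failure into a message plus exit code."""
    try:
        train_fn()
    except TrainingFailedError as exc:
        # Print a readable message instead of a raw traceback.
        print(f"Training failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

An entry point would then call `sys.exit(main(run_training))`, where `run_training` is whatever callable starts the training run.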
@Mergifyio backport release-v0.5
✅ Backports have been created
chore: add exit code & tox fix (backport #217)
Currently, the training library does not exit when an error is encountered
within the training loop (invoked through torchrun). This commit updates
that functionality so we correctly return an exit code of 1 on child failure.
Additionally, this commit also adds the `make fix` command, which automatically fixes all trivial issues picked up by ruff.
Signed-off-by: Oleg S [email protected]
Resolves #216