Incorrect error strings reported for end-user application failures #7776

Closed
sfayer opened this issue Sep 6, 2024 · 3 comments · Fixed by #7785

Comments

@sfayer
Member

sfayer commented Sep 6, 2024

Hi,

We've had a long-running annoyance where the ApplicationStatus reported back for failed user jobs has the wrong error message. For example, if a user job script runs an invalid command, bash exits with code 127, and the error displayed in the ApplicationStatus field is "Key has expired ( 127 : submit Exited With Status 127)".
A similar thing happens if the user payload returns exit code 1 for any reason: "Operation not permitted ( 1 : submit Exited With Status 1)".
This confuses users ("What key has expired? What am I not permitted to do?"), but in both cases the message is wrong: the bash exit code has been passed through strerror as if it were an errno value, which it isn't.
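
To illustrate the mismatch, here is a minimal standalone sketch that reproduces the misleading strings using only the Python standard library. The helper name `misread_exit_status` is hypothetical and not part of DIRAC, and the exact text returned by `os.strerror` is platform dependent (on Linux, errno 127 is EKEYEXPIRED, which is presumably where "Key has expired" comes from).

```python
import errno
import os


def misread_exit_status(status: int, name: str = "submit") -> str:
    """Hypothetical helper: format a process exit status as if it were an errno.

    This mimics the mistake described above; it is not DIRAC code.
    """
    # os.strerror() looks the integer up in the errno table, which has
    # nothing to do with the exit status of a user payload.
    return f"{os.strerror(status)} ( {status} : {name} Exited With Status {status})"


if __name__ == "__main__":
    # 127 is what bash returns for "command not found"; 1 is a generic failure.
    for status in (127, 1):
        print(misread_exit_status(status))
```

Exit status 1 happens to share its value with `errno.EPERM`, which is why a generic payload failure ends up labelled "Operation not permitted".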

I think this is happening here, where the application exit code is included in a RuntimeError:

raise RuntimeError(f"'{os.path.basename(self.executable).split('_')[0]}' Exited With Status {status}", status)

and then this gets set as the error number in S_ERROR, which runs strerror on it:
return S_ERROR(rte.args[1], rte.args[0]) # rte[1] should be an error code

Would it be possible to somehow prevent these exit codes getting processed into errno style error messages?
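
One possible direction, sketched here purely as an illustration and not as the change that was eventually merged, is to describe the exit status using shell conventions instead of the errno table. The function `describe_exit_status` is a hypothetical name; the special meanings of 126, 127 and values above 128 are bash conventions, so the hints are only a best guess when the payload chooses its own exit codes.

```python
import signal


def describe_exit_status(name: str, status: int) -> str:
    """Hypothetical helper: describe an exit status without consulting errno."""
    message = f"'{name}' Exited With Status {status}"
    if status == 126:
        # bash convention: command was found but could not be executed
        return f"{message} (command found but not executable)"
    if status == 127:
        # bash convention: command not found
        return f"{message} (command not found)"
    if status > 128:
        # bash convention: 128 + N means the process was killed by signal N
        try:
            signal_name = signal.Signals(status - 128).name
        except ValueError:
            # Not a known signal number on this platform; give no hint.
            return message
        return f"{message} (terminated by {signal_name})"
    return message


if __name__ == "__main__":
    for status in (1, 127, 130):
        print(describe_exit_status("submit", status))
```

With this approach a plain failure such as exit status 1 carries no extra interpretation at all, which avoids suggesting a cause that the payload never reported.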

Regards,
Simon

@fstagni
Contributor

fstagni commented Sep 12, 2024

@arrabito do you have the same issue?

@chrisburr
Member

This was added for #3394. We probably don't need this option any more, as we no longer use it in LHCb.

@arrabito
Contributor

Yes, we have the same issue.
