-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont retry on connection failure #103
Dont retry on connection failure #103
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @danmcp @booxter
I really like how the errors have been broken down by their codes - it helps to further my understanding of good error handling in general
we could probably start focusing on expanding the unit testing infrastructure on the project but that should probably be in a separate PR
a356211
to
1561eee
Compare
The `python` symlink may be missing on a system; or even point to py2. We should use `python3` to be sure. (It's ok to use `python` inside a virtualenv though.) Signed-off-by: Ihar Hrachyshka <[email protected]>
Before the patch, all errors from openai were handled by retrying up to API_MAX_RETRY times and returning $ERROR$ message at the last attempt. With this patch, if all attempts result in APIConnectionError, we raise a new EvalError exception. (If at least one of the previous attempts result in a different error, then we return $ERROR$ as usual.) Also, several errors are not expected to recover with a retry (400-404, 422). This patch makes them return $ERROR$ immediately without retrying. Closes: instructlab#77 Signed-off-by: Ihar Hrachyshka <[email protected]>
Before the patch, we were calculating them on every retry attempt. The function is pure, so there is no good reason to repeat the calculation. This also simplifies the function a bit. Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: Dan McPherson <[email protected]>
1561eee
to
7fbd87e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine now. Not sure I can approve it since most of the PR is (co)authored by myself.
openai.PermissionDeniedError, # 403 | ||
openai.NotFoundError, # 404 | ||
# General catch-all | ||
openai.OpenAIError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So AFAIU all the explicit types listed above are for documentation purposes only? Because all of them are OpenAIErrors already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I thought they were really helpful to see elaborated. Although it's a great point that we should add a comment stating that logic explicitly.
This is a continuation/replacement of #80
Resolves: #77
The approach of this separates errors into 3 buckets:
Fatal errors we retry and fail after max attempts (Ex: APIConnectionError, 404). These cases don't have much hope of fixing themselves.
Errors which we record after max attempts (Ex: RateLimitError). Failure on one try might not mean they are all going to fail.
Errors which we record after 1 occurrence (Ex: 400). Failure is request specific and retries aren't going to help.
Which allows the logic to fail fast when possible, not retry when it wouldn't help, and not catastrophically fail on isolated failures.