Retry upload requests under certain conditions #210
base: master
Conversation
I've tested this manually, so I'm marking it ready for review. One thing I discovered is that shutting down my MinIO worked well to simulate an Axios network error, but switching Chrome DevTools to "offline" mode triggers a different error type that I'm not sure we should attempt to catch. There is no test framework in this repository yet, so I haven't added any unit tests. I think it would be good to add some, but I didn't want to blow up the scope of this PR.
const axiosErr = (error as AxiosError);
return axiosErr.isAxiosError && (
  !axiosErr.response
  || [429, 500, 502, 503, 504].includes(axiosErr.response.status)
Why not all 5xx errors?
@danlamanna suggested this list, and I added one extra code to his suggestion. I think I prefer a whitelist of specific codes, such that any "unexpected" situation will break out of our retry loop, but I'm happy to expand that list if there are other codes you want to add.
429 makes sense, but I think we should retry the entire range of 5xx errors unless we know it's inappropriate. @danlamanna what do you think?
);
}
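For concreteness, here is a hedged sketch of what the predicate would look like with the whole-5xx-range suggestion from the thread above applied; the function name shouldRetry is an assumption, not necessarily what the PR calls it:

import { AxiosError } from 'axios';

// Sketch only: retry on any network error (no response at all), on 429,
// or on any status in the 5xx range, instead of an explicit whitelist.
function shouldRetry(error: unknown): boolean {
  const axiosErr = error as AxiosError;
  return axiosErr.isAxiosError && (
    !axiosErr.response
    || axiosErr.response.status === 429
    || (axiosErr.response.status >= 500 && axiosErr.response.status < 600)
  );
}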
async function retry<T>(
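The diff above only shows the helper's signature. Purely as an illustration of the approach (not the PR's actual implementation; the condition, attempt count, and backoff values are assumptions), a hand-rolled helper along these lines would be:

// Illustrative sketch only; defaults are assumptions, not the PR's values.
async function retry<T>(
  fn: () => Promise<T>,
  // The real helper presumably defaults this to the Axios-error predicate above.
  condition: (error: unknown) => boolean = () => true,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts || !condition(error)) {
        throw error;
      }
      // Exponential backoff before the next attempt.
      await new Promise((resolve) => { setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)); });
    }
  }
}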
What do you think about using one of these:
I prefer https://www.npmjs.com/package/retry-axios , since it uses proper Axios interceptors.
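For reference, attaching retry-axios to the client instance looks roughly like this per its README (a sketch under the assumption that the option names match the installed version; worth verifying):

import axios from 'axios';
import * as rax from 'retry-axios';

const api = axios.create();
api.defaults.raxConfig = {
  instance: api,
  retry: 3,             // retries for responses with a retryable status code
  noResponseRetries: 3, // retries for network errors with no response at all
  statusCodesToRetry: [[429, 429], [500, 599]],
  backoffType: 'exponential',
};
rax.attach(api);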
Personally I'd prefer not to have to rewrite this, but if you think using one of those libraries is going to be a better experience, go ahead and make the decision.
Using https://www.npmjs.com/package/retry-axios has the potential to remove most of the code (and maintenance burden) here; I think it would be worth trying. @zachmullen If you don't have time, I'd be happy to try adding it.
FWIW, I'm using axios-retry in NLI, and it works fine. retry-axios does look a little more refined though.
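For comparison, a similarly hedged sketch of the axios-retry setup (again based on its README; the retry condition here mirrors the whitelist discussion above and is an assumption):

import axios from 'axios';
import axiosRetry from 'axios-retry';

const api = axios.create();
axiosRetry(api, {
  retries: 3,
  retryDelay: axiosRetry.exponentialDelay,
  // Retry on network errors (no response), 429, and 5xx responses.
  retryCondition: (error) =>
    !error.response
    || error.response.status === 429
    || error.response.status >= 500,
});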
@brianhelba that would be great if you want to take a crack at it. 👍
Not 100% sure, but from the README it looks like axios-retry doesn't have a callback for when a retry is happening, but retry-axios does.
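For what it's worth, the retry-axios README shows a per-attempt callback along these lines, which could be used to surface retry progress to callers (sketch only; verify against the installed version):

import axios from 'axios';
import * as rax from 'retry-axios';

const api = axios.create();
api.defaults.raxConfig = {
  instance: api,
  // Invoked before each retry; useful for logging or UI progress updates.
  onRetryAttempt: (err) => {
    const cfg = rax.getConfig(err);
    console.warn(`Retrying upload request, attempt #${cfg?.currentRetryAttempt}`);
  },
};
rax.attach(api);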
@@ -140,15 +186,18 @@ export default class S3FileFieldClient {
  protected async completeUpload(
    multipartInfo: MultipartInfo, parts: UploadedPart[],
  ): Promise<void> {
-    const response = await this.api.post('upload-complete/', {
+    const response = await retry<AxiosResponse>(() => this.api.post('upload-complete/', {
Since POST is not necessarily idempotent, do we know what happens if a client re-calls the endpoint mistakenly (perhaps due to the network dropping only the response)?
If this endpoint is actually idempotent, maybe we should change it to a PUT request.
I think either the endpoint should be idempotent, or if not, it should return 400 if a duplicate happens, which would break us out of the loop.
@@ -168,8 +219,10 @@ export default class S3FileFieldClient {
   * @param multipartInfo Signed information returned from /upload-complete/.
   */
  protected async finalize(multipartInfo: MultipartInfo): Promise<string> {
-    const response = await this.api.post('finalize/', {
+    const response = await retry<AxiosResponse>(() => this.api.post('finalize/', {
Again, do we know what happens when this is called repeatedly?
Hopefully either 200 OK or 400 bad request. I don't know for sure.
    // Send the CompleteMultipartUpload operation to S3
-    await axios.post(completeUrl, body, {
+    await retry<AxiosResponse>(() => axios.post(completeUrl, body, {
I assume that if this is called repeatedly, AWS will either:
- idempotently succeed again, which is fine
- return an error in the body, which is unrecoverable (and the retries should stop)
I don't think we need to do anything more here, but we should be sure to handle this with #209.
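As a hedged sketch of what that follow-up handling (presumably part of #209) could look like: S3 can answer CompleteMultipartUpload with 200 OK and an <Error> element in the XML body, so the body needs checking. This helper is an assumption, not code from this PR:

import { AxiosResponse } from 'axios';

// Sketch: treat a 200 response whose XML body contains an <Error> element as a
// failure, so a failed CompleteMultipartUpload is not silently accepted.
function assertCompleteMultipartUploadSucceeded(response: AxiosResponse): void {
  const body = typeof response.data === 'string' ? response.data : '';
  if (body.includes('<Error>')) {
    throw new Error(`CompleteMultipartUpload failed: ${body}`);
  }
}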
Just posting this as a draft for preliminary review. I still need to test it manually. It's also a reasonable target for automated testing.