Retry batch delete blob on 503 #1277
Labels
api: storage
Issues related to the googleapis/python-storage API.
priority: p3
Desirable enhancement or fix. May not be included in next release.
type: feature request
‘Nice-to-have’ improvement, new feature or different behavior or design.
Comments
Hi, has anyone found a solution for this?
sadovnychyi added a commit to sadovnychyi/beam that referenced this issue on Jan 8, 2025:
A transient error might occur when writing a lot of shards to GCS, and right now the GCS IO does not have any retry logic in place: https://github.com/apache/beam/blob/a06454a2/sdks/python/apache_beam/io/gcp/gcsio.py#L269

It means that in such cases the entire bundle of elements fails, and then Beam itself will attempt to retry the entire bundle, and will fail the job if it exceeds the number of retries. This change adds new logic to retry only failed requests, and uses the typical exponential backoff strategy.

Note that this change accesses a private method (`_predicate`) of the retry object, which we could avoid by basically copying the logic over here. But existing code already accesses the `_responses` property so maybe it's not a big deal. https://github.com/apache/beam/blob/b4c3a4ff/sdks/python/apache_beam/io/gcp/gcsio.py#L297

Existing (unresolved) issue in the GCS client library: googleapis/python-storage#1277
Abacn pushed a commit to apache/beam that referenced this issue on Jan 10, 2025:
* Add retry logic to each batch method of the GCS IO

  A transient error might occur when writing a lot of shards to GCS, and right now the GCS IO does not have any retry logic in place: https://github.com/apache/beam/blob/a06454a2/sdks/python/apache_beam/io/gcp/gcsio.py#L269

  It means that in such cases the entire bundle of elements fails, and then Beam itself will attempt to retry the entire bundle, and will fail the job if it exceeds the number of retries. This change adds new logic to retry only failed requests, and uses the typical exponential backoff strategy.

  Note that this change accesses a private method (`_predicate`) of the retry object, which we could avoid by basically copying the logic over here. But existing code already accesses the `_responses` property so maybe it's not a big deal. https://github.com/apache/beam/blob/b4c3a4ff/sdks/python/apache_beam/io/gcp/gcsio.py#L297

  Existing (unresolved) issue in the GCS client library: googleapis/python-storage#1277

* Catch correct exception type in `_batch_with_retry`

  The `RetryError` would always be raised since the retry decorator would catch all HTTP-related exceptions.

* Update changelog with GCSIO retry logic fix
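For illustration, here is a minimal sketch of the retry-only-the-failed-requests idea described in that commit, not the actual Beam implementation. The `batch_fn` callable is an assumption: it is taken to accept a list of requests and return one result per request, reporting transient failures as exception objects rather than raising them.

```python
import random
import time

from google.api_core import exceptions as api_exceptions

# Status codes usually treated as transient for GCS batch requests.
_TRANSIENT = (
    api_exceptions.TooManyRequests,      # 429
    api_exceptions.InternalServerError,  # 500
    api_exceptions.ServiceUnavailable,   # 503
)


def batch_with_retry(batch_fn, requests, max_attempts=5):
    """Run a batch, then re-issue only the requests whose results were transient errors."""
    results = {}
    pending = list(range(len(requests)))
    for attempt in range(max_attempts):
        responses = batch_fn([requests[i] for i in pending])
        still_failing = []
        for idx, response in zip(pending, responses):
            results[idx] = response
            if isinstance(response, _TRANSIENT):
                still_failing.append(idx)
        if not still_failing:
            break
        pending = still_failing
        # Exponential backoff with jitter before retrying only the failures.
        time.sleep(min(2 ** attempt, 32) + random.random())
    # Results come back in the original request order; entries that never
    # succeeded still hold the last exception object.
    return [results[i] for i in range(len(requests))]
```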
Is your feature request related to a problem? Please describe.
When deleting a lot of blobs using the batch API, it sometimes raises a ServiceUnavailable: 503 BATCH contentid://None: We encountered an internal error. Please try again. This is undesirable, as the error is raised in the middle of a big deletion job.

Describe the solution you'd like
I tried setting the retry parameter at the client level, client.get_bucket(bucket_name, retry=retry, timeout=600), or at the blob level, blob.delete(retry=retry, timeout=600), and even forcing if_generation_match=blob.generation. No retry seems to be performed. The class does not seem to use any retry here:
python-storage/google/cloud/storage/batch.py
Line 309 in c52e882
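For reference, the attempt described above looks roughly like the following sketch (the bucket and blob names are placeholders). The per-call retry and timeout arguments are accepted, but per this report the request sent when the batch context exits is not retried on a 503.

```python
from google.cloud import storage
from google.cloud.storage.retry import DEFAULT_RETRY

client = storage.Client()
retry = DEFAULT_RETRY.with_deadline(600)

# "my-bucket" and the blob names below are placeholders.
bucket = client.get_bucket("my-bucket", retry=retry, timeout=600)

with client.batch():
    for name in ("shard-0001", "shard-0002"):
        # retry/timeout are accepted here, but the deletes are only queued and
        # sent together when the batch context exits; per the report, that
        # combined request is not retried on a 503.
        bucket.blob(name).delete(retry=retry, timeout=600)
```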
Either the client could support this, or at the very least the batch object should give access to the blobs (subtasks) that couldn't be deleted so that we can retry them manually. A manual retry of the full batch (in a for loop) does not work, as some of the blobs from the batch were already deleted on the first attempt, which raises a 404 on the second attempt.

Retry, or give the user the ability to retry only the ones that fail. See the workaround sketch below.
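One possible manual workaround, sketched here as an illustration rather than an official API, under the assumption that a 404 on a later attempt can be treated as "already deleted": fall back to per-blob deletes with exponential backoff when the batch raises a 503.

```python
import random
import time

from google.api_core.exceptions import NotFound, ServiceUnavailable


def delete_blobs(bucket, blob_names, max_attempts=5):
    """Batch-delete blobs from a google.cloud.storage Bucket, retrying failures."""
    client = bucket.client
    try:
        with client.batch():
            for name in blob_names:
                bucket.blob(name).delete()
        return
    except ServiceUnavailable:
        pass  # some sub-requests failed; retry the blobs individually below
    for attempt in range(max_attempts):
        remaining = []
        for name in blob_names:
            try:
                bucket.blob(name).delete()
            except NotFound:
                pass  # already deleted by the partially successful batch
            except ServiceUnavailable:
                remaining.append(name)
        if not remaining:
            return
        blob_names = remaining
        # Exponential backoff with jitter before the next attempt.
        time.sleep(min(2 ** attempt, 32) + random.random())
    raise ServiceUnavailable(f"could not delete {len(blob_names)} blobs")
```

The fallback deletes one blob at a time, so it loses the latency benefit of batching; it only illustrates how a caller could recover today without access to the failed sub-requests.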