[CLI] Retry if upload fails #926

alexandrnikitin · 2023-02-24T18:01:01Z

Hi, Time to time we get 503 errors while uploading the data. The log looks like this:

...
[2023-02-24T17:38:21.359Z] ['verbose'] tag
[2023-02-24T17:38:21.359Z] ['verbose'] flags
[2023-02-24T17:38:21.359Z] ['verbose'] parent
[2023-02-24T17:38:21.360Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&token=*******....
[2023-02-24T17:38:21.360Z] ['verbose'] Passed token was 36 characters long
[2023-02-24T17:38:21.360Z] ['verbose'] https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&...
        Content-Type: 'text/plain'
        Content-Encoding: 'gzip'
        X-Reduced-Redundancy: 'false'
[2023-02-24T17:38:23.332Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
[2023-02-24T17:38:23.332Z] ['verbose'] The error stack is: Error: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
    at main (/snapshot/repo/dist/src/index.js)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
[2023-02-24T17:38:23.332Z] ['verbose'] End of uploader: 3001 milliseconds

It would be great to have a retry mechanism with some defined timeout.

The text was updated successfully, but these errors were encountered:

LucasXu0 · 2023-03-10T02:35:16Z

Hi, there. I strongly agree with @alexandrnikitin. It kills time because I have to retry the whole job if the codecov action fails.

[2023-03-09T18:01:33.255Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
[2023-03-09T18:01:33.256Z] ['verbose'] The error stack is: Error: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
    at main (/snapshot/repo/dist/src/index.js)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
[2023-03-09T18:01:33.256Z] ['verbose'] End of uploader: 1829 milliseconds

ahukkanen · 2023-03-13T19:29:24Z

This would be very helpful.

We fixed the initial problem "Unable to locate build via Github Actions API." using some of the suggestions in the several different dscussions.

It has been running OK for few weeks now but now we started to see different errors, such as:

[2023-03-13T18:04:08.821Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.5&token=*******&branch=fix%2F10518&build=4407915657&build_url=https%3A%2F%2Fgithub.com%2Fdecidim%2Fdecidim%2Factions%2Fruns%2F4407915657&commit=538d19c980fa26abebbdb736c28488a81c69ac8a&job=%5BCI%5D+Meetings+%28unit+tests%29&pr=10519&service=github-actions&slug=decidim%2Fdecidim&name=decidim-meetings&tag=&flags=decidim-meetings&parent=
[2023-03-13T18:04:27.515Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 500 - {"error": "Server Error (500)"}

And

[2023-03-13T18:16:44.977Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.5&token=*******&branch=fix%2F10518&build=4407915631&build_url=https%3A%2F%2Fgithub.com%2Fdecidim%2Fdecidim%2Factions%2Fruns%2F4407915631&commit=538d19c980fa26abebbdb736c28488a81c69ac8a&job=%5BCI%5D+Meetings+%28system+public%29&pr=10519&service=github-actions&slug=decidim%2Fdecidim&name=decidim-meetings-system-public&tag=&flags=decidim-meetings-system-public&parent=
[2023-03-13T18:17:15.139Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) HeadersTimeoutError: Headers Timeout Error

It would be really helpful if the codecov action waited few seconds and retried so that we don't have to rerun the whole action which can take up to 30 mins (depending on the workflow).

pavolloffay · 2023-03-16T13:21:43Z

+1

Licenser · 2023-03-30T12:21:01Z

Yes please, we've this issue fairly constantly and it's incredibly annoying.

Licenser · 2023-03-31T07:52:48Z

To put this in perspective, all those PR's failed because the codecov upload failed. This means we got to re-run the jobs and pay for minutes again :( this issue is really painful

frenck · 2023-03-31T13:59:30Z

We have been experiencing a lot of similar issues as described above.
The amount of jobs that fail is really getting annoying, to the state that reviewers aren't even bothering with restarting the CI.

We've limited runtime for the codecov jobs to prevent them from running for hours and exhausting our CI runners. On non-open source projects, this can be quite costly when GitHub bills the org.

Anything we can provide to resolve this issue?

../Frenck

epenet · 2023-04-05T13:36:42Z

Just had a similar issue, this time with error code 502.
https://github.com/home-assistant/core/actions/runs/4618964416/jobs/8167147703

[2023-04-05T13:27:00.542Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 502 - 
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>

pmalek · 2023-04-06T08:35:36Z

We're also seeing the above mentioned 502s.

What we ended up doing is using a retry mechanism like https://github.com/Wandalen/wretry.action to retry the upload. fail_ci_if_error set to false is not an option really if someone cares about the coverage reports.

Stael · 2023-04-06T16:16:54Z

Facing the same issue. Would love a retry ❤️

adiroiban · 2023-04-23T19:32:34Z

A retry would be awesome.

Here is another failure on GitHub Actions - https://github.com/twisted/twisted/actions/runs/4780033926/jobs/8497499137?pr=11845#step:13:42

[2023-04-23T19:28:07.607Z] ['error'] There was an error running the uploader:
Error uploading to [https://codecov.io:](https://codecov.io/) Error: getaddrinfo EAI_AGAIN codecov.io

Seeing a ton of HTTP 502 and other errors on codecov uploads. this should be breaking CI. see also: codecov/codecov-action#926

citizen-stig · 2023-06-08T08:35:26Z

This will be such a game changer in CI experience.

If CI is periodically fails because of reasons like network error, people tend to ignore other failures.

GCHQDeveloper314 · 2023-06-08T13:28:01Z

We've had the following (same problem as @LucasXu0 reported above) which prevented the upload from working.

[2023-06-05T13:59:02.657Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

This looks to have been caused by a temporary GitHub API outage, but because we don't have fail_ci_if_error enabled the coverage on our main branch became incorrect as only a portion of the required coverage data was uploaded.

I would suggest a new optional argument for codecov-action allowing a given number of retries and an inter-retry cooldown to be specified.

As a workaround, instead of performing the coverage upload as part of the same job as the build & test this can be split out into a separate job. The upload-artifact action could be used to store the raw coverage data as an artifact which a later codecov job would retrieve and upload.
If the codecov upload failed then all that would need to be rerun is the failed codecov job. This job would be just the upload, so it would avoid rerunning any build/test, saving many GitHub runner minutes.

eivindjahren · 2023-06-15T13:54:12Z

We have exactly the same behavior in https://github.com/equinor/ert and an option to retry on connection failures would be awesome.

ssbarnea · 2024-02-26T17:14:34Z

It seems that the need for a exponential backoff automatic retry is more urgent these days. I seen

The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.

This is as clear as it sounds and implementing retry in python is not really hard.

tomage · 2024-04-16T20:11:51Z

+1 - wastes a lot of time if upload fails, and the entire action must be re-run (usually involves running all unit-tests)

eivindjahren · 2024-04-17T05:10:35Z

I have attempted to add a 30 second sleep and retry and it simply isn't enough. If a retry is to be added, it needs to be more than that to work consistently.

ReenigneArcher · 2024-04-17T13:08:20Z

I have attempted to add a 30 second sleep and retry and it simply isn't enough. If a retry is to be added, it needs to be more than that to work consistently.

In v4 you get a more detailed error message. But basically tokenless uploads are failing more often due to GitHub api limits.

error - 2024-04-16 13:51:14,366 -- Commit creating failed: {"detail":"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 459 seconds."}

It seems like the action uses a central codecov owned GitHub API token. That is likely because the built in GITHUB_TOKEN doesn't have access to the events scope https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token and using a GitHub app (https://docs.github.com/en/actions/security-guides/automatic-token-authentication#granting-additional-permissions) still wouldn't work for fork PRs as far as I understand.

In any event the information required for a reliable retry is already available in the logs. In my example waiting ~8 minutes is better than rebuilding the project from zero which can sometimes take ~30 minutes. Even just avoiding the manual "re-run" click would be worth it.

Dreamsorcerer · 2024-04-17T18:55:42Z

But basically tokenless uploads are failing more often due to GitHub api limits.

Given that the latest version requires a token, this is not the issue that most people are reporting here, and possibly not worth the extra work to extract the retry time from the message. The primary issue is with Codecov's servers themselves, which occasionally fail to accept an upload. As shown above (#926 (comment)) this usually suggests retrying after around 30 seconds. This issue is just asking Codecov to follow the advice from their own server.

From time to time, CI fails because of codecov being too slow to process uploads. This has become more and more frequent as of late. Reference: codecov/codecov-action#926 Proposed workaround: wrap the action with a retry. 30s seems the delay recommended by codecov. Signed-off-by: Frederic BIDON <[email protected]>

eivindjahren · 2024-05-27T10:15:14Z

So I believe this behavior recently changed a bit. You now get the following if you use forks:

info - 2024-05-27 07:27:20,004 -- ci service found: github-actions
info - 2024-05-27 07:27:20,294 -- The PR is happening in a forked repo. Using tokenless upload.
info - 2024-05-27 07:27:20,478 -- Process Commit creating complete
error - 2024-05-27 07:27:20,479 -- Commit creating failed: {"detail":"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 393 seconds."}

Note that the link is broken, and should really point to this: https://docs.codecov.com/docs/codecov-uploader#supporting-token-less-uploads-for-forks-of-open-source-repos-using-codecov . Turns out that codecov has a shared pool of github resources that gets rate limited. So I would suggest that if you have retry logic implemented, please be considerate about using those shared resources. Also, if codecov could give some guidance in how to avoid using tokenless upload in case of forked repo workflow then that would be great.

https://docs.codecov.com/docs/codecov-uploader#supporting-token-less-uploads-for-forks-of-open-source-repos-using-codecov

ReenigneArcher · 2024-05-27T13:36:03Z

Since they provide the expected time now (Expected available in 393 seconds.), maybe they could handle the retry logic using the time provided in their failure response.

Maybe they could allow us to use a public upload token for use in PRs, which would only have permission to add coverage information for repos/branches which aren't the origin one.

fix #32235 inspired by https://github.com/Kong/kubernetes-testing-framework/blob/main/.github/workflows/tests.yaml#L53-L69 ### What is changing: Introducing the [wretry.action](https://github.com/Wandalen/wretry.action) to automatically retry the [codecov-action](https://github.com/codecov/codecov-action) in our CI/CD pipeline. This change is intended to enhance the robustness of our workflow by mitigating transient failures in the codecov-action. ### Follow-up changes needed: This retry mechanism will be removed once the issue [codecov/codecov-action#926](codecov/codecov-action#926) is resolved or if GitHub Actions provides a built-in retry feature for actions. Signed-off-by: Liang Huang <[email protected]>

thomasrockhu-codecov · 2024-06-18T16:43:17Z

@ReenigneArcher thanks for your message. Initially, we tried to do retries after the expected time. However, since this is a blocking call, CI runs could potentially run for hours if they missed the window to upload.

That said, we are making changes to our system to decrease the number of GitHub API calls which will hopefully alleviate some of this pain.

Also, I am looking into adding retries as a feature to the Action. However, this may be slated for later next quarter.

Dreamsorcerer · 2024-06-18T18:30:28Z

However, since this is a blocking call, CI runs could potentially run for hours if they missed the window to upload.

That's a fair concern, but in most cases (all that I've seen?), the retry can happen after 30 seconds or so. While restarting the CI process (for many of us) takes more like 15+ mins and requires us manually rerunning it (versus it happening automatically without supervision).

The retry logic could be opt-in for those concerned that it might use too many minutes (though it should obviously also be capped at a sensible or configurable time limit).

eivindjahren · 2024-06-19T06:23:33Z

@Dreamsorcerer Yea, come to think of it. Something extremely wasteful seems to be happening. I just realized that triggering an upload of coverage data shouldn't consume anything from the github api! Is it fetching the commit every time a coverage data upload is happening? We upload 4 reports for each PR so 3 of those uploads should not need to interact with github.

It seems like you could have the github action upload whatever information you need to track and then go fetch what you need from github when it is requested as interacting with coverage data happens far less frequently than coverage report uploads.

ReenigneArcher · 2024-06-19T19:57:37Z

Also, if codecov is trying to use the events API, commits may not even appear there for up to 6 hours. I discovered that in another project of mine where I was using the events API.

https://docs.github.com/en/rest/activity/events?apiVersion=2022-11-28#list-repository-events

fix milvus-io#32235 inspired by https://github.com/Kong/kubernetes-testing-framework/blob/main/.github/workflows/tests.yaml#L53-L69 ### What is changing: Introducing the [wretry.action](https://github.com/Wandalen/wretry.action) to automatically retry the [codecov-action](https://github.com/codecov/codecov-action) in our CI/CD pipeline. This change is intended to enhance the robustness of our workflow by mitigating transient failures in the codecov-action. ### Follow-up changes needed: This retry mechanism will be removed once the issue [codecov/codecov-action#926](codecov/codecov-action#926) is resolved or if GitHub Actions provides a built-in retry feature for actions. Signed-off-by: Liang Huang <[email protected]>

Dreamsorcerer · 2024-09-20T17:35:58Z

Seems there is a retry in there now, but it happens too fast. Getting 503 errors today, but it seems to make 3 attempts in about 4 seconds:

warning - 2024-09-20 17:30:38,120 -- Response status code was 503. --- {"retry": 0}
warning - 2024-09-20 17:30:38,121 -- Request failed. Retrying --- {"retry": 0}
warning - 2024-09-20 17:30:41,107 -- Response status code was 503. --- {"retry": 1}
warning - 2024-09-20 17:30:41,107 -- Request failed. Retrying --- {"retry": 1}
warning - 2024-09-20 17:30:42,337 -- Response status code was 503. --- {"retry": 2}
warning - 2024-09-20 17:30:42,337 -- Request failed. Retrying --- {"retry": 2}

Seems like it should spread out the retries over a minute or two..

tagatac · 2024-09-20T19:35:49Z

Ah, cool. This is progress. It looks like the short backoff is an intentional decision:

Being an iteractive tool I don't think we should use too big of a backoff period

Originally posted by @giovanni-guidini in codecov/codecov-cli#210 (comment)

I'll just link some of the related PRs in case any of the authors or reviewers have any opinions about increasing the backoff:

@giovanni-guidini @scott-codecov @joseph-sentry @adrian-codecov

Dreamsorcerer · 2024-09-20T20:10:46Z

Well, as mentioned above, the correct time to retry at is included in the server response, so ideally it should just retry at that time (usually about 30 seconds).

thomasrockhu-codecov · 2024-09-20T21:19:18Z

@Dreamsorcerer to note, this is not always 30 seconds. In fact it is often closer to 1 hour. This caused issues before where retries would block CI for over 6 hours. We are doing other improvements so that we won't have to block for an hour.

Dreamsorcerer · 2024-09-20T22:28:47Z

If it's more than a couple of minutes, obviously don't retry. However, 100% of the times I've seen the CI fail due to this error, it has worked after rerunning the CI which takes a few minutes, so I can't say I've ever seen such a long delay.

ahukkanen mentioned this issue Mar 13, 2023

Code coverage uploads fail occasionally at the CI decidim/decidim#9832

Open

pavolloffay mentioned this issue Mar 16, 2023

Do not fail CI if codecov step fails grafana/tempo-operator#317

Merged

ahukkanen mentioned this issue Mar 16, 2023

Update the setup-node GitHub action to v3 decidim/decidim#10567

Merged

eytanadler mentioned this issue Mar 31, 2023

Don't merge! mdolab/OpenAeroStruct#403

Closed

13 tasks

This was referenced Mar 31, 2023

Update orjson to 3.8.9 home-assistant/core#90570

Merged

Add CI timeout to codecov job home-assistant/core#90572

Merged

pmalek mentioned this issue Apr 7, 2023

chore(ci): bump codecov retry limit to 10 Kong/kubernetes-testing-framework#613

Merged

tdonohue mentioned this issue Apr 14, 2023

Split Codecov GitHub action to separate job & add retries. Update other actions. DSpace/DSpace#8780

Merged

This was referenced Apr 18, 2023

Bump reolink-aio to v0.5.13 home-assistant/core#91550

Merged

CI: Add automatic retries to codecov upload action home-assistant/core#91857

Merged

asherf added a commit to asherf/python-jose that referenced this issue May 3, 2023

Don't fail CI on codecov upload errors

76a4263

Seeing a ton of HTTP 502 and other errors on codecov uploads. this should be breaking CI. see also: codecov/codecov-action#926

asherf mentioned this issue May 3, 2023

Don't fail CI on codecov upload errors mpdavis/python-jose#318

Merged

asherf added a commit to mpdavis/python-jose that referenced this issue May 3, 2023

Don't fail CI on codecov upload errors (#318)

c7581b8

Seeing a ton of HTTP 502 and other errors on codecov uploads. this should be breaking CI. see also: codecov/codecov-action#926

codeboten mentioned this issue Jun 2, 2023

[chore] enable coverage for unit tests open-telemetry/opentelemetry-collector-contrib#22966

Merged

enrichman mentioned this issue Jun 7, 2023

Fixed passing CI when codecov upload fails epinio/epinio#2369

Merged

citizen-stig mentioned this issue Jun 8, 2023

Enable Cargo Clippy in CI Sovereign-Labs/sovereign-sdk#399

Merged

LucasXu0 mentioned this issue Jun 16, 2023

[FR] retry codecov action AppFlowy-IO/AppFlowy#2822

Closed

xuezhaojun mentioned this issue Apr 7, 2024

🐛 CI should pass when on codecov upload fail. open-cluster-management-io/ocm#396

Merged

zhujian7 mentioned this issue Apr 18, 2024

🌱 CI should pass when on codecov upload fail open-cluster-management-io/managed-serviceaccount#103

Merged

AnnaShaleva mentioned this issue Apr 24, 2024

workflows: do not run CHANGELOG check if not needed nspcc-dev/dbft#115

Merged

ssbarnea mentioned this issue May 3, 2024

v4 keeps reporting HTTP Error 504 upstream request timeout very often #1409

Closed

bsipocz mentioned this issue May 3, 2024

CI: failure of coverage upload doesn't change job status astropy/astropy#16379

Open

fredbi mentioned this issue May 8, 2024

ci: added retries on codecov coverage report upload go-swagger/go-swagger#3108

Merged

yellow-shine mentioned this issue Jun 6, 2024

fix:[skip e2e] make more retries on codecov uploader milvus-io/milvus#33683

Merged

thomasrockhu-codecov self-assigned this Jun 18, 2024

thomasrockhu-codecov changed the title ~~Retry if upload fails~~ [CLI] Retry if upload fails Jan 17, 2025

thomasrockhu-codecov added the Medium Medium Priority Issues (to be fixed or re-evaluated in 3 months label Jan 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CLI] Retry if upload fails #926

[CLI] Retry if upload fails #926

alexandrnikitin commented Feb 24, 2023

LucasXu0 commented Mar 10, 2023

ahukkanen commented Mar 13, 2023

pavolloffay commented Mar 16, 2023

Licenser commented Mar 30, 2023

Licenser commented Mar 31, 2023

frenck commented Mar 31, 2023

epenet commented Apr 5, 2023 •

edited

Loading

pmalek commented Apr 6, 2023

Stael commented Apr 6, 2023

adiroiban commented Apr 23, 2023

citizen-stig commented Jun 8, 2023

GCHQDeveloper314 commented Jun 8, 2023

eivindjahren commented Jun 15, 2023 •

edited

Loading

ssbarnea commented Feb 26, 2024

tomage commented Apr 16, 2024

eivindjahren commented Apr 17, 2024

ReenigneArcher commented Apr 17, 2024

Dreamsorcerer commented Apr 17, 2024

eivindjahren commented May 27, 2024 •

edited

Loading

ReenigneArcher commented May 27, 2024

thomasrockhu-codecov commented Jun 18, 2024

Dreamsorcerer commented Jun 18, 2024 •

edited

Loading

eivindjahren commented Jun 19, 2024

ReenigneArcher commented Jun 19, 2024

Dreamsorcerer commented Sep 20, 2024

tagatac commented Sep 20, 2024

Dreamsorcerer commented Sep 20, 2024

thomasrockhu-codecov commented Sep 20, 2024

Dreamsorcerer commented Sep 20, 2024

[CLI] Retry if upload fails #926

[CLI] Retry if upload fails #926

Comments

alexandrnikitin commented Feb 24, 2023

LucasXu0 commented Mar 10, 2023

ahukkanen commented Mar 13, 2023

pavolloffay commented Mar 16, 2023

Licenser commented Mar 30, 2023

Licenser commented Mar 31, 2023

frenck commented Mar 31, 2023

epenet commented Apr 5, 2023 • edited Loading

pmalek commented Apr 6, 2023

Stael commented Apr 6, 2023

adiroiban commented Apr 23, 2023

citizen-stig commented Jun 8, 2023

GCHQDeveloper314 commented Jun 8, 2023

eivindjahren commented Jun 15, 2023 • edited Loading

ssbarnea commented Feb 26, 2024

tomage commented Apr 16, 2024

eivindjahren commented Apr 17, 2024

ReenigneArcher commented Apr 17, 2024

Dreamsorcerer commented Apr 17, 2024

eivindjahren commented May 27, 2024 • edited Loading

ReenigneArcher commented May 27, 2024

thomasrockhu-codecov commented Jun 18, 2024

Dreamsorcerer commented Jun 18, 2024 • edited Loading

eivindjahren commented Jun 19, 2024

ReenigneArcher commented Jun 19, 2024

Dreamsorcerer commented Sep 20, 2024

tagatac commented Sep 20, 2024

Dreamsorcerer commented Sep 20, 2024

thomasrockhu-codecov commented Sep 20, 2024

Dreamsorcerer commented Sep 20, 2024

epenet commented Apr 5, 2023 •

edited

Loading

eivindjahren commented Jun 15, 2023 •

edited

Loading

eivindjahren commented May 27, 2024 •

edited

Loading

Dreamsorcerer commented Jun 18, 2024 •

edited

Loading