Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hitting API Rate limit fails ungracefully #19

Closed
edmundmiller opened this issue Aug 22, 2023 · 51 comments · Fixed by nf-core/modules#3795
Closed

Hitting API Rate limit fails ungracefully #19

edmundmiller opened this issue Aug 22, 2023 · 51 comments · Fixed by nf-core/modules#3795
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed
Milestone

Comments

@edmundmiller
Copy link
Collaborator

 Error: Could not retrieve Nextflow release matching latest-everything.
API rate limit exceeded for installation ID 1565271.
Error: Could not parse the download URL
Cannot read properties of undefined (reading 'filter')
Error: versionSpec parameter is required
Warning: Nextflow appears to have installed correctly, but an error was thrown while running it.

Example run
logs_25031.zip

@edmundmiller edmundmiller self-assigned this Aug 22, 2023
@edmundmiller edmundmiller changed the title Hitting API Rate limit fails un-gracefully Hitting API Rate limit fails ungracefully Aug 22, 2023
@edmundmiller
Copy link
Collaborator Author

Basically, I think we can see if it returns that error, and then pause and retry.

@MillironX
Copy link
Member

I like that idea. The current limit is 1000 requests per hour (source), but I wonder if we can figure out the time to wait from the response headers as shown here.

@edmundmiller
Copy link
Collaborator Author

Oh no, does that consider hitting the cache API? I feel like Sarek could hit that really quickly... Or worse modules during the hackathon!

@edmundmiller edmundmiller added bug Something isn't working help wanted Extra attention is needed labels Aug 30, 2023
@edmundmiller
Copy link
Collaborator Author

@adamrtalbot Is hitting this in sarek with the tests. My question is why are we not hitting it in modules then? I haven't heard anyone complain about it.

@ewels had a good idea to set up a fallback on the website, either an API end-point on the nf-core site that hits the GitHub API, or during build pull the versions.

@edmundmiller edmundmiller linked a pull request Aug 31, 2023 that will close this issue
@edmundmiller
Copy link
Collaborator Author

I think I'm getting to the bottom of it.

Sarek is trying to cache it in the workflow.

But the setup action has a cache inside it.

@edmundmiller
Copy link
Collaborator Author

My question is why are we not hitting it in modules then?

Because it's not using setup-nextflow 🫠

@edmundmiller
Copy link
Collaborator Author

Might get some ideas from https://github.com/DeterminateSystems/magic-nix-cache-action.

Since, we're talking caches, I wonder if we can cache work/ between workflows...

@MillironX
Copy link
Member

As of v1.3, the workflow runs it's own cache. Sarek is pinned at v1.2 for some reason. Updating that should relieve about 2/3 of the API hits. The caching present in Sarek's workflow does nothing b/c the Nextflow binary doesn't live at /usr/bin/nextflow when installed using setup-nextflow.

Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.

edmundmiller added a commit to nf-core/sarek that referenced this issue Aug 31, 2023
@edmundmiller
Copy link
Collaborator Author

Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.

That's nice. Do you not think that would work with nf-test?

That also makes me think we could move to a repo and pull in the GitHub actions like this repo https://github.com/magit/actions So it would just be:

steps:
    steps:
      - name: Install Nextflow
        uses: nf-core/actions/setup-nextflow@main
      - name: Install nf-core
        uses: nf-core/actions/setup-nf-core@main
      - name: Run nf-test
        uses: nf-core/actions/nf-test@main

@edmundmiller
Copy link
Collaborator Author

Not sure if this is over yet...

@MatthiasZepper
Copy link
Member

How about checking the API rate limit first before attempting to query the endpoints?

Queries to the limit endpoint are free (no effects on the remaining quota) and one could probably reuse code from my action step.

@MillironX
Copy link
Member

How about checking the API rate limit first before attempting to query the endpoints?

We have talked about this.

I wonder if we can figure out the time to wait from the response headers as shown here.
#19 (comment)

I'm working on a big update that should make this easier to integrate into the code...

@MillironX MillironX mentioned this issue Oct 30, 2023
4 tasks
@edmundmiller edmundmiller added this to the 2.0.0 milestone Oct 31, 2023
@edmundmiller
Copy link
Collaborator Author

edmundmiller commented Nov 29, 2023

Conversation with @ewels:

  1. Hard-code version in modules again. We'd be on edge except for the month after stable releases
  2. Hard-code the response in this action
  3. Make a API endpoint on the nf-core website that serves a latest-everything Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.

We're favoring option 3, this action could be extremeley simple then. (I think it should stay because it gives people the choice of the super simple uses: setup-nextflow

@MillironX
Copy link
Member

MillironX commented Nov 29, 2023

3. Make a API endpoint on the nf-core website that serves a `latest-everything` Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.

I've thought about something similar. Does anyone who works at Seqera (looking at you, @ewels) know how the Nextflow.io website catalogs versions? My digging through the script has led me to believe that Nextflow.io mirrors the Github release metadata in some way. Maybe they already have an API that they would let us ping instead of using the Github API?

@edmundmiller
Copy link
Collaborator Author

edmundmiller commented Jan 9, 2024

More ideas from the PRs and issues that are linked here actions/setup-go#16

I think we need to add the GitHub token to the README.

Like this:
https://github.com/DeLaGuardo/setup-clojure/blob/4b9b2f7702139e6be8ee376e8b2bef8f7f6850a4/action.yml#L36-L39

@ewels
Copy link
Member

ewels commented Jan 9, 2024

@MillironX - I would prefer not to rely on nextflow.io if possible. The current build / back end is somewhat opaque and I'm hoping that we will do a major revamp any day now (I've been hoping that for quite a while 😅 ).

Working with the nf-core website should be pretty trivial and we have full transparency and (community-) control over that site. Also given that https://github.com/nf-core/setup-nextflow is under the @nf-core GitHub org, it feels more natural to me anyway.

@ewels
Copy link
Member

ewels commented Jan 11, 2024

Great 👍🏻 Also, the website is already building a minimum of once per day. So no need to get clever with caches or anything like that, just having a static endpoint that grabs the latest version from the GitHub API at build time should be perfectly sufficient.

@mashehu - we need to return:

  • most recent edge release
  • most recent stable release
  • either: dates for each and / or the most recent release out of the two above

Suggestion would be to use keys that match the GitHub action terminology:

  • latest-stable
  • latest-edge
  • latest-everything

eg:

{
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge"
}

Could bung in extra data if you like, but those are the three that will be used here I think.

Could also put in a top-level key to future-proof, so it's easier to add more stuff in the future to use the endpoint for other things.

@ewels
Copy link
Member

ewels commented Jan 11, 2024

No, they fetch a non-existing page. They purposefully get a 404 error, but extract the version number due to the automatic redirect implemented by GitHub.

Ooh, clever!

@MillironX
Copy link
Member

we need to return:

  • most recent edge release
  • most recent stable release
  • either: dates for each and / or the most recent release out of the two above

Why not simply have all releases as a JSON file? That's what the Julia folks do. If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version. My idea of the workflow was

  1. Have the nf-core website crawl all releases on GitHub once per day. From the boatload of info that Octokit provides, the nf-core website will create a static JSON file that contains (for each Nextflow release)
    1. The version number
    2. Whether that version is an edge release or not
    3. The download URL for the regular script file
    4. The download URL for the "all" variant script file
    5. SHA hashes to verify the integrity of scripts?
  2. Point this action to the JSON file on the nf-core website rather than Octokit

Our source of truth remains the same that way, but we're effectively caching that info on the nf-core website. The logic of what is the "most recent" remains in this action, as well as all the version resolving code for numbered versions.

@edmundmiller
Copy link
Collaborator Author

We could have two seperate end-points

{
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge"
}

and

{ 
  "v23.10.0",
  "v23.11.0",
  "v23.12.0-edge",
  ...
}

Or combine them

{ 
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge",
  versions: [
  "v23.10.0",
  "v23.11.0",
  "v23.12.0-edge",
  ...
  ]
}

@mashehu
Copy link
Contributor

mashehu commented Jan 12, 2024

here you go: https://deploy-preview-2173--nf-core.netlify.app/nf_version (somehow the conten-type is ignored, but it's json)

@ewels
Copy link
Member

ewels commented Jan 12, 2024

  • I'd like dates as well, if possible
  • I still think it'd be nice to return whether a release is the latest or not, otherwise anything else using it has to figure this out themselves.

If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version.

I don't really understand this. The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?

@MillironX
Copy link
Member

The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?

The website will only do two API requests per day, but let's say we implement a a design that only exposes the "latest" versions on the website. Every time setup-nextflow is run using a "latest" version, setup-nextflow will make zero API calls to GitHub. Excellent. That's what we want. But now if I run setup-nextflow with a numbered version, say "23.10", the website file wouldn't contain any information on what v23.10 looks like, so now setup-nextflow would need to crawl the GitHub API again, and we'll hit our rate limits and be back where we started.

In fact, as setup-nextflow is currently written, "latest-stable" results in the fewest number of API calls, because it can leverage GitHub's "latest release" API to immediately get the download information rather than making two calls to see if a release is latest, and then getting the download information. In other words, the rate limiting is most important for numbered versions, and not "latest" versions.

@MillironX
Copy link
Member

MillironX commented Jan 12, 2024

I think @edmundmiller is on the right track about combining "latest" and numbered releases into a single JSON output. My ideal schema would look something like this:

{
  "latest": {
    "stable": {
      "version": "v23.10.0",
      "isEdge": false,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow-23.10.0-all"
    },
    "edge": {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    },
    "everything": {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    }
  },
  "versions": [
    {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    },
   { "etc, etc, etc": "" }
  ]
}

Yes, I know it's slightly redundant, but I think this maximizes compatibility with current systems while opening up new optimization opportunities.

@ewels
Copy link
Member

ewels commented Jan 12, 2024

But if a specific version is known then the URL can just be guessed, no? No API requests needed:

https://github.com/nextflow-io/nextflow/releases/download/{{ version }}/nextflow

That being said, I also have nothing against listing all the other versions - we're fetching that anyway and it's no extra effort. Very similar to what we're doing with https://nf-co.re/pipelines.json really.

@MillironX
Copy link
Member

MillironX commented Jan 12, 2024

But if a specific version is known then the URL can just be guessed, no? No API requests needed:

If, and only if a user specifies the explicit version number, e.g. 23.10.0 instead of 23.10. Maybe I'm thinking of this from too much of a semver perspective, but I like to specify year and month versions, then let the version resolve to the most bugfixed version that corresponds to that combo, e.g. let 23.04 resolve to getting 23.04.4. Plus, it's nice to try and resolve versions against an explicit list, removing a possible meaning of 404 errors.

@mashehu
Copy link
Contributor

mashehu commented Jan 15, 2024

Okay, added the most recent suggestion:
https://deploy-preview-2173--nf-core.netlify.app/nf_version

@ewels
Copy link
Member

ewels commented Jan 15, 2024

Looks good! I was thrown for a moment as I thought that the edge release should be the most recent, but I see that there was a patch on 23.10.1 a few days ago.. 😅

@mashehu
Copy link
Contributor

mashehu commented Jan 17, 2024

The above pr is merged, so you can use https://nf-co.re/nf_version

@ewels
Copy link
Member

ewels commented Jan 17, 2024

Only just noticed the URL. What do you think about making it https://nf-co.re/nextflow_version so that it's absolutely clear?

There are too many nfs around for my liking generally 😆

@MillironX
Copy link
Member

@mashehu, I notice the versions only go back to 22.11-edge. I'm guessing that these are only the first page of results? https://docs.github.com/en/rest/using-the-rest-api/using-pagination-in-the-rest-api?apiVersion=2022-11-28

@mashehu
Copy link
Contributor

mashehu commented Jan 19, 2024

Okay, https://nf-co.re/nextflow_version returns now ALL versions

@MillironX
Copy link
Member

Beautiful. Now for one final nitpick. @ewels noticed that the "latest-everything" tag returns a stable patch release currently. I'm not convinced that's the desired behavior. Here's my reasoning:

The "edge" releases contain (nearly) all commits approved on the master branch, including new features and hotfix patches. A stable release may contain new features if it is a major or minor (year or month) version bump, but will only contain hotfixes if it is a patch version bump. The goal of testing against "latest-everything" is to test against the most new features and hotfixes. So, the latest "edge" release would contain all the hotfixes of a patch release, plus new features, while the patch release will remove features that had been tested against on the previous "edge" release, only adding hotfixes. In other words, basing "latest-everything" on dates might exclude the latest features that might be breaking, going against the point of using the "latest-everything."

This is why I wrote setup-nextflow to use version comparison to find the "latest" versions.

If there is something wrong with my reasoning, or if you all think it is not worth the distinction, then we will just keep the new API as is.

@MillironX
Copy link
Member

Vote

👍 for making the "latest-everything" version resolve using version numbers (e.g. v23.12.0-edge > v23.10.1)

👎 for making the "latest-everything" version resolve using dates (e.g. v23.10.1 > v23.12.0-edge)

@ewels
Copy link
Member

ewels commented Jan 19, 2024

Good point 👍🏻 As long as it returns the new stable release when it's ahead it the most recent edge release, I'm happy (which it should do, based on version number matching).

@mashehu
Copy link
Contributor

mashehu commented Jan 21, 2024

okay, new sorting is live: nf-core/website#2216 thanks for pointing this out!

@MillironX
Copy link
Member

Thanks so much, @mashehu. Now to see about swapping out the API calls here 🤔

@adamrtalbot
Copy link

So do we (pipeline developers) need to do anything? do we 'just' need to update this action when the new feature is ready?

@mashehu
Copy link
Contributor

mashehu commented Jan 22, 2024

if you have it set as uses: nf-core/setup-nextflow@v1you are good

@mashehu
Copy link
Contributor

mashehu commented Jan 25, 2024

still happening here: https://github.com/nf-core/tools/actions/runs/7656119613/job/20863705436#step:8:18 or am I missing something?

@MillironX
Copy link
Member

I'm still working on the API swap out. ETA on that is a few months based on how much stuff needs to change. Interesting that #31 didn't take care of that, tho. Would you mind rerunning with runner logs and step logs turned on and sending me the logs?

@mashehu
Copy link
Contributor

mashehu commented Jan 25, 2024

Sure, next time I run into it.

@MillironX
Copy link
Member

Fixed by #37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants