Make staging / batch commits more easily accessible #768
Also needs a reverse-index endpoint, so that e.g. the runbot can pass in a commit (or, more likely, a set of commits) and get the batched PRs from that staging. With a second set of commits, it should provide all the batched PRs (per staging) from the original set up to (but excluding) the second set. This way the runbot can provide the commits of the mainline batches it does have, and get references to all the PRs in the bucket.
Currently the heads of a staging (both staging heads and merged heads) are just JSON data on the staging itself. Historically this was convenient, as the heads were mostly of use to the staging process and thus accessed essentially exclusively through the staging. However, this makes finding stagings from merged commits (e.g. for forensic research) almost impossible, because querying based on the *values* of a JSON map is expensive, and indexing it is difficult.

To make this use case more feasible, split the `heads` field into two join tables, one for the staging heads and one for the merged heads. This makes looking up stagings by commit much more efficient (although the queries may not be trivial). Also add two utility RPC methods, so that it's possible to query stagings reasonably easily and efficiently based on a set of commits (branch heads).

related to #768
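A minimal sketch of what the two join tables could look like as Odoo models; the model and field names below are assumptions for illustration, not the actual runbot_merge schema.

```python
# Illustrative sketch only: model and field names are assumed, not the actual
# runbot_merge schema. The point is that each head / merged commit becomes a
# row that can be indexed and joined on, instead of a value inside a JSON map.
from odoo import fields, models

class StagingHead(models.Model):
    _name = 'runbot_merge.stagings.heads'
    _description = "head of a staging.$name branch for one repository"

    staging_id = fields.Many2one('runbot_merge.stagings', required=True, index=True)
    repository_id = fields.Many2one('runbot_merge.repository', required=True)
    commit_sha = fields.Char(required=True, index=True)

class StagingCommit(models.Model):
    _name = 'runbot_merge.stagings.commits'
    _description = "merged commit which ends up on the target branch"

    staging_id = fields.Many2one('runbot_merge.stagings', required=True, index=True)
    repository_id = fields.Many2one('runbot_merge.repository', required=True)
    commit_sha = fields.Char(required=True, index=True)
```

With rows instead of a JSON map, finding the stagings which merged a given commit becomes a plain indexed lookup, e.g. `env['runbot_merge.stagings.commits'].search([('commit_sha', '=', sha)]).mapped('staging_id')`.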
`/runbot_merge/stagings`
========================

This endpoint is a reverse lookup from any number of commits to a (number of) staging(s):

- it takes a list of commit hashes as either the `commits` or the `heads` keyword parameter
- it then returns the stagings which have *all* these commits as respectively commits or heads; if providing all commits for a project, the result should always be unique (if any)
- `commits` are the merged commits, aka the stuff which ends up in the actual branches
- `heads` are the staging heads, aka the commits at the tip of the `staging.$name` branches; those may be the same as the corresponding commit, or might be deduplicator commits which get discarded on success

`/runbot_merge/stagings/:id`
============================

Returns a list of all PRs in the staging, grouped by batch (aka PRs which have the same label and must be merged together).

For each PR, the `repository` name, `number`, and `name` in the form `$repository#$number` are returned.

`/runbot_merge/stagings/:id1/:id2`
==================================

Returns a list of all the *successfully merged* stagings between `id1` and `id2`, from oldest to most recent. Individual records have the form:

- `staging` is the id of the staging
- `prs` is the contents of the previous endpoint (a list of PRs grouped by batch)

`id1` *must* be lower than `id2`.

By default, this endpoint is inclusive on both ends; the `include_from` and / or `include_to` parameters can be passed with the `False` value to exclude the corresponding bound from the result.

Related to #768
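A rough sketch of how a client (e.g. the runbot) might call these endpoints with `requests`; the host, hashes, and staging ids are placeholders, and the exact parameter encoding and response shapes are assumptions which may differ from the actual controllers.

```python
# Hypothetical client calls: host, hashes and ids are placeholders, and the
# parameter encoding may not match the actual runbot_merge controllers.
import requests

BASE = "https://mergebot.example.com"

# reverse lookup: stagings which contain *all* of these merged commits
stagings = requests.get(
    f"{BASE}/runbot_merge/stagings",
    params={"commits": ["0123abc", "4567def"]},
).json()

# PRs of one staging, grouped by batch
batches = requests.get(f"{BASE}/runbot_merge/stagings/{stagings[0]}").json()

# successfully merged stagings between two staging ids, excluding the lower bound
history = requests.get(
    f"{BASE}/runbot_merge/stagings/41/45",
    params={"include_from": False},
).json()
```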
On the back burner after implementing (and soon deploying) the reverse-index bits for the runbot: generating a git repo / pack on the fly works, but it's super slow because of the number of stagings & batches involved. This means the git data needs to be pregenerated and ready to be packed, with stored snapshots & co (whether in the db or on disk), even more so given the expected need to update working copies (so it needs to gracefully handle partial retrievals).
New update: a basic WIP script seems to process 84589 stagings in ~35 minutes. That is lacking batches support, and eyeballing it there seems to be an average of around two batches per successful staging, so I assume the processing time will at least double. CPU usage is only around 45% though; since most of the work is querying postgres and creating git objects, it should be possible to create a thread per target.
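A minimal sketch of that parallelisation, assuming a hypothetical `export_staging()` helper which queries postgres and writes the git objects for one staging; the names are illustrative, not the actual script.

```python
# Sketch: one worker thread per target branch. Each target's history stays
# sequential, while the postgres / git I/O waits of different targets overlap.
# export_staging() is a hypothetical helper, not part of the actual script.
from concurrent.futures import ThreadPoolExecutor

def export_target(target, stagings):
    # stagings: successful stagings of this target branch, oldest first
    for staging in stagings:
        export_staging(target, staging)  # I/O bound: postgres queries + git object writes

def export_all(stagings_by_target):
    with ThreadPoolExecutor(max_workers=len(stagings_by_target)) as pool:
        for target, stagings in stagings_by_target.items():
            pool.submit(export_target, target, stagings)
```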
Add experimental support for creating submodule-based commits for stagings (and batches), and pushing those to ancillary repositories. Fixes #768
Smart idea from @ryv-odoo: expose this as a git repository using submodules. This should be reasonably easy to generate: since it's only hashes, there isn't really anything to compress, so we can create a naive packfile.
Submodules are not great, but they should be good enough for this use case.
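A rough sketch, using git plumbing from Python, of how such a submodule-based staging commit could be built: each repository becomes a gitlink entry (mode 160000) pointing at that repository's merged head, so the superproject only ever stores hashes. The helper and its inputs are illustrative, not the actual implementation.

```python
# Sketch: build one submodule-based commit for a staging. Each repository is a
# gitlink (mode 160000) whose object lives in the submodule repo, so nothing
# needs to exist in the superproject's object database. Run inside the
# superproject repository; names and hashes are placeholders.
import subprocess

def staging_commit(heads, message, parent=None):
    """heads maps repository name -> merged commit hash; returns the new commit sha."""
    tree_spec = "".join(
        f"160000 commit {sha}\t{repo}\n" for repo, sha in sorted(heads.items())
    )
    tree = subprocess.run(
        ["git", "mktree"], input=tree_spec,
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    cmd = ["git", "commit-tree", tree, "-m", message]
    if parent:
        cmd += ["-p", parent]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
```

Chaining one such commit per successful staging (oldest first, each taking the previous as parent) would yield a browsable history per target branch; a `.gitmodules` blob would also be needed so checkouts can resolve the submodule URLs.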