
Make staging / batch commits more easily accessible #768

Open
1 of 4 tasks
xmo-odoo opened this issue May 24, 2023 · 3 comments
xmo-odoo commented May 24, 2023

Smart idea from @ryv-odoo: expose the stagings as a git repository using submodules. This should be reasonably easy to generate: since the contents are only commit hashes there isn't really anything to compress, so we can create a naive packfile.

Submodules are not great, but they should be good enough for this use case.

  • migration to create commits (somewhat efficiently) for all old stagings / batches
  • creation of commits for stagings
    • async?
  • actually push to repository
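Submodules are stored in git as "gitlink" tree entries (mode `160000`), which is what makes this cheap: a staging snapshot is just a tree of hashes. A minimal sketch of that idea, assuming a hypothetical `{repository name: commit hash}` mapping and an existing git repository (this is not the project's actual code):

```python
# Sketch: represent a staging as one git tree containing a gitlink
# (mode 160000) entry per repository -- the on-disk shape of submodules.
import subprocess

def gitlink_listing(heads):
    """Format {repo_name: commit_sha} as `git mktree` input lines.

    `git mktree` expects "<mode> SP <type> SP <hash> TAB <name>" per entry.
    """
    return "".join(
        f"160000 commit {sha}\t{name}\n" for name, sha in sorted(heads.items())
    )

def write_staging_tree(git_dir, heads):
    """Create the tree object in `git_dir` and return its hash."""
    return subprocess.run(
        ["git", "-C", git_dir, "mktree"],
        input=gitlink_listing(heads),
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

One tree like this per staging, wrapped in a commit chain per target branch, is enough for a naive packfile since none of the referenced objects need to be present or compressed.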
xmo-odoo commented:
Also needs a reverse index endpoint, so that e.g. the runbot can pass in a commit (or, most likely, a set of commits) and get the batched PRs from that staging.

With a second set of commits, provide all the batched PRs (per staging) from the original up to (but excluding) the second set.

This way the runbot can provide the commits of the mainline batches which it does have, and get references to all the PRs in the bucket.

xmo-odoo added a commit that referenced this issue Aug 11, 2023
Currently the heads of a staging (both staging heads and merged heads)
are just JSON data on the staging itself. Historically this was
convenient as the heads were mostly of use to the staging process, and
thus accessed directly through the staging essentially exclusively.

However this makes finding stagings from merged commits e.g. for
forensic research almost impossible, because querying based on
the *values* of a JSON map is expensive, and indexing it is difficult.

To make this use case more feasible, split the `heads` field into two
join tables, one for the staging heads and one for the merged heads.
This makes looking for stagings by commits much more
efficient (although the queries may not be trivial). Also add two
utility RPC methods, so that it's possible to query stagings
reasonably easily and efficiently based on a set of commits (branch
heads).

related to #768
xmo-odoo added a commit that referenced this issue Aug 11, 2023
`/runbot_merge/stagings`
========================

This endpoint is a reverse lookup from any number of commits to a
(number of) staging(s):

- it takes a list of commit hashes as either the `commits` or the
  `heads` keyword parameter
- it then returns the stagings which have *all* these commits as
  respectively commits or heads; if all the commits for a project are
  provided, the result should be unique (if any)
- `commits` are the merged commits, aka the stuff which ends up in the
  actual branches
- `heads` are the staging heads, aka the commits at the tip of the
  `staging.$name` branches, those may be the same as the corresponding
  commit, or might be deduplicator commits which get discarded on
  success
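A client-side sketch of calling this reverse lookup. The host name is invented, and the exact wire encoding of the commit list is not specified in the message above; this assumes a repeated query parameter:

```python
# Sketch: build a reverse-lookup URL for /runbot_merge/stagings.
# The base URL is hypothetical; `commits` (merged commits) and `heads`
# (staging heads) are the two documented keyword parameters.
from urllib.parse import urlencode

def stagings_lookup_url(base, hashes, param="commits"):
    # assumption: the list is sent as a repeated query parameter,
    # e.g. ?commits=<sha1>&commits=<sha2>
    return f"{base}/runbot_merge/stagings?" + urlencode(
        [(param, h) for h in hashes]
    )

# urllib.request.urlopen(stagings_lookup_url(...)) would then return
# the matching stagings.
```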

`/runbot_merge/stagings/:id`
============================

Returns a list of all PRs in the staging, grouped by batch (aka PRs
which have the same label and must be merged together).

For each PR, the `repository` name, `number`, and `name` in the form
`$repository#$number` get returned.
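An illustration of the documented response shape, as a Python literal with invented values (only the three documented fields are assumed):

```python
# Sketch of the /runbot_merge/stagings/:id payload: a list of batches,
# each batch a list of PR descriptors. All values here are made up.
example_batches = [
    [  # one batch: PRs sharing a label, merged together
        {"repository": "odoo/odoo", "number": 12345,
         "name": "odoo/odoo#12345"},
        {"repository": "odoo/enterprise", "number": 678,
         "name": "odoo/enterprise#678"},
    ],
]

# the `name` field is always `$repository#$number`
for batch in example_batches:
    for pr in batch:
        assert pr["name"] == f"{pr['repository']}#{pr['number']}"
```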

`/runbot_merge/stagings/:id1/:id2`
==================================

Returns a list of all the *successfully merged* stagings between `id1`
and `id2`, from oldest to most recent. Individual records have the
form:

- `staging` is the id of the staging
- `prs` is the contents of the previous endpoint (a list of PRs
  grouped by batch)

`id1` *must* be lower than `id2`.

By default, this endpoint is inclusive on both ends; the
`include_from` and/or `include_to` parameters can be passed with the
value `False` to exclude the corresponding bound from the result.

Related to #768
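A sketch of building requests against this range endpoint, enforcing the `id1 < id2` contract; the base URL is hypothetical:

```python
# Sketch: URL builder for /runbot_merge/stagings/:id1/:id2 with the
# documented include_from / include_to bound-exclusion parameters.
from urllib.parse import urlencode

def staging_range_url(base, id1, id2, include_from=True, include_to=True):
    assert id1 < id2  # `id1` *must* be lower than `id2`
    params = {}
    if not include_from:
        params["include_from"] = False
    if not include_to:
        params["include_to"] = False
    url = f"{base}/runbot_merge/stagings/{id1}/{id2}"
    return url + ("?" + urlencode(params) if params else "")
```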
xmo-odoo commented Aug 11, 2023

On the back burner after implementing (and soon deploying) the reverse index bits for the runbot: generating a git repo / pack on the fly works but it's super slow because of the number of stagings & batches involved.

This means the git data needs to be pregenerated and ready to be packed, with stored snapshots & co, whether that lives in the database or on disk; even more so given the expected need to update working copies (so partial retrievals must be handled gracefully).

xmo-odoo commented Aug 5, 2024

New update: a basic wip script seems to process 84589 stagings in ~35 minutes.

That is lacking batches support; eyeballing it, there seems to be an average of around two batches per successful staging, so I assume the processing time will at least double.

CPU utilization is only at 45% though; since most of the work is querying postgres and creating git objects, it should be possible to use a thread per target.
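Since the work is dominated by I/O waits (postgres queries and git object creation), a thread pool can overlap them despite the GIL. A minimal sketch of the thread-per-target idea, where `process_target` stands in for the actual (hypothetical) per-target staging-processing function:

```python
# Sketch: fan the per-target work out over one thread per target.
# Threads help here because the work is mostly waiting on postgres
# and on git, not CPU-bound Python.
from concurrent.futures import ThreadPoolExecutor

def process_all(targets, process_target):
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        # map preserves input order, so results line up with targets
        return list(pool.map(process_target, targets))
```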

xmo-odoo added a commit that referenced this issue Dec 16, 2024
Add experimental support for creating submodule-based commits for
stagings (and batches), and pushing those in ancillary repositories.

Fixes #768
Status: accepted