Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: automated testing of migration compatibility #4323

Open
conorsch opened this issue May 6, 2024 · 7 comments
Open

ci: automated testing of migration compatibility #4323

conorsch opened this issue May 6, 2024 · 7 comments
Assignees
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra A-upgrades Area: Relates to chain upgrades _P-high High priority _P-V1 Priority: slated for V1 release

Comments

@conorsch
Copy link
Contributor

conorsch commented May 6, 2024

Is your feature request related to a problem? Please describe.
When preparing a chain upgrade, manual testing of upgrades is an arduous process.

Describe the solution you'd like
We should have an integration test that runs a devnet based on the currently-active testnet, via the most recently released tag, runs smoke tests against it to generate txs, then stops the network, runs the migration, restarts the network, and reruns the smoket ests.

Describe alternatives you've considered
Alternatives are manual testing, which is both slow and error-prone. Longer-term we want a capable "sudo mode" for testing upgrades, which is tracked in #4265.

Additional context
We'll need to be careful about endpoint compatibility: we cannot run an older version of pd and run the most recent smoke tests against it, because the view server implementations will not be compatible. We can sidestep this by running the smoke tests from the tagged release. Ideally, we'd be able to swap out the path to binaries within the tests via an env var to make it a bit more to test prior versions, without rebuilding from source every time.

@conorsch conorsch added A-CI/CD Relates to continuous integration & deployment of Penumbra A-upgrades Area: Relates to chain upgrades labels May 6, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Penumbra May 6, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label May 6, 2024
@conorsch
Copy link
Contributor Author

conorsch commented May 6, 2024

There's a spike on an overhaul of the smoke-test logic here #4324 which provides a nice shape to extend into migration-testing.

@conorsch conorsch self-assigned this May 6, 2024
@conorsch conorsch moved this from Backlog to In progress in Penumbra May 6, 2024
@hdevalence
Copy link
Member

We'll need to be careful about endpoint compatibility: we cannot run an older version of pd and run the most recent smoke tests against it, because the view server implementations will not be compatible.

Can you elaborate on this a little? Isn't our expectation that clients should work across upgrade boundaries?

@conorsch
Copy link
Contributor Author

conorsch commented May 6, 2024

Can you elaborate on this a little? Isn't our expectation that clients should work across upgrade boundaries?

If you try to run a client from current main against a public testnet endpoint, you'll see in incompatibility message related to ongoing auction work:

❯ git rev-parse HEAD
7854a5fc561e2e3f514421c3ea97c80cea5a673e

❯ cargo run -q --release --bin pcli -- view sync
Error: proto response missing auction params

Those same dependencies carry over into the integration tests.

@conorsch
Copy link
Contributor Author

conorsch commented May 6, 2024

Pushed a draft PR with a spike on local testing of migration logic, that can be promoted to a CI job once it's solid. Got surprised by a proto incompat error that may be spurious, so I'm going to run through the upgrade process manually to sanity-check that the scripting order is sound.

@hdevalence
Copy link
Member

@conorsch #4339 will simplify things a great deal

@hdevalence
Copy link
Member

Can you elaborate on this a little? Isn't our expectation that clients should work across upgrade boundaries?

If you try to run a client from current main against a public testnet endpoint, you'll see in incompatibility message related to ongoing auction work [...]

Those same dependencies carry over into the integration tests.

Got it. I was assuming we would run the smoke test script from the original tag and then, post-upgrade, run the smoke test script from the new HEAD.

@aubrika aubrika added _P-V1 Priority: slated for V1 release _P-high High priority and removed needs-refinement unclear, incomplete, or stub issue that needs work labels May 8, 2024
@aubrika aubrika modified the milestones: Sprint 6, Sprint 7 May 20, 2024
@conorsch
Copy link
Contributor Author

Made substantial progress on this front. There are notably two types of testing going on:

  1. closed-world, intra-CI testing of a single-validator devnet;
  2. open-world, publicly-accessible, cluster-hosted migration testing

The former is potentially suitable for per-PR runs, although so far the runtime is quite long: ~20m or so. We recently shaved a ton of per-PR CI runtime off with #4324, so it'd be a shame to knock it back up again, but for assurance it'd be worth it. This type of testing is great for catching problems like #4430.

The latter case is more intensive, and isn't yet end-to-end automated yet. Given that its setup reuses the same architecture as the public testnet, it's able to catch more subtle bugs, like #4443. For now, I'll continue to use this setup as part of pre-release QA.

conorsch added a commit that referenced this issue May 29, 2024
Adds intra-CI standalone testing of migration behavior.
This can be run locally on developer workstations,
and also in CI. The job is currently taking >20m, and requires
a lot of disk space, but it's worth it for the assurance.

Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Adds a new rust crate, strictly for running the migration-test
suite of integration tests, which is very similar in nature to the
already-existing network-integration tests AKA smoke tests.
Copy/pastes a lot of code from the smokes, which we can always
factor out into reusable utils, but not bothering with that right now.

Refs #4323.
conorsch added a commit that referenced this issue May 30, 2024
Adds intra-CI standalone testing of migration behavior.
This can be run locally on developer workstations,
and also in CI. The job is currently taking >20m, and requires
a lot of disk space, but it's worth it for the assurance.

Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Adds a new rust crate, strictly for running the migration-test
suite of integration tests, which is very similar in nature to the
already-existing network-integration tests AKA smoke tests.
Copy/pastes a lot of code from the smokes, which we can always
factor out into reusable utils, but not bothering with that right now.

Refs #4323.
conorsch added a commit that referenced this issue May 30, 2024
Adds intra-CI standalone testing of migration behavior.
This can be run locally on developer workstations,
and also in CI. The job is currently taking >20m, and requires
a lot of disk space, but it's worth it for the assurance.

Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Adds a new rust crate, strictly for running the migration-test
suite of integration tests, which is very similar in nature to the
already-existing network-integration tests AKA smoke tests.
Copy/pastes a lot of code from the smokes, which we can always
factor out into reusable utils, but not bothering with that right now.

Refs #4323.
@aubrika aubrika modified the milestones: Sprint 7, Sprint 8 Jun 3, 2024
@aubrika aubrika modified the milestones: Sprint 8, Sprint 9 Jul 15, 2024
@aubrika aubrika modified the milestones: Sprint 9, Sprint 10 Aug 12, 2024
@aubrika aubrika removed this from the Sprint 10 milestone Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra A-upgrades Area: Relates to chain upgrades _P-high High priority _P-V1 Priority: slated for V1 release
Projects
Status: Backlog
Development

No branches or pull requests

3 participants