upgrades: failing to release lock on storage #4430

Closed
conorsch opened this issue May 21, 2024 · 5 comments · Fixed by #4431
Labels
A-upgrades Area: Relates to chain upgrades _P-high High priority

Comments

@conorsch
Contributor

On the latest round of migration-testing for Testnet 76 (#4402), I've started seeing:

2024-05-21T22:13:45.041770Z  INFO pd_migrate:migrate{self=Testnet76 comet_home=Some("/penumbra-config/penumbra-devnet-val/node0/cometbft")}:migrate{storage=Storage { .. } pd_home="/penumbra-config/penumbra-devnet-val/node0/pd" genesis_start=Some(Time(2024-05-21 22:05:35.659383898))}: cnidarium::storage: opening rocksdb config column path="/penumbra-config/penumbra-devnet-val/node0/pd/rocksdb"
thread 'tokio-runtime-worker' panicked at /usr/src/penumbra/crates/cnidarium/src/storage.rs:73:60:
can open database: Error { message: "IO error: lock hold by current process, acquire time 1716329625 acquiring thread 29: /penumbra-config/penumbra-devnet-val/node0/pd/rocksdb/LOCK: No locks available" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: failed to upgrade state

Caused by:
    task 19 panicked
command terminated with exit code 1

This looks like #4344, so it's likely a recent merge forgot to drop a storage handle.
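For readers unfamiliar with this failure mode, here is a minimal sketch of it using only the standard library. It is a toy model, not cnidarium's implementation: `ToyStorage`, `HELD`, and `run_two_steps` are hypothetical names, and the process-wide list stands in for the `LOCK` file RocksDB keeps in the database directory. The toy handle deliberately does not give the lock back on drop, so a forgotten `release` makes the next open of the same path fail, much like the panic above.

```rust
use std::path::{Path, PathBuf};
use std::sync::Mutex;

// Process-wide list of directories that currently have an open handle.
// Toy stand-in for the on-disk LOCK file; nothing touches the filesystem.
static HELD: Mutex<Vec<PathBuf>> = Mutex::new(Vec::new());

/// Hypothetical handle type for illustration; NOT cnidarium's `Storage`.
#[derive(Debug)]
pub struct ToyStorage {
    path: PathBuf,
}

impl ToyStorage {
    /// Open the store, failing if this process already holds the lock.
    pub fn open(path: impl AsRef<Path>) -> Result<Self, String> {
        let path = path.as_ref().to_path_buf();
        let mut held = HELD.lock().unwrap();
        if held.contains(&path) {
            return Err(format!("lock held by current process: {}", path.display()));
        }
        held.push(path.clone());
        Ok(Self { path })
    }

    /// Consume the handle and give the lock back.
    pub fn release(self) {
        HELD.lock().unwrap().retain(|p| p != &self.path);
    }
}

/// Run two migration-like steps that each open the same directory.
/// With `release_between = false`, the second open fails because the
/// first handle is still alive.
pub fn run_two_steps(path: &str, release_between: bool) -> Result<(), String> {
    let first = ToyStorage::open(path)?;
    if release_between {
        first.release();
        let second = ToyStorage::open(path)?;
        second.release();
        Ok(())
    } else {
        // Second open is attempted while `first` still holds the lock.
        let result = ToyStorage::open(path).map(|second| second.release());
        first.release();
        result
    }
}
```

Calling `run_two_steps("/tmp/toy", true)` succeeds, while `run_two_steps("/tmp/toy", false)` returns the lock error. The real fix in this thread has the same shape: each migration step has to release its handle before a later step opens the same `rocksdb` directory.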

@github-project-automation github-project-automation bot moved this to Backlog in Penumbra May 21, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label May 21, 2024
@conorsch conorsch moved this from Backlog to In progress in Penumbra May 21, 2024
@conorsch
Contributor Author

Found one missing:

diff --git a/crates/bin/pd/src/migrate/testnet76.rs b/crates/bin/pd/src/migrate/testnet76.rs
index b02b2b559..5c8e6ea15 100644
--- a/crates/bin/pd/src/migrate/testnet76.rs
+++ b/crates/bin/pd/src/migrate/testnet76.rs
@@ -121,6 +121,7 @@ pub async fn migrate(
     let rocksdb_dir = pd_home.join("rocksdb");
     let storage = Storage::load(rocksdb_dir, SUBSTORE_PREFIXES.to_vec()).await?;
     let migrated_state = storage.latest_snapshot();
+    storage.release().await;

     // The migration is complete, now we need to generate a genesis file. To do this, we need
     // to lookup a validator view from the chain, and specify the post-upgrade app hash and

Checking for more...

@conorsch
Contributor Author

diff --git a/crates/bin/pd/src/migrate/reset_halt_bit.rs b/crates/bin/pd/src/migrate/reset_halt_bit.rs
index b9877378c..4e0060592 100644
--- a/crates/bin/pd/src/migrate/reset_halt_bit.rs
+++ b/crates/bin/pd/src/migrate/reset_halt_bit.rs
@@ -14,6 +14,7 @@ pub async fn migrate(
     let mut delta = StateDelta::new(export_state);
     delta.ready_to_start();
     let _ = storage.commit_in_place(delta).await?;
+    storage.release().await;
     tracing::info!("migration completed: halt bit is turned off, chain is ready to start");
     Ok(())
 }

@erwanor
Member

erwanor commented May 21, 2024

I had removed it; it must have slipped back in during a rebase of #4137. I will push a PR.

@conorsch
Contributor Author

Those two aren't sufficient to resolve the issue, so I'm still digging around. Happy to pair or review; I've got a slick testing setup to validate fixes.

@erwanor
Member

erwanor commented May 21, 2024

@conorsch I believe this should fix it: fb1b14e

@conorsch conorsch added _P-high High priority A-upgrades Area: Relates to chain upgrades and removed needs-refinement unclear, incomplete, or stub issue that needs work labels May 22, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in Penumbra May 22, 2024