From 0857438181948b335892894d127a5eeca8318edf Mon Sep 17 00:00:00 2001
From: posvyatokum
Date: Mon, 11 Mar 2024 21:47:38 +0000
Subject: [PATCH] add draft instructions on a state snapshot recovery

---
 docs/troubleshooting/attach_state_snapshot.md | 50 +++++++++++++++++++
 docs/troubleshooting/resharding.md            | 10 ++++
 2 files changed, 60 insertions(+)
 create mode 100644 docs/troubleshooting/attach_state_snapshot.md

diff --git a/docs/troubleshooting/attach_state_snapshot.md b/docs/troubleshooting/attach_state_snapshot.md
new file mode 100644
index 0000000..37893bb
--- /dev/null
+++ b/docs/troubleshooting/attach_state_snapshot.md
@@ -0,0 +1,50 @@
+---
+id: attach-state-snapshot
+title: Attach State Snapshot
+sidebar_label: Attach State Snapshot
+description: Instructions on attaching a supporting state snapshot.
+---
+
+## Terminology {#terminology}
+A State Snapshot is different from a DB snapshot.
+A State Snapshot is a checkpoint of some columns of the full DB, taken at the epoch boundary.
+It is used in state sync and resharding.
+
+State snapshots are identified by the last block hash of the epoch.
+We save the state snapshot in `$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH`.
+We also save `$BLOCK_HASH` in the DB to know which path to open when we need to use the snapshot.
+
+## How to attach a state snapshot to an existing node {#how to attach}
+1. Download the state snapshot to your machine.
+You can download it to any directory, but `$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH` has to point to your new state snapshot.
+2. Create a support directory anywhere on the node. We will refer to it as `$OTHER_HOME`.
+3. Copy the config to the new directory.
+```bash
+cp $NEAR_HOME/config.json $OTHER_HOME/config.json
+```
+4. Point the data directory of `$OTHER_HOME` to the state snapshot.
+```bash
+ln -s $NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH $OTHER_HOME/test-data
+```
+5. Change the `$OTHER_HOME` config to work with the state snapshot.
+```bash
+cat <<< $(jq '.archive = false | .cold_store = null | .store.path = "test-data"' $OTHER_HOME/config.json) > $OTHER_HOME/config.json
+```
+6. Change the state snapshot `DBKind` to suit your node.
+If you are running a split storage archival node, run
+```bash
+$NEARD --unsafe-fast-startup --home $OTHER_HOME database change-db-kind --new-kind Hot change-hot
+```
+If you are running an RPC node, run
+```bash
+$NEARD --unsafe-fast-startup --home $OTHER_HOME database change-db-kind --new-kind RPC change-hot
+```
+7. You can delete `$OTHER_HOME` now.
+8. If you are fixing a problem for the 1.37 or 1.38 release, you need to build a binary from [this tool branch](https://github.com/near/nearcore/tree/1.37.0-fix).
+Changes from this branch will be included in the 1.39 release by default.
+9. Stop your node.
+10. Run a binary with the [tool branch](https://github.com/near/nearcore/tree/1.37.0-fix) changes to save `$BLOCK_HASH` in RocksDB.
+```bash
+$NEARD_TOOL --unsafe-fast-startup --home $NEAR_HOME database write-crypto-hash --hash $BLOCK_HASH
+```
+11. Restart your node.
\ No newline at end of file
diff --git a/docs/troubleshooting/resharding.md b/docs/troubleshooting/resharding.md
index 7a5001a..07e14ad 100644
--- a/docs/troubleshooting/resharding.md
+++ b/docs/troubleshooting/resharding.md
@@ -60,6 +60,16 @@ If you observe problems with block production or resharding performance, you can
 This does not require a node restart, you can send a signal to the neard process to load the new config.
 Read more [on github](https://github.com/near/nearcore/blob/master/docs/architecture/how/resharding.md#monitoring).
 
+### Mitigating state snapshot issue {#state snapshot}
+A node has to have a state snapshot in order for resharding to run.
+A state snapshot is a smaller checkpoint of the whole DB, taken at the epoch boundary.
+If you see any errors around creating or opening the state snapshot, you may download a state snapshot and attach it to your node.
+Look for `ERROR state_snapshot` log lines around the epoch switch times.
+For 1.37 the epoch switch happened around `2024-03-11 19:28:30`.
+
+Further instructions are in the [Attaching State Snapshot page](/troubleshooting/attach-state-snapshot).
+For 1.37 release resharding, use block hash `EqT4A5h9ayaALpJZNX4SK3dG3HDPWUH9QDuhfCcWSXHi`.
+
 ### After resharding {#after 1.37}
 If your node failed to reshard or is not able to sync with the network after the protocol upgrade, you will need to download the latest DB snapshot provided by Pagoda from s3 [Node Data Snapshots](/intro/node-data-snapshots).
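
Note on the attach procedure added in this patch: it relies on `$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH` resolving to the downloaded snapshot before `write-crypto-hash` is run. Below is a minimal pre-flight sketch of that check. The function name `check_state_snapshot` is hypothetical and not part of `neard`, and it only confirms that the path resolves to a directory, not that the snapshot contents are valid.

```bash
# Hypothetical helper (not part of neard): confirm that the state snapshot
# directory for a block hash exists under the node's data directory.
check_state_snapshot() {
  local data_dir="$1"
  local block_hash="$2"

  if [ -z "$data_dir" ] || [ -z "$block_hash" ]; then
    echo "usage: check_state_snapshot DATA_DIR BLOCK_HASH" >&2
    return 2
  fi

  # Path layout taken from the Terminology section of the new page.
  local snapshot_dir="$data_dir/state_snapshot/$block_hash"

  # -d follows symlinks, so a link to a snapshot stored elsewhere also passes.
  if [ -d "$snapshot_dir" ]; then
    echo "found: $snapshot_dir"
  else
    echo "missing: $snapshot_dir" >&2
    return 1
  fi
}
```

One possible use is to run `check_state_snapshot "$NEAR_HOME_DATA" "$BLOCK_HASH" || exit 1` before stopping the node, so a mistyped hash or a missing download is caught while the node is still running.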