add instructions on a state snapshot recovery #92
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
id: attach-state-snapshot | ||
title: Attach State Snapshot | ||
sidebar_label: Attach State Snapshot | ||
description: Instructions on attaching a supporting state snapshot. | ||
--- | ||
|
||
## Terminology {#terminology} | ||
A state snapshot is different from a DB snapshot. | ||
A state snapshot is a checkpoint of some columns of the full DB, taken at the epoch boundary. | ||
It is used in state sync and resharding. | ||
|
||
State snapshots are identified by the last block hash of the epoch. | ||
We save the state snapshot in `$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH`. | ||
We also save `$BLOCK_HASH` in the DB so that we know which path to open when the snapshot is needed. | ||
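As a quick sanity check of the path layout described above, the following sketch builds the expected snapshot directory from a data directory and a block hash and reports whether it exists. The function names `snapshot_dir` and `check_snapshot_dir` are hypothetical helpers introduced here for illustration; they are not part of `neard`.

```bash
# Hypothetical helper: print the directory where a state snapshot is expected,
# following the $NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH layout.
snapshot_dir() {
  local data_dir="$1" block_hash="$2"
  printf '%s/state_snapshot/%s\n' "$data_dir" "$block_hash"
}

# Hypothetical helper: succeed if the expected directory exists.
# `-d` follows symlinks, so a link pointing at a snapshot stored elsewhere also passes.
check_snapshot_dir() {
  local dir
  dir="$(snapshot_dir "$1" "$2")"
  if [ -d "$dir" ]; then
    echo "found: $dir"
  else
    echo "missing: $dir" >&2
    return 1
  fi
}
```

For example, `check_snapshot_dir "$NEAR_HOME_DATA" "$BLOCK_HASH"` returns a non-zero status when the directory is absent.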
|
||
## How to attach a state snapshot to an existing node {#how to attach} | ||
1. Download the state snapshot to your machine. | ||
You can download it to any directory, but `$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH` has to point to your new state snapshot. | ||
2. Create a support directory anywhere on the node. We will refer to it as `$OTHER_HOME`. | ||
3. Copy the config to the new directory. | ||
```bash | ||
cp $NEAR_HOME/config.json $OTHER_HOME/config.json | ||
``` | ||
Review comment: I also copied genesis and node key. I'm not sure if both were required but at least one of them was.
4. Point the data directory of `$OTHER_HOME` to the state snapshot. | ||
```bash | ||
ln -s <state snapshot path> $OTHER_HOME/test-data | ||
``` | ||
5. Change the `$OTHER_HOME` config to work with the state snapshot. | ||
```bash | ||
cat <<< $(jq '.archive = false | .cold_store = null | .store.path = "test-data"' $OTHER_HOME/config.json) > $OTHER_HOME/config.json | ||
``` | ||
Review comment: I'll trust your bash scripting :)
6. Change the state snapshot `DBKind` to suit your node. | ||
If you are running a split storage archival node, run | ||
```bash | ||
$NEARD --unsafe-fast-startup --home $OTHER_HOME database change-db-kind --new-kind Hot change-hot | ||
``` | ||
If you are running an RPC node, run | ||
```bash | ||
$NEARD --unsafe-fast-startup --home $OTHER_HOME database change-db-kind --new-kind RPC change-hot | ||
``` | ||
7. You can delete `$OTHER_HOME` now. | ||
8. If you are fixing a problem for the 1.37 or 1.38 release, you need to build a binary from [this tool branch](https://github.com/near/nearcore/tree/1.37.0-fix). | ||
Changes from this branch will be included in the 1.39 release by default. | ||
9. Stop your node. | ||
10. Run a binary with the [tool branch](https://github.com/near/nearcore/tree/1.37.0-fix) changes to save `$BLOCK_HASH` in RocksDB. | ||
```bash | ||
$NEARD_TOOL --unsafe-fast-startup --home $NEAR_HOME database write-crypto-hash --hash $BLOCK_HASH | ||
``` | ||
11. Restart your node. |
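Steps 2 to 5 above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the function names `prepare_other_home` and `patch_config` are hypothetical, `jq` is assumed to be installed, and only `config.json` is copied (a reviewer reports also copying the genesis file and node key, which this sketch does not do). The jq filter is the one from step 5; the difference is that the result goes through a temporary file, so the config is only replaced if jq succeeds.

```bash
# Hypothetical helper for steps 2-4: create the support directory,
# copy the node config into it, and link the state snapshot as test-data.
prepare_other_home() {
  local near_home="$1" other_home="$2" snapshot_path="$3"
  mkdir -p "$other_home" || return 1
  cp "$near_home/config.json" "$other_home/config.json" || return 1
  ln -s "$snapshot_path" "$other_home/test-data"
}

# Hypothetical helper for step 5: apply the jq filter from the guide,
# writing to a temporary file next to the config before replacing it.
patch_config() {
  local config="$1" tmp
  tmp="$(mktemp "$config.XXXXXX")" || return 1
  if jq '.archive = false | .cold_store = null | .store.path = "test-data"' "$config" > "$tmp"; then
    mv "$tmp" "$config"
  else
    # Leave the original config untouched if jq failed.
    rm -f "$tmp"
    return 1
  fi
}
```

A possible invocation is `prepare_other_home "$NEAR_HOME" "$OTHER_HOME" "$NEAR_HOME_DATA/state_snapshot/$BLOCK_HASH"` followed by `patch_config "$OTHER_HOME/config.json"`. Note that `ln -s` fails if `$OTHER_HOME/test-data` already exists, which guards against silently reusing a stale link.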
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,16 @@ If you observe problems with block production or resharding performance, you can | |
This does not require a node restart, you can send a signal to the neard process to load the new config. | ||
Read more [on github](https://github.com/near/nearcore/blob/master/docs/architecture/how/resharding.md#monitoring). | ||
|
||
### Mitigating state snapshot issue {#state snapshot} | ||
A node must have a state snapshot in order for resharding to run. | ||
A state snapshot is a smaller checkpoint of the whole DB, taken at the epoch boundary. | ||
Review comment: I would also add something like:
||
If you see any errors around creating or opening the state snapshot, you can download a state snapshot and attach it to your node. | ||
Look for `ERROR state_snapshot` log lines around the epoch switch times. | ||
For 1.37, the epoch switch happened around `2024-03-11 19:28:30`. | ||
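One way to search for these lines is a plain `grep` over the node's log. The sketch below assumes the log has been written to a file; where your node logs go (a file, journald, a container runtime) depends on how it is run, so the path argument is an assumption, and `find_snapshot_errors` is a hypothetical helper name.

```bash
# Hypothetical helper: print matching log lines, prefixed with their line numbers.
# Returns non-zero if no `ERROR state_snapshot` line is found.
find_snapshot_errors() {
  local log_file="$1"
  grep -n 'ERROR state_snapshot' "$log_file"
}
```

For example, `find_snapshot_errors /path/to/neard.log` lists candidate lines, which you can then compare against the epoch switch time.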
|
||
Further instructions are on the [Attaching State Snapshot page](/troubleshooting/attach-state-snapshot). | ||
For the 1.37 release resharding, use block hash `EqT4A5h9ayaALpJZNX4SK3dG3HDPWUH9QDuhfCcWSXHi`. | ||
|
||
### After resharding {#after 1.37} | ||
If your node failed to reshard or is not able to sync with the network after the protocol upgrade, you will need to download the latest DB snapshot provided by Pagoda from S3:
[Node Data Snapshots](/intro/node-data-snapshots). | ||
|
Review comment: Technically it's a checkpoint of the whole DB with some unneeded columns deleted. At the end of the day we have hardlinks to all of the SST files (at least without compaction). I would keep it simple and just say it's a checkpoint of the full DB, and not mention being selective about some columns. This is in line with the expected size of the snapshot too.