test(resharding): enable resharding state sync and modify the resharding state sanity checks accordingly (near#12546)

Here we enable state sync after a resharding by removing the debug
assert that prevented it, and we unignore the corresponding resharding
TestLoop test.

To make this work, we have to modify
`assert_state_sanity_for_children_shard()`, because it currently assumes
that the client tracks all shards. Simply checking whether the shard is
tracked before proceeding is not enough: memtries are unloaded at the
end of an epoch, and this check used to run at the end of the test, when
the head of the chain is in a different epoch than the final head, so
clients tracking a single shard would no longer have those memtries
loaded. So instead of making the check at the end, we make it on every
new block. The check is quick anyway (under 2 ms), and there's no harm
in getting a bit more coverage. While we're at it, we also run the check
for every shard rather than just the child shards, since that is a bit
simpler and the extra coverage has no downside.
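
In rough terms, the per-block check now behaves like the sketch below. This is an illustrative, self-contained sketch rather than the actual TestLoop helper: `Client`, `client_tracks_shard`, `flat_storage_ready`, and the two `check_*` functions are assumed stand-ins.

```rust
// Illustrative sketch only; all types and helpers here are stand-ins for
// the real TestLoop client and trie-comparison helpers.
type ShardId = u64;

struct Client;

fn client_tracks_shard(_client: &Client, _shard_id: ShardId) -> bool {
    // Real test: consult the client's shard tracker for this epoch.
    true
}

fn flat_storage_ready(_client: &Client, _shard_id: ShardId) -> bool {
    // False while a child shard's flat storage is still being built.
    true
}

fn check_memtrie_matches_trie(_client: &Client, _shard_id: ShardId) {
    // Compare the in-memory trie against the on-disk trie.
}

fn check_memtrie_matches_flat_storage(_client: &Client, _shard_id: ShardId) {
    // Compare the in-memory trie against flat storage.
}

/// Run the sanity checks for every shard on every new block, skipping
/// shards this client does not track.
fn check_new_block(client: &Client, all_shards: &[ShardId]) {
    for &shard_id in all_shards {
        if !client_tracks_shard(client, shard_id) {
            continue; // single-shard trackers no longer trip the assertion
        }
        check_memtrie_matches_trie(client, shard_id);
        if flat_storage_ready(client, shard_id) {
            check_memtrie_matches_flat_storage(client, shard_id);
        }
    }
}

fn main() {
    check_new_block(&Client, &[0, 1, 2, 3]);
}
```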

Performing the check on every block does require some extra care to make
sure we're actually checking what we think we are, though. When comparing
flat storage and memtrie contents on arbitrary blocks, flat storage might
not be ready because it belongs to a child shard that is still being
created. That's not a big deal: we simply skip the flat storage and
memtrie comparison in that case (but still check memtrie and trie
equality). We could leave it at that, but then the test could silently
stop checking what it's supposed to if a bug prevented the flat storage
comparison from happening for an entire epoch. So we introduce a
`struct TrieSanityCheck` that keeps track, for each account and each
epoch, of the shards for which we expect flat storage and memtries to
exist. At the end of the test we then iterate over the epoch IDs and make
sure all the expected checks were performed.
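
The test code itself is not reproduced in the hunks on this page, so the following is only a minimal sketch of what such bookkeeping could look like; the field and method names are assumptions rather than the actual `TrieSanityCheck` API.

```rust
use std::collections::{HashMap, HashSet};

// Stand-ins for nearcore types; illustrative only.
type EpochId = [u8; 32];
type AccountId = String;
type ShardId = u64;

/// Tracks, per epoch and per account, which shards the flat storage /
/// memtrie comparison is expected to run for, and which were actually seen.
#[derive(Default)]
struct TrieSanityCheck {
    expected: HashMap<(EpochId, AccountId), HashSet<ShardId>>,
    checked: HashMap<(EpochId, AccountId), HashSet<ShardId>>,
}

impl TrieSanityCheck {
    /// Called when a new epoch is first seen: record which shards this account tracks.
    fn expect_check(&mut self, epoch_id: EpochId, account: AccountId, shard_id: ShardId) {
        self.expected.entry((epoch_id, account)).or_default().insert(shard_id);
    }

    /// Called whenever the per-block check actually compared flat storage and memtrie.
    fn record_check(&mut self, epoch_id: EpochId, account: AccountId, shard_id: ShardId) {
        self.checked.entry((epoch_id, account)).or_default().insert(shard_id);
    }

    /// Called at the end of the test: every expected comparison must have
    /// happened at least once, otherwise the test silently lost coverage.
    fn assert_complete(&self) {
        for (key, expected_shards) in &self.expected {
            let checked = self.checked.get(key).cloned().unwrap_or_default();
            let missing: Vec<_> = expected_shards.difference(&checked).collect();
            assert!(missing.is_empty(), "missing checks for {:?}: {:?}", key, missing);
        }
    }
}

fn main() {
    let mut sanity = TrieSanityCheck::default();
    let epoch = [0u8; 32];
    sanity.expect_check(epoch, "account0".to_string(), 0);
    sanity.record_check(epoch, "account0".to_string(), 0);
    sanity.assert_complete();
}
```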
marcelo-gonzalez authored Dec 3, 2024
1 parent 79db60a commit 4b309ed
Showing 4 changed files with 215 additions and 72 deletions.
5 changes: 5 additions & 0 deletions chain/chain-primitives/src/error.rs
@@ -193,6 +193,8 @@ pub enum Error {
     InvalidShardId(ShardId),
     #[error("Shard index {0} does not exist")]
     InvalidShardIndex(ShardIndex),
+    #[error("Shard id {0} does not have a parent")]
+    NoParentShardId(ShardId),
     /// Invalid shard id
     #[error("Invalid state request: {0}")]
     InvalidStateRequest(String),
@@ -326,6 +328,7 @@ impl Error {
             | Error::InvalidBandwidthRequests(_)
             | Error::InvalidShardId(_)
             | Error::InvalidShardIndex(_)
+            | Error::NoParentShardId(_)
             | Error::InvalidStateRequest(_)
             | Error::InvalidRandomnessBeaconOutput
             | Error::InvalidBlockMerkleRoot
@@ -407,6 +410,7 @@ impl Error {
             Error::InvalidBandwidthRequests(_) => "invalid_bandwidth_requests",
             Error::InvalidShardId(_) => "invalid_shard_id",
             Error::InvalidShardIndex(_) => "invalid_shard_index",
+            Error::NoParentShardId(_) => "no_parent_shard_id",
             Error::InvalidStateRequest(_) => "invalid_state_request",
             Error::InvalidRandomnessBeaconOutput => "invalid_randomness_beacon_output",
             Error::InvalidBlockMerkleRoot => "invalid_block_merkele_root",
@@ -450,6 +454,7 @@ impl From<ShardLayoutError> for Error {
             ShardLayoutError::InvalidShardIndexError { shard_index } => {
                 Error::InvalidShardIndex(shard_index)
             }
+            ShardLayoutError::NoParentError { shard_id } => Error::NoParentShardId(shard_id),
         }
     }
 }
21 changes: 0 additions & 21 deletions chain/chain/src/chain.rs
@@ -798,27 +798,6 @@ impl Chain {
             me,
             &prev_hash,
         )?;
-        let prev_block = self.get_block(&prev_hash)?;
-
-        if prev_block.chunks().len() != epoch_first_block.chunks().len()
-            && !shards_to_state_sync.is_empty()
-        {
-            // Currently, the state sync algorithm assumes that the number of chunks do not change
-            // between the epoch being synced to and the last epoch.
-            // For example, if shard layout changes at the beginning of epoch T, validators
-            // will not be able to sync states at epoch T for epoch T+1
-            // Fortunately, since all validators track all shards for now, this error will not be
-            // triggered in live yet
-            // Instead of propagating the error, we simply log the error here because the error
-            // do not affect processing blocks for this epoch. However, when the next epoch comes,
-            // the validator will not have the states ready so it will halt.
-            error!(
-                "Cannot download states for epoch {:?} because sharding just changed. I'm {:?}",
-                epoch_first_block.header().epoch_id(),
-                me
-            );
-            debug_assert!(false);
-        }
         if shards_to_state_sync.is_empty() {
             Ok(None)
         } else {
7 changes: 4 additions & 3 deletions core/primitives/src/shard_layout.rs
@@ -327,6 +327,7 @@ impl ShardLayoutV2 {
 pub enum ShardLayoutError {
     InvalidShardIdError { shard_id: ShardId },
     InvalidShardIndexError { shard_index: ShardIndex },
+    NoParentError { shard_id: ShardId },
 }
 
 impl fmt::Display for ShardLayoutError {
@@ -624,15 +625,15 @@ impl ShardLayout {
             return Err(ShardLayoutError::InvalidShardIdError { shard_id });
         }
         let parent_shard_id = match self {
-            Self::V0(_) => panic!("shard layout has no parent shard"),
+            Self::V0(_) => return Err(ShardLayoutError::NoParentError { shard_id }),
             Self::V1(v1) => match &v1.to_parent_shard_map {
                 // we can safely unwrap here because the construction of to_parent_shard_map guarantees
                 // that every shard has a parent shard
                 Some(to_parent_shard_map) => {
                     let shard_index = self.get_shard_index(shard_id)?;
                     *to_parent_shard_map.get(shard_index).unwrap()
                 }
-                None => panic!("shard_layout has no parent shard"),
+                None => return Err(ShardLayoutError::NoParentError { shard_id }),
             },
             Self::V2(v2) => match &v2.shards_parent_map {
                 Some(to_parent_shard_map) => {
@@ -641,7 +642,7 @@
                         .ok_or(ShardLayoutError::InvalidShardIdError { shard_id })?;
                     *parent_shard_id
                 }
-                None => panic!("shard_layout has no parent shard"),
+                None => return Err(ShardLayoutError::NoParentError { shard_id }),
             },
         };
         Ok(parent_shard_id)
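
For illustration only: with this change a caller gets a `Result` it can match on instead of hitting a panic. The sketch below uses simplified stand-in types rather than the real `ShardLayout`.

```rust
type ShardId = u64;

// Simplified stand-ins for the shard_layout.rs types; illustrative only.
#[derive(Debug)]
enum ShardLayoutError {
    InvalidShardIdError { shard_id: ShardId },
    NoParentError { shard_id: ShardId },
}

// A layout without a parent map (e.g. V0) has no parent shards, so asking
// for one is now an error instead of a panic.
fn get_parent_shard_id(
    parent_map: Option<&[ShardId]>,
    shard_id: ShardId,
) -> Result<ShardId, ShardLayoutError> {
    let map = parent_map.ok_or(ShardLayoutError::NoParentError { shard_id })?;
    map.get(shard_id as usize)
        .copied()
        .ok_or(ShardLayoutError::InvalidShardIdError { shard_id })
}

fn main() {
    // The caller can now turn this into chain's Error::NoParentShardId and
    // keep going instead of crashing.
    match get_parent_shard_id(None, 3) {
        Ok(parent) => println!("parent shard: {parent}"),
        Err(ShardLayoutError::NoParentError { shard_id }) => {
            println!("shard {shard_id} has no parent shard")
        }
        Err(err) => println!("unexpected: {err:?}"),
    }
}
```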