bug(l1): peer connection broken due to failure to add transaction to mempool before sync #1686

Closed · Fixed by #1754

fmoletta opened this issue Jan 10, 2025 · 0 comments

Labels: L1, network (Issues related to network communication)

fmoletta (Contributor) commented:

Right after launching our node, while connecting to potential syncing peers, we also receive transactions to add to our mempool. Since our state is not yet synced, validating these transactions often fails (for example, due to insufficient account funds). When that happens the error is propagated up to the peer connection loop, which ends the connection.
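An illustrative sketch of the failure mode (all names here are hypothetical, not the actual ethrex code): using `?` inside the peer connection loop turns a recoverable mempool rejection into a fatal error that ends the loop, whereas logging keeps the connection alive.

```rust
#[derive(Debug, PartialEq)]
enum MempoolError {
    InsufficientFunds,
}

// Stand-in for mempool validation against (possibly unsynced) state.
fn add_to_mempool(balance: u64, tx_cost: u64) -> Result<(), MempoolError> {
    if balance < tx_cost {
        return Err(MempoolError::InsufficientFunds);
    }
    Ok(())
}

// Before the fix: `?` propagates the error, ending the peer loop.
fn handle_tx_fatal(balance: u64, tx_cost: u64) -> Result<(), MempoolError> {
    add_to_mempool(balance, tx_cost)?;
    Ok(())
}

// After the fix: the error is logged and the connection stays alive.
fn handle_tx_logged(balance: u64, tx_cost: u64) {
    if let Err(e) = add_to_mempool(balance, tx_cost) {
        eprintln!("mempool rejected tx: {e:?}"); // log instead of disconnect
    }
}
```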

@fmoletta fmoletta added bug Something isn't working L1 labels Jan 10, 2025
@mpaulucci mpaulucci added the network Issues related to network communication label Jan 10, 2025
@mpaulucci mpaulucci added this to the [L1] 4 - P2P Network milestone Jan 10, 2025
@avilagaston9 avilagaston9 self-assigned this Jan 17, 2025
@mpaulucci mpaulucci removed the bug Something isn't working label Jan 21, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 22, 2025
**Motivation**

We are currently disconnecting from peers for trying to process their
transactions while not yet synced.

**Description**

This PR provides a hotfix that discards any incoming transaction from
P2P until we are synced.
- Adds a `synced` flag to our storage.
- Sets the flag to `false` when starting the node and every time the
sync is triggered.
- Sets the flag to `true` on every successful forkchoice update.
- Checks the flag before handling `Transactions`,
`NewPooledTransactionHashes` and `PooledTransactions` messages.
- Adds `eth_syncing` hive test to the CI.
- Logs `MempoolErrors` instead of cutting the connection.

Closes #1686 
Closes #179
Closes #337
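The gating described in the hotfix above can be sketched roughly as follows. This is a minimal illustration, assuming hypothetical `Store` and `Message` types; it is not the actual ethrex API. Transaction-related messages are discarded while the `synced` flag is false, and everything else is handled normally.

```rust
// Simplified stand-ins for the P2P message types named in the PR.
#[derive(Debug, PartialEq)]
enum Message {
    Transactions(Vec<u8>),
    NewPooledTransactionHashes(Vec<u8>),
    PooledTransactions(Vec<u8>),
    Other,
}

// Stand-in for storage carrying the `synced` flag the PR adds.
struct Store {
    synced: bool,
}

impl Store {
    fn is_synced(&self) -> bool {
        self.synced
    }
}

/// Returns true if the message was handled, false if discarded pre-sync.
fn handle_message(store: &Store, msg: &Message) -> bool {
    match msg {
        Message::Transactions(_)
        | Message::NewPooledTransactionHashes(_)
        | Message::PooledTransactions(_)
            if !store.is_synced() =>
        {
            // Discard: validating these against unsynced state would fail
            // (e.g. insufficient account funds) and kill the connection.
            false
        }
        _ => {
            // Handle normally; mempool errors are logged, not fatal.
            true
        }
    }
}
```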
github-merge-queue bot pushed a commit that referenced this issue Jan 29, 2025
**Motivation**
This PR introduces the following upgrades for snap-sync:
- Use DB-persisted checkpoints so we can persist the sync progress
throughout restarts & cycles
- Stops ForkChoices & NewPayloads from being applied while syncing
- Improved handling of stale pivot during sub-processes
- Improved handling of pending requests when aborting due to stale pivot
- Fetching of large storage tries (that don't fit in a single range
request)
- Safer (but a bit slower) healing that can be restarted
- Faster storage fetching (multiple parallel fetches)

And also simplifies it by removing the following logic:
- No longer downloads bodies and receipts for blocks before the pivot
during snap sync (WARNING: this goes against the spec but shouldn't be a
problem for the time being)
- Removes restart from latest block when latest - 64 becomes stale. (By
this point it is more effective to wait for the next fork choice update)
- Periodically shows state sync progress

**Description**
- Stores the last downloaded block's hash in the DB during snap sync to
serve as a checkpoint if the sync is aborted halfway (common case when
syncing from genesis). This checkpoint is cleared upon succesful snap
sync.
- No longer fetches receipts or block bodies past the pivot block during
snap sync
- Adds a `sync_status` method which returns an enum with the current sync
status (Inactive, Active, or Pending); the ForkChoiceUpdated & NewPayload
engine RPC endpoints use it so that their logic is not applied during an
active or pending sync.
- Fetcher processes now identify stale pivots and remain passive until
they receive the end signal.
- Fetcher processes now return their current queue when exiting so that
it can be persisted into the next cycle.
- Stores the latest state root during state sync and healing as a
checkpoint
- Stores the last fetched key during state sync as a checkpoint
- Healing no longer stores the nodes received via p2p; instead it
inserts the leaf values and rebuilds the trie, to avoid trie corruption
between restarts.
- The current progress percentage and estimated time to finish are
periodically reported during state sync.
- Disables the following Paris & Cancun engine hive tests that
previously yielded false positives due to new payloads being accepted on
top of a syncing chain:

   * Invalid NewPayload (family)
   * Re-Org Back to Canonical Chain From Syncing Chain
   * Unknown HeadBlockHash
   * In-Order Consecutive Payload Execution (Flaky)
   * Valid NewPayload->ForkchoiceUpdated on Syncing Client
   * Invalid Missing Ancestor ReOrg
   * Payload Build after New Invalid Payload (only Cancun)

- Also disables the following tests, which fail with the flag
`Syncing=true` for the same reason:

   * Bad Hash on NewPayload
   * ParentHash equals BlockHash on NewPayload (only for Paris)
   * Invalid PayloadAttributes (family)
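The `sync_status` gating described above can be sketched as follows. This is a hypothetical illustration (the enum shape follows the PR text, but the function names are not the actual ethrex API): ForkChoiceUpdated and NewPayload logic is only applied when no sync is active or pending.

```rust
/// Mirrors the enum described in the PR: the node is either not syncing,
/// actively syncing, or has a sync pending.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SyncStatus {
    Inactive,
    Active,
    Pending,
}

/// Applying fork choice or new payloads mid-sync could build on top of an
/// incomplete state, so only `Inactive` allows it.
fn should_apply_engine_logic(status: SyncStatus) -> bool {
    matches!(status, SyncStatus::Inactive)
}
```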

Misc:
- Replaces some noisy unwraps in networking module with errors
- Applies annotated hacky fixes for problems reported in #1684 #1685 &
#1686
Status: Done · 3 participants