Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pending): pending block sealed instead of re-added to mempool #468

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Trantorian1
Copy link
Collaborator

@Trantorian1 Trantorian1 commented Jan 16, 2025

Important

Most of the changes in this pr are just me adding some tests. Actual logic changes are pretty small.

Pull Request type

  • Feature

What is the current behavior?

Pending block transactions are re-added back to the mempool and re-executed upon startup. This happens in case of graceful or ungraceful shutdown in the middle of block production: the pending block will be saved to db to avoid loss of state.

This leads to several issues:

  1. Transactions are re-executed, and might provided different outputs or hashes.
  2. Transactions from the pending block are treated like any other and could be dropped by the mempool under certain edge cases.

Resolves: #458

What is the new behavior?

Pending block, alongside the pending state diff, visited segments and declared classes are retrieved from db and closed into a new finalized block.

This means that if a sequencer node is shutdown mid-production, the state contained in the pending block will automatically be persisted upon startup as it own finalized block without re-executing transactions. This also guarantees that any changes to the config will not be applied to transactions executed under different parameters (these will only affect the next block).

Does this introduce a breaking change?

No.

@Trantorian1 Trantorian1 added feature Request for new feature or enhancement sequencer Related to the sequencing logic and implementation labels Jan 16, 2025
@Trantorian1 Trantorian1 self-assigned this Jan 16, 2025
@Trantorian1 Trantorian1 force-pushed the feat/avoid_pending_re_exec branch from 05fd24e to 74f48fc Compare January 17, 2025 14:54
@Trantorian1 Trantorian1 marked this pull request as ready for review January 17, 2025 15:07
Copy link
Contributor

@Mohiiit Mohiiit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The test cases are really clean and well-structured. I have a couple of thoughts:

  • Should we consider adding E2E tests for this? For example, we could simulate spawning Madara, shutting it down mid-process, and then restarting it to verify that everything works as expected.
  • I found the stages a bit confusing. In most test cases, when we store the pending block in stage 2, it’s mentioned that this simulates stopping and restarting the node. However, isn’t that more related to the block production task? Maybe I’m misunderstanding, but this part could use some clarification.

@Trantorian1
Copy link
Collaborator Author

Trantorian1 commented Jan 21, 2025

  • I found the stages a bit confusing. In most test cases, when we store the pending block in stage 2, it’s mentioned that this simulates stopping and restarting the node. However, isn’t that more related to the block production task? Maybe I’m misunderstanding, but this part could use some clarification.

So, the block production functions by storing un-finalized blocks as pending. This is the only form of data we can recover without re-execution as everything else is stored in RAM (mempool transactions which have not yet been polled yet are also stored in db for retrieval, but these haven't been executed anyways). This means that if ever the node crashes, we will only be able to retrieve whatever data was stored in the pending block. This is done atomically so we never commit partial data to the database and only a full pending block can ever be stored.

We are therefore "simulate[ing] stopping and restarting the node", since:

  1. This is the only pending data that can persist a node restart, and it cannot be partially valid (we still test failing cases though).
  2. Upon restart, this is what the block production would be looking to seal.

I will be adding some if this in comments to the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Request for new feature or enhancement sequencer Related to the sequencing logic and implementation
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

feat(blockprod): Revised Approach for Madara Block Production Shutdown
3 participants