Zekun Li committed Nov 24, 2021 (commit 77b6b27, 1 parent: c793211)
Showing 1 changed file with 83 additions and 0 deletions.
@@ -0,0 +1,83 @@
---
dip: 213
title: Decoupled execution
authors: Zekun Li (@zekun000), Yu Xia (@yuxiamit)
status: Draft
type: Standard
created: 08/09/2021
last updated: 09/09/2021
issue: https://github.com/diem/dip/issues/213
---

# Summary

The current consensus protocol agrees on transaction ordering together with execution results. The upside is simplicity, which let us implement and harden it within a short time period.
The downside is that coupling execution and ordering makes the two stall each other and limits performance.
This DIP proposes to separate them to unlock better throughput without compromising security.

If ordering, execution, and commit take times O, E, and C respectively for a block, the throughput should change from 1/(O + E + C) to 1/max(O, E, C), since a pipeline is bound by its slowest stage. This not only unlocks immediate wins
but also makes future bottlenecks more visible and can guide optimizations.

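
As a rough worked example of the throughput argument, the sketch below compares a coupled design with an idealized pipeline whose steady-state rate is set by the slowest stage. The struct, the function names, and the millisecond figures are illustrative assumptions, not measurements or code from the DIP.

```rust
/// Per-block stage latencies in milliseconds (illustrative values only).
#[derive(Clone, Copy)]
pub struct StageTimes {
    pub ordering: f64,
    pub execution: f64,
    pub commit: f64,
}

/// Coupled design: a block pays for every stage before the next block starts.
/// Returns blocks per second.
pub fn coupled_throughput(t: StageTimes) -> f64 {
    1000.0 / (t.ordering + t.execution + t.commit)
}

/// Idealized pipeline: stages overlap, so the slowest stage sets the pace.
/// Returns blocks per second.
pub fn pipelined_throughput(t: StageTimes) -> f64 {
    let slowest = t.ordering.max(t.execution).max(t.commit);
    1000.0 / slowest
}
```

With O = 100 ms, E = 300 ms, and C = 100 ms, the coupled design yields 2 blocks per second while the pipeline yields roughly 3.3, and the numbers make it obvious that execution is the stage to optimize next.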
# Description

A block (proposal) goes through different steps before it's finalized. In DiemBFTv4, the steps are:
```
Consensus stage
1. Proposed
2. Executed
3. Voted (including execution result)
4. QuorumCertified
5. 2-chain Certified (Committed)
```

The proposed change is to decouple execution from consensus and pipeline the process into separate stages. A block would go through
4 stages, each with its own steps:
```
1. Consensus stage
   * Proposed
   * Voted (no execution result)
   * QuorumCertified
   * 2-chain Certified (Ordered)
2. Execution stage
3. Voting stage (for execution result)
4. Commit stage
```
The stages run in parallel to achieve the best resource utilization: the system could commit B1, sign B2, execute B3, and order B4 at the same time.

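
The overlap between stages can be sketched with an idealized schedule. The code below assumes, purely for illustration, that block `b` enters the consensus stage at tick `b` and that every stage takes exactly one tick; `PipelineStage` and `active_at` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipelineStage {
    Consensus,
    Execution,
    Voting,
    Commit,
}

pub const STAGES: [PipelineStage; 4] = [
    PipelineStage::Consensus,
    PipelineStage::Execution,
    PipelineStage::Voting,
    PipelineStage::Commit,
];

/// Returns the (block number, stage) pairs that are active at `tick`,
/// assuming block `b` (1-based) enters the consensus stage at tick `b`
/// and every stage takes exactly one tick.
pub fn active_at(tick: u64) -> Vec<(u64, PipelineStage)> {
    STAGES
        .iter()
        .enumerate()
        .filter_map(|(depth, stage)| {
            // The block at pipeline depth `depth` entered `depth` ticks ago.
            let block = tick.checked_sub(depth as u64)?;
            if block >= 1 {
                Some((block, *stage))
            } else {
                None
            }
        })
        .collect()
}
```

Under these assumptions, `active_at(4)` pairs B4 with the consensus stage, B3 with execution, B2 with voting, and B1 with commit, which is the situation described above.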
# Required Changes

## Consensus
`StateComputer` is the trait that consensus uses to `compute`, `commit`, or `sync_to` blocks.
A simple `ordering_state_computer` is implemented to bypass execution and send blocks to the next stage once they are ordered.

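
To make the idea concrete, here is a minimal sketch of an ordering-only state computer. It is not the actual Diem trait: the real `StateComputer` has different signatures, so the synchronous trait below, the `SketchBlock` type, and the use of a standard-library channel are all simplifying assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, heavily simplified block type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchBlock {
    pub round: u64,
}

/// Simplified stand-in for the `StateComputer` trait (synchronous, no errors).
pub trait SketchStateComputer {
    /// Returns an execution result for the block, if one is computed.
    fn compute(&self, block: &SketchBlock) -> Option<u64>;
    /// Called when consensus has finalized its decision on `blocks`.
    fn commit(&self, blocks: Vec<SketchBlock>);
    /// Fast-forwards local state to `round`.
    fn sync_to(&self, round: u64);
}

/// Ordering-only computer: skips execution and forwards ordered blocks
/// to the next stage over a channel.
pub struct OrderingStateComputer {
    next_stage: Sender<Vec<SketchBlock>>,
}

impl OrderingStateComputer {
    /// Returns the computer together with the receiving end for the next stage.
    pub fn new() -> (Self, Receiver<Vec<SketchBlock>>) {
        let (tx, rx) = channel();
        (Self { next_stage: tx }, rx)
    }
}

impl SketchStateComputer for OrderingStateComputer {
    fn compute(&self, _block: &SketchBlock) -> Option<u64> {
        // No execution result is produced during ordering.
        None
    }

    fn commit(&self, blocks: Vec<SketchBlock>) {
        // From consensus' point of view "commit" now means "ordered".
        // A send error only means the next stage has shut down; ignore it here.
        let _ = self.next_stage.send(blocks);
    }

    fn sync_to(&self, _round: u64) {
        // State sync is left out of this sketch.
    }
}
```

The point of the design is that consensus keeps calling the same trait, so swapping in the ordering-only implementation changes what "commit" hands off without touching the ordering logic itself.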
## BufferManager
A `BufferManager` is implemented to manage ordered blocks as they move through the remaining stages. It is a queue of blocks with stage markers.
- An `Ordered` block is sent to execution and advanced to `Executed` after the execution result is received.
- An `Executed` block is sent to safety rules and advanced to `Signed` after the signature is received.
- The signature on a `Signed` block is broadcast to all validators until the block is advanced to `Aggregated`.
- An `Executed` or `Signed` block is advanced to `Aggregated` after enough signatures are received.
- An `Aggregated` block is popped from the queue and sent to persistent storage.

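
The stage transitions listed above can be sketched as a small state machine over a queue. This is an illustration under simplifying assumptions, not the actual `BufferManager`: blocks are identified by round only, signatures are just counted (each received signature, including the node's own, is one call to `on_signature`), and signatures that arrive before local execution are ignored. All type and method names are hypothetical.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BufferStage {
    Ordered,
    Executed,
    Signed,
    Aggregated,
}

#[derive(Debug)]
pub struct BufferItem {
    pub round: u64,
    pub stage: BufferStage,
    pub signatures: usize,
}

pub struct Buffer {
    items: VecDeque<BufferItem>,
    quorum: usize,
}

impl Buffer {
    pub fn new(quorum: usize) -> Self {
        Self { items: VecDeque::new(), quorum }
    }

    /// Enqueues a newly ordered block.
    pub fn push_ordered(&mut self, round: u64) {
        self.items.push_back(BufferItem { round, stage: BufferStage::Ordered, signatures: 0 });
    }

    fn find(&mut self, round: u64) -> Option<&mut BufferItem> {
        self.items.iter_mut().find(|item| item.round == round)
    }

    /// Execution result received: Ordered -> Executed.
    pub fn on_executed(&mut self, round: u64) {
        if let Some(item) = self.find(round) {
            if item.stage == BufferStage::Ordered {
                item.stage = BufferStage::Executed;
            }
        }
    }

    /// Local signature received from safety rules: Executed -> Signed.
    pub fn on_signed(&mut self, round: u64) {
        if let Some(item) = self.find(round) {
            if item.stage == BufferStage::Executed {
                item.stage = BufferStage::Signed;
            }
        }
    }

    /// One signature received: Executed or Signed -> Aggregated at quorum.
    pub fn on_signature(&mut self, round: u64) {
        let quorum = self.quorum;
        if let Some(item) = self.find(round) {
            if matches!(item.stage, BufferStage::Executed | BufferStage::Signed) {
                item.signatures += 1;
                if item.signatures >= quorum {
                    item.stage = BufferStage::Aggregated;
                }
            }
        }
    }

    /// Pops aggregated blocks from the front, in order, for persistence.
    pub fn pop_aggregated(&mut self) -> Vec<u64> {
        let mut ready = Vec::new();
        while matches!(self.items.front(), Some(item) if item.stage == BufferStage::Aggregated) {
            ready.push(self.items.pop_front().unwrap().round);
        }
        ready
    }

    pub fn stage_of(&self, round: u64) -> Option<BufferStage> {
        self.items.iter().find(|item| item.round == round).map(|item| item.stage)
    }
}
```

Popping only from the front keeps commits in order: a later block that aggregates early waits in the queue until every block ahead of it has been aggregated too.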
## Sync
When a node is far behind the currently proposed block, it may decide to fast-forward via the state sync protocol. This is triggered by the difference between
the local ordered round (the highest round among ordered blocks) and the remote committed round (the highest round among committed blocks). Upon state sync,
the node needs to fetch the chain of blocks from the committed round to ordered round + 2 (which carries the ordering certificate), instead of from the ordered round to ordered round + 2 as before.
Block retrieval is improved to support chunked requests.

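
A minimal sketch of the two pieces involved, assuming a simple gap threshold and fixed-size chunks. The DIP does not specify the exact trigger or the chunking scheme, so `max_gap`, `max_chunk`, and both function names are hypothetical.

```rust
/// Hypothetical trigger: fall back to state sync when a peer has committed
/// more than `max_gap` rounds beyond what this node has ordered.
pub fn should_state_sync(
    local_ordered_round: u64,
    remote_committed_round: u64,
    max_gap: u64,
) -> bool {
    remote_committed_round.saturating_sub(local_ordered_round) > max_gap
}

/// Splits a retrieval of `num_blocks` blocks into requests of at most
/// `max_chunk` blocks each, returning the size of every request in order.
pub fn chunk_requests(num_blocks: u64, max_chunk: u64) -> Vec<u64> {
    assert!(max_chunk > 0, "max_chunk must be positive");
    let mut remaining = num_blocks;
    let mut chunks = Vec::new();
    while remaining > 0 {
        let size = remaining.min(max_chunk);
        chunks.push(size);
        remaining -= size;
    }
    chunks
}
```

Chunking matters more after decoupling because the span between the committed round and the ordered round can contain many more blocks than the short chain fetched previously.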
## Backpressure
To prevent consensus from running too far ahead and creating an ever-growing backlog for the other stages, backpressure is implemented: consensus stops making progress if the gap between the committed block and the ordered block is large.

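
The check itself can be as small as a comparison of two rounds. The sketch below assumes the gap is measured in rounds and compared against a configurable limit; `max_gap` is an illustrative knob, not a value from this DIP.

```rust
/// Hypothetical backpressure check: returns true when consensus should stop
/// ordering new blocks because the later stages have fallen too far behind.
pub fn backpressure_engaged(ordered_round: u64, committed_round: u64, max_gap: u64) -> bool {
    // saturating_sub guards against committed_round momentarily exceeding
    // ordered_round (for example right after a state sync).
    ordered_round.saturating_sub(committed_round) > max_gap
}
```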
## Reconfiguration
A reconfiguration transaction is the last transaction of an epoch. Execution recognizes this type of transaction, and the buffer manager stops processing any blocks after the reconfiguration block.
Consensus stops once backpressure is triggered; blocks after the reconfiguration block are discarded and their transactions are retried in the next epoch.
After the reconfiguration block is committed, the new epoch is instantiated and the old epoch instance is dropped.

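
The discard rule can be illustrated by splitting the queue of ordered blocks at the reconfiguration block. The `ends_epoch` flag is a hypothetical marker standing in for whatever execution reports when it sees a reconfiguration transaction; the type and function names are likewise assumptions.

```rust
/// Hypothetical ordered block; `ends_epoch` marks the reconfiguration block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrderedBlock {
    pub round: u64,
    pub ends_epoch: bool,
}

/// Keeps blocks up to and including the first reconfiguration block and
/// returns the remaining blocks separately so they can be discarded.
pub fn split_at_reconfiguration(
    blocks: Vec<OrderedBlock>,
) -> (Vec<OrderedBlock>, Vec<OrderedBlock>) {
    let reconfig_index = blocks.iter().position(|b| b.ends_epoch);
    match reconfig_index {
        Some(idx) => {
            let mut kept = blocks;
            // Everything after the reconfiguration block belongs to no epoch.
            let discarded = kept.split_off(idx + 1);
            (kept, discarded)
        }
        None => (blocks, Vec::new()),
    }
}
```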
## Upgrade
Switching the protocol requires a reconfiguration: the on-chain consensus config is updated to a newer version to support the upgrade.

## Client
The finality proof is aggregated in the same format as before (`LedgerInfoWithSignatures`), so this change is client agnostic.

# Future opportunities
The execution signature aggregation is implemented as a simple broadcast. A leader-based mechanism could be implemented to reduce the network cost
at the expense of one extra hop of latency.
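
As a back-of-the-envelope comparison, the sketch below counts messages per block for `n` validators. It assumes that in the broadcast scheme every validator sends its signature to every other validator, and that in a leader-based scheme every validator sends to the leader, which then sends the aggregate back out. Both models are simplifications, and the function names are hypothetical.

```rust
/// Messages per block when every validator broadcasts its signature
/// to every other validator.
pub fn broadcast_messages(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// Messages per block when validators send signatures to a leader,
/// which then sends the aggregate to everyone: one hop in, one hop out.
pub fn leader_messages(n: u64) -> u64 {
    2 * n.saturating_sub(1)
}
```

For 100 validators this is 9900 messages versus 198, which shows the quadratic versus linear growth behind the trade-off.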