[dip-213] decoupled execution
Zekun Li committed Nov 24, 2021
1 parent c793211 commit 77b6b27
Showing 1 changed file with 83 additions and 0 deletions.
83 changes: 83 additions & 0 deletions dips/dip-213.md
@@ -0,0 +1,83 @@
---
dip: 213
title: Decoupled execution
authors: Zekun Li (@zekun000), Yu Xia (@yuxiamit)
status: Draft
type: Standard
created: 08/09/2021
last updated: 09/09/2021
issue: https://github.com/diem/dip/issues/213
---

# Summary

The current consensus agrees on transaction ordering together with execution results. The upside is simplicity, which let us implement and harden the protocol within a short period.
The downside is that coupling execution with ordering makes each stall the other and limits performance.
This DIP proposes to separate them to unlock better throughput without compromising security.

If ordering, execution, and commit of a block take time O, E, and C respectively, the throughput should change from 1/(O + E + C) to 1/max(O, E, C), since a pipeline is limited by its slowest stage. This not only unlocks wins immediately
but also makes future bottlenecks more visible and could guide optimizations.
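
As a rough illustration of the throughput argument, the sketch below compares the coupled and pipelined cases. The function names are hypothetical and the model is idealized: it assumes the stages overlap perfectly and ignores queueing and network effects.

```rust
/// Blocks per unit time when ordering, execution and commit of a block
/// run back to back before the next block starts (coupled design).
pub fn coupled_throughput(o: f64, e: f64, c: f64) -> f64 {
    1.0 / (o + e + c)
}

/// Steady-state blocks per unit time when the three stages are pipelined.
/// Idealized assumption: the slowest stage alone sets the pace.
pub fn pipelined_throughput(o: f64, e: f64, c: f64) -> f64 {
    1.0 / o.max(e).max(c)
}
```

With made-up timings of O = 0.1 s, E = 0.3 s and C = 0.1 s, the coupled design yields 2 blocks per second, while the pipelined design is bounded by execution at roughly 3.3 blocks per second, which also makes execution visible as the next bottleneck.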

# Description

A block (proposal) goes through different steps before it's finalized. In DiemBFTv4, the steps are
```
Consensus stage
1. Proposed
2. Executed
3. Voted (including execution result)
4. QuorumCertified
5. 2-chain Certified (Committed)
```

The proposed change is to decouple execution from consensus and pipeline the process with different stages. A block would go through
4 stages each with its own steps:
```
1. Consensus stage
* Proposed
* Voted (no execution result)
* QuorumCertified
* 2-chain Certified (Ordered)
2. Execution stage
3. Voting stage (for execution result)
4. Commit stage
```
The stages run in parallel to achieve the best resource utilization: the system could commit B1, sign B2, execute B3, and order B4 at the same time.
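
The overlap of stages can be pictured with a toy schedule. This is only a sketch under simplifying assumptions that are not part of the proposal: every stage takes exactly one tick, and block `k` enters the consensus stage at tick `k`. `PipelineStage` and `stage_at` are hypothetical names.

```rust
/// The four pipeline stages a block passes through, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipelineStage {
    Ordering,
    Executing,
    Signing,
    Committing,
}

/// Stage occupied by `block` at `tick`, assuming one tick per stage and
/// that block `k` starts ordering at tick `k`. Returns `None` if the
/// block has not started yet or has already left the pipeline.
pub fn stage_at(block: u64, tick: u64) -> Option<PipelineStage> {
    match tick.checked_sub(block)? {
        0 => Some(PipelineStage::Ordering),
        1 => Some(PipelineStage::Executing),
        2 => Some(PipelineStage::Signing),
        3 => Some(PipelineStage::Committing),
        _ => None,
    }
}
```

Under these assumptions, at tick 4 the four stages are all busy with different blocks: B1 is committing, B2 is being signed, B3 is executing, and B4 is being ordered.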

# Required Changes

## Consensus
`StateComputer` is the trait that consensus uses to `compute`, `commit`, or `sync_to` blocks.
A simple `ordering_state_computer` is implemented to bypass execution and send blocks to the next stage once they are ordered.
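
A minimal sketch of the idea follows. The trait here is a simplified, hypothetical stand-in and does not reproduce the real `StateComputer` signatures; `Block`, the `u64` result type, and the channel to the next stage are all assumptions made for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical minimal block; the real type carries far more data.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Block {
    pub round: u64,
}

/// Simplified stand-in for the trait consensus calls into.
/// These signatures are assumptions, not the actual trait.
pub trait StateComputer {
    /// Returns an execution result (here just an opaque id), if any.
    fn compute(&self, block: &Block) -> Option<u64>;
    fn commit(&self, blocks: Vec<Block>);
    fn sync_to(&self, round: u64);
}

/// Bypasses execution and forwards ordered blocks to the next stage.
pub struct OrderingStateComputer {
    next_stage: Sender<Vec<Block>>,
}

impl OrderingStateComputer {
    /// Builds the computer together with the receiving end that the
    /// next stage (for example a buffer manager) would read from.
    pub fn with_receiver() -> (Self, Receiver<Vec<Block>>) {
        let (tx, rx) = channel();
        (Self { next_stage: tx }, rx)
    }
}

impl StateComputer for OrderingStateComputer {
    fn compute(&self, _block: &Block) -> Option<u64> {
        None // no execution happens during the consensus stage
    }

    fn commit(&self, blocks: Vec<Block>) {
        // From consensus' point of view the blocks are now ordered;
        // hand them over. A send error is ignored in this sketch.
        let _ = self.next_stage.send(blocks);
    }

    fn sync_to(&self, _round: u64) {
        // Left as a no-op in this sketch.
    }
}
```

The point of the design is that consensus keeps calling the same interface, while the implementation behind it decides whether execution happens inline or in a later stage.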

## BufferManager
A `BufferManager` is implemented to manage the different stages of ordered blocks. It is a queue of blocks with stage markers.
- An `Ordered` block is sent to execution and advanced to `Executed` after receiving the execution result.
- An `Executed` block is sent to safety rules and advanced to `Signed` after receiving the signature.
- The signature on a `Signed` block is broadcast to every validator until the block is advanced to `Aggregated`.
- An `Executed` or `Signed` block is advanced to `Aggregated` after receiving enough signatures.
- An `Aggregated` block is popped from the queue and sent to persistent storage.
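
The stage transitions listed above can be sketched as a small state machine. This is not the actual `BufferManager`: the method names, the signature counter, the `quorum` parameter, and the choice to pop only from the front of the queue are assumptions made to keep the example short.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Ordered,
    Executed,
    Signed,
    Aggregated,
}

#[derive(Debug)]
pub struct Item {
    pub round: u64,
    pub stage: Stage,
    pub signatures: usize,
}

pub struct BufferManager {
    queue: VecDeque<Item>,
    quorum: usize, // assumed: number of signatures needed to aggregate
}

impl BufferManager {
    pub fn new(quorum: usize) -> Self {
        Self { queue: VecDeque::new(), quorum }
    }

    pub fn push_ordered(&mut self, round: u64) {
        self.queue.push_back(Item { round, stage: Stage::Ordered, signatures: 0 });
    }

    /// Applies `f` to the block at `round`, then aggregates it if it is
    /// `Executed` or `Signed` and has collected enough signatures.
    fn update(&mut self, round: u64, f: impl FnOnce(&mut Item)) {
        let quorum = self.quorum;
        if let Some(item) = self.queue.iter_mut().find(|i| i.round == round) {
            f(item);
            let ready = matches!(item.stage, Stage::Executed | Stage::Signed);
            if ready && item.signatures >= quorum {
                item.stage = Stage::Aggregated;
            }
        }
    }

    /// Execution result received: `Ordered` -> `Executed`.
    pub fn on_executed(&mut self, round: u64) {
        self.update(round, |i| {
            if i.stage == Stage::Ordered {
                i.stage = Stage::Executed;
            }
        });
    }

    /// Local signature received: `Executed` -> `Signed`, counting our own vote.
    pub fn on_signed(&mut self, round: u64) {
        self.update(round, |i| {
            if i.stage == Stage::Executed {
                i.stage = Stage::Signed;
                i.signatures += 1;
            }
        });
    }

    /// Signature from another validator received.
    pub fn on_remote_signature(&mut self, round: u64) {
        self.update(round, |i| i.signatures += 1);
    }

    /// Pops aggregated blocks from the front, preserving order.
    pub fn pop_aggregated(&mut self) -> Vec<u64> {
        let mut out = Vec::new();
        while self.queue.front().map_or(false, |i| i.stage == Stage::Aggregated) {
            out.push(self.queue.pop_front().unwrap().round);
        }
        out
    }

    pub fn stage_of(&self, round: u64) -> Option<Stage> {
        self.queue.iter().find(|i| i.round == round).map(|i| i.stage)
    }
}
```

In this sketch the rounds returned by `pop_aggregated` stand in for the blocks that would be sent to persistent storage.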

## Sync
When a node is far behind the currently proposed block, it may decide to fast-forward via the state sync protocol. This is triggered by the difference between
the local ordered round (the highest round of ordered blocks) and the remote committed round (the highest round of committed blocks). Upon state sync,
the node needs to fetch the chain of blocks from the committed round to ordered round + 2 (which carries the ordering certificate), instead of from ordered round to ordered round + 2 as before.
Block retrieval is improved to support chunked requests.
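
A rough sketch of the two mechanisms in this section follows. Both functions are hypothetical: the proposal does not state the exact trigger condition or chunk size, so `max_gap` and `max_chunk` are illustrative parameters, and the trigger is assumed to fire when a peer has committed well past what the node has ordered.

```rust
/// Assumed trigger: fast-forward when the remote committed round is more
/// than `max_gap` rounds ahead of the local ordered round.
pub fn should_state_sync(
    local_ordered_round: u64,
    remote_committed_round: u64,
    max_gap: u64,
) -> bool {
    remote_committed_round.saturating_sub(local_ordered_round) > max_gap
}

/// Splits a retrieval of `num_blocks` blocks into request sizes of at
/// most `max_chunk` blocks each.
pub fn chunked_requests(num_blocks: u64, max_chunk: u64) -> Vec<u64> {
    assert!(max_chunk > 0, "chunk size must be positive");
    let mut remaining = num_blocks;
    let mut chunks = Vec::new();
    while remaining > 0 {
        let n = remaining.min(max_chunk);
        chunks.push(n);
        remaining -= n;
    }
    chunks
}
```

Chunking matters more after this change because the chain to fetch now spans from the committed round to the ordered round, which can be considerably longer than the fixed-length chain fetched before.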

## Backpressure
To prevent consensus from running too far ahead and creating an ever-growing backlog for the other stages, backpressure is implemented to stop consensus from making progress if the gap between the committed block and the ordered block is large.
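
The check itself can be as small as the sketch below. The function name and the `limit` parameter are hypothetical, and the gap is measured in rounds here as an assumption; the proposal does not fix the unit or the threshold.

```rust
/// Returns true when consensus should pause because ordering has run
/// more than `limit` rounds ahead of what has been committed.
pub fn should_backpressure(ordered_round: u64, committed_round: u64, limit: u64) -> bool {
    ordered_round.saturating_sub(committed_round) > limit
}
```

Consensus would consult such a check before proposing or voting, and resume once the later stages have caught up.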

## Reconfiguration
A reconfiguration transaction is the last transaction of an epoch. Execution recognizes this type of transaction, and the buffer manager stops processing any blocks after the reconfiguration block.
Consensus stops once backpressure is triggered; blocks after the reconfiguration block are discarded, and their transactions are retried in the next epoch.
After the reconfiguration block is committed, a new epoch is instantiated and the old epoch instance is dropped.
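
The discard rule can be sketched as a split of the ordered queue at the reconfiguration block. `OrderedBlock` and its `is_reconfiguration` flag are hypothetical; in practice the flag would come from the execution result rather than being known up front.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrderedBlock {
    pub round: u64,
    /// Assumed flag: set once execution sees a reconfiguration transaction.
    pub is_reconfiguration: bool,
}

/// Splits ordered blocks into those to keep processing (up to and
/// including the first reconfiguration block) and those to discard.
pub fn split_at_reconfiguration(
    blocks: Vec<OrderedBlock>,
) -> (Vec<OrderedBlock>, Vec<OrderedBlock>) {
    let idx = blocks.iter().position(|b| b.is_reconfiguration);
    match idx {
        Some(i) => {
            let mut kept = blocks;
            let discarded = kept.split_off(i + 1);
            (kept, discarded)
        }
        None => (blocks, Vec::new()),
    }
}
```

Transactions in the discarded blocks are not lost; as described above, they are expected to be retried in the next epoch.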

## Upgrade
Switching the protocol requires a reconfiguration: the on-chain consensus config is updated to a newer version to support the upgrade.

## Client
The finality proof is aggregated in the same format as before (`LedgerInfoWithSignatures`), so this change is client-agnostic.

# Future opportunities
Execution signature aggregation is implemented as a simple broadcast. A leader-based mechanism could be implemented to reduce the network cost
at the expense of one extra hop of latency.
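
To give a feel for the trade-off, the sketch below counts messages per block under two simple models. Both are assumptions for illustration: in the broadcast model every validator sends its signature to every other validator, and in the leader model every validator sends its signature to one leader, which then sends the aggregate back to everyone.

```rust
/// Messages per block when each of `n` validators broadcasts its
/// signature to the other `n - 1` validators.
pub fn broadcast_messages(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// Messages per block when `n - 1` validators send to a leader and the
/// leader sends the aggregated result back to the `n - 1` others.
pub fn leader_messages(n: u64) -> u64 {
    2 * n.saturating_sub(1)
}
```

With 100 validators, these models give 9900 messages for broadcast versus 198 for the leader-based approach, while the extra leader round trip accounts for the added hop of latency.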
