docs: ADR for High Throughput Recovery #4315

Draft · wants to merge 4 commits into base: main
42 changes: 42 additions & 0 deletions docs/architecture/adr-024-high-throughput-recovery.md
@@ -0,0 +1,42 @@
# ADR 024: High Throughput Recovery

## Changelog

- 2025/01/29: Initial draft (@evan-forbes)

## Status

Proposed

## Context

The Celestia protocol will likely separate block propagation into two phases: "preparation", for distributing data before the block is created, and "recovery", for distributing data after the block has been created. In order to make use of the data distributed during preparation, the recovery phase must also be pull based. The constraints for recovery are therefore:

- 100% of the block data MUST be delivered to >2/3 of the voting power before the `ProposalTimeout` is reached
- Recovery MUST use pull-based gossip

## Decision

TBD

## Detailed Design

- [Messages](./assets/adr024/messages.md)
- [Handlers and State](./assets/adr024/handlers_and_state.md)
- [Connecting to Consensus](./assets/adr024/connecting_to_consensus.md)

## Alternative Approaches

### PBBT w/o erasure encoding

### No broadcast tree

## Consequences

### Positive

### Negative

### Neutral

## References
119 changes: 119 additions & 0 deletions docs/architecture/assets/adr024/connecting_to_consensus.md
@@ -0,0 +1,119 @@
# Backwards Compatible Block Propagation

This document is an extension of ADR024.

## Intro

Changes to gossiping protocols need to be backwards compatible with the existing
mechanism to allow for seamless upgrades. This means that the gossiping
mechanisms need to be hot-swappable. This can be challenging because the
consensus reactor and state have their own propagation mechanism and were not
designed to be easily modified.

## Compatibility with the Consensus Reactor

Minimally invasive modularity can be added by not touching the consensus state
and by using the same entry points that exist now, that is, the consensus
reactor's internal message channel to the consensus state. While far from optimal
from an engineering or even a performance perspective, by simply adding (yet another)
syncing routine, we can sync the data from the block propagation reactor to the
consensus state.

```go
// syncData periodically checks that all block parts held by the data
// routine are pushed through to the consensus state.
func (cs *State) syncData() {
    for {
        select {
        case <-cs.Quit():
            return
        case <-time.After(time.Millisecond * SyncDataInterval):
            if cs.dr == nil {
                continue
            }

            // Snapshot the relevant consensus state under the lock.
            cs.mtx.RLock()
            h, r := cs.Height, cs.Round
            pparts := cs.ProposalBlockParts
            pprop := cs.Proposal
            completeProp := cs.isProposalComplete()
            cs.mtx.RUnlock()

            if completeProp {
                continue
            }

            prop, parts, _, has := cs.dr.GetProposal(h, r)
            if !has {
                continue
            }

            if prop != nil && pprop == nil {
                cs.peerMsgQueue <- msgInfo{&ProposalMessage{prop}, ""}
            }

            if pparts != nil && pparts.IsComplete() {
                continue
            }

            for i := 0; i < int(parts.Total()); i++ {
                if pparts != nil {
                    if p := pparts.GetPart(i); p != nil {
                        continue
                    }
                }

                part := parts.GetPart(i)
                if part == nil {
                    continue
                }
                // Use the height and round snapshotted under the lock so the
                // message matches the proposal the parts were fetched for.
                cs.peerMsgQueue <- msgInfo{&BlockPartMessage{h, r, part}, ""}
            }
        }
    }
}
```

This allows the old routine, alongside the rest of the consensus state
logic, to function as it always has for peers that have yet to migrate to newer
versions. If a peer does not indicate during the handshake that it is using the
new block propagation reactor, then the old gossiping routines are spun up as
normal when the peer is added to the consensus reactor. However, if the peer has
indicated that it is using the new block propagation reactor, then the old
routines are simply not spun up. Something along the lines of the code below
should suffice.

```go
// legacyPropagation reports whether the peer only supports the legacy
// push-based block gossip, i.e. it does not advertise the new block
// propagation channel in its NodeInfo.
func legacyPropagation(peer p2p.Peer) (bool, error) {
    legacyBlockProp := true
    ni, ok := peer.NodeInfo().(p2p.DefaultNodeInfo)
    if !ok {
        return false, errors.New("wrong NodeInfo type. Expected DefaultNodeInfo")
    }

    for _, ch := range ni.Channels {
        if ch == types.BlockPropagationChannel {
            legacyBlockProp = false
            break
        }
    }

    return legacyBlockProp, nil
}
```
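
For illustration only, a sketch of how the consensus reactor could branch on this check when a peer is added. The routine names mirror the existing gossip routines, and `addPeerRoutines` itself is a hypothetical helper, not part of the current reactor.

```go
// addPeerRoutines is a hypothetical helper: it spins up the legacy data gossip
// routine only for peers that do not advertise the new block propagation
// channel.
func (conR *Reactor) addPeerRoutines(peer p2p.Peer, ps *PeerState) error {
    legacy, err := legacyPropagation(peer)
    if err != nil {
        return err
    }
    if legacy {
        // Peer has not migrated yet: keep pushing proposals and block parts
        // to it the old way.
        go conR.gossipDataRoutine(peer, ps)
    }
    // Votes are gossiped as before either way; block data for upgraded peers
    // is served by the block propagation reactor instead.
    go conR.gossipVotesRoutine(peer, ps)
    return nil
}
```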

## Compatibility with Parity Data

Adding parity data is highly advantageous for broadcast trees and pull-based
gossip. However, the added parity data must also be committed to by the
proposer. At the moment, the proposer commits to the block data via the
`PartSetHeader`. In order to be backwards compatible, we can't break this.
Simultaneously, we don't want to add excessive overhead by requiring the
commitments to be computed twice. To solve this dilemma, we can simply reuse
the first commitment and add a second parity commitment computed identically to
the original `PartSetHeader` hash.
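
As a sketch of what "computed identically" could mean, assuming the erasure-encoded part bytes are already available and reusing the same `merkle` helpers that build the `PartSetHeader` hash today:

```go
// parityCommitment is a sketch: it hashes the parity parts with the same
// merkle construction used for the original PartSetHeader, so Have messages
// for parity data can be proven against a second, familiar root.
func parityCommitment(parityParts [][]byte) (root []byte, proofs []*merkle.Proof) {
    // ProofsFromByteSlices returns the merkle root over the part bytes plus an
    // inclusion proof for each part.
    return merkle.ProofsFromByteSlices(parityParts)
}
```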

Setting the `PartSetHeader` hash to the zero value and not using it is an
option. Since this is a consensus-breaking change, changing the commitment in
the `CompactBlock` can be done at the same time.
3 changes: 3 additions & 0 deletions docs/architecture/assets/adr024/handlers_and_state.md
@@ -0,0 +1,3 @@
# Logic and State

The PBBT reactor logic is described at a high level in the spec.
146 changes: 146 additions & 0 deletions docs/architecture/assets/adr024/messages.md
@@ -0,0 +1,146 @@
# PBBT Messages and Validation Logic

At a high level, all flavors of PBBT have four message types: `Commitment`,
`Have`, `Want`, and `Data`.

## Commitment

```proto
message TxMetaData {
bytes hash = 1;
uint32 start = 2;
uint32 end = 3;
}

// CompactBlock commits to the transactions included in a proposal.
message CompactBlock {
int64 height = 1;
int32 round = 2;
bytes bp_hash = 3;
repeated TxMetaData blobs = 4;
bytes signature = 5;
}
```

The compact block is signed by the proposer and verified by converting it to
sign bytes and using the proposer's public key to verify the included signature.

> Note: This signature is separate from the proposal signature as it is purely
> related to block propagation and is not meant to be part of the proposal. This
> allows block propagation to remain backwards compatible with older
> implementations.

The `TxMetaData` contains the hash of the PFB for the blob transaction that it
commits to, alongside `start` and `end`. `start` is the inclusive index of the
blob transaction's first byte in the protobuf-encoded block, and `end` is the
index of the last byte it occupies.

The `pbbt_root` is generated by taking the merkle root over each of the blob
transactions in `BlobMetaData` and the `Have` messages.
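
As a non-authoritative sketch of how such a root might be computed, assuming the leaves are the protobuf encodings of the `TxMetaData` entries (the exact leaf encoding is not pinned down here):

```go
// blobRoot is a sketch only: it merkleizes the protobuf encoding of each
// TxMetaData entry. The ADR does not yet fix the exact leaf encoding.
func blobRoot(blobs []*TxMetaData) ([]byte, error) {
    leaves := make([][]byte, len(blobs))
    for i, md := range blobs {
        bz, err := md.Marshal() // gogoproto-generated marshaller
        if err != nil {
            return nil, err
        }
        leaves[i] = bz
    }
    return merkle.HashFromByteSlices(leaves), nil
}
```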

Verification:

- The signature MUST be valid using the sign bytes of the compact block and the
  public key of the expected proposer for that height and round (see the sketch
  below).
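
A minimal sketch of that check, assuming a `SignBytes` helper that returns the deterministic encoding of the `CompactBlock` with the signature field left empty, and a `crypto.PubKey` for the expected proposer:

```go
// verifyCompactBlockSignature is a sketch; SignBytes is an assumed helper and
// the proposer lookup for the given height and round happens elsewhere.
func verifyCompactBlockSignature(cb *CompactBlock, proposer crypto.PubKey) error {
    if !proposer.VerifySignature(cb.SignBytes(), cb.Signature) {
        return fmt.Errorf("invalid compact block signature at height %d, round %d",
            cb.Height, cb.Round)
    }
    return nil
}
```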

## Have

```protobuf
message HaveParts {
bytes hash = 1;
int64 height = 2;
int32 round = 3;
tendermint.crypto.Proof proof = 4 [(gogoproto.nullable) = false];
}
```

Verification:

- The merkle proof MUST be verified using the roots included in the
  `CompactBlock` for that height and round. If the data is parity data, then it
  MUST use the `parity_root`; if the data is original block data, then it MUST
  use the `PartSetHeaderRoot`. A sketch of this check is shown below.
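
A sketch of that branch, assuming both roots are taken from the `CompactBlock` for the same height and round and that an `isParity` flag (or index range) distinguishes parity parts:

```go
// verifyHave is a sketch: isParity and the way the two roots are carried in
// the CompactBlock are assumptions, but Proof.Verify is the existing merkle
// proof check.
func verifyHave(hp *HaveParts, partSetRoot, parityRoot []byte, isParity bool) error {
    root := partSetRoot
    if isParity {
        root = parityRoot
    }
    // Verify checks that hp.Hash is a leaf under root via the included path.
    return hp.Proof.Verify(root, hp.Hash)
}
```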

## Want

```protobuf
message WantParts {
tendermint.libs.bits.BitArray parts = 1 [(gogoproto.nullable) = false];
int64 height = 2;
int32 round = 3;
}
```
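
For context, a sketch of building a `WantParts` from the part indexes a node is still missing. The `have` map is a stand-in for however local part state is tracked, and the `bits` bit-array helpers plus the non-nullable `Parts` field in the generated struct are assumptions.

```go
// wantFromMissing is a sketch: it flips a bit for every part index the node
// has not yet received and wraps the result in a WantParts message. ToProto is
// the existing conversion from the in-memory bit array to its protobuf form.
func wantFromMissing(height int64, round int32, total int, have map[int]bool) *WantParts {
    ba := bits.NewBitArray(total)
    for i := 0; i < total; i++ {
        if !have[i] {
            ba.SetIndex(i, true)
        }
    }
    return &WantParts{
        Parts:  *ba.ToProto(),
        Height: height,
        Round:  round,
    }
}
```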

## Data

```protobuf
message RecoveryPart {
int64 height = 1;
int32 round = 2;
uint32 index = 3;
bytes data = 4;
}
```

Verification:

- The hash of the bytes in the `data` field MUST match the hash in the
  corresponding `Have` message, as sketched below.
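
A sketch of that check, assuming the part hashes are computed with the same `tmhash` (SHA-256) function used elsewhere for part sets:

```go
// verifyRecoveryPart is a sketch: haveHash is the hash previously received in
// the Have message for this part index, and tmhash is assumed to be the part
// hash function.
func verifyRecoveryPart(rp *RecoveryPart, haveHash []byte) error {
    if !bytes.Equal(tmhash.Sum(rp.Data), haveHash) {
        return fmt.Errorf("recovery part %d: hash mismatch", rp.Index)
    }
    return nil
}
```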

### Parity Data

Parity data is required for all practical broadcast trees. This becomes
problematic mainly because transactions downloaded before the block is created
need to be usable during recovery. Using erasure encoding means that the data
must be chunked into even sizes, and all transactions in a chunk must have been
downloaded in order to use it alongside parity data to reconstruct the block.
Most scenarios would likely be fine; however, it would be possible for a node to
have downloaded a large portion of the block but have no complete parts,
rendering all of the parity data useless. The way to fix this while remaining
backwards compatible is to still commit over and propagate parts, but to
erasure-encode smaller chunks of those parts, aka `SubParts`.

```go
const (
    SubPartsPerPart uint32 = 32
    SubPartSize            = BlockPartSizeBytes / SubPartsPerPart
)

type Part struct {
    Index uint32            `json:"index"`
    Bytes cmtbytes.HexBytes `json:"bytes"`
    Proof merkle.Proof      `json:"proof"`
}

// SubPart is a portion of a part and block that is used for generating parity
// data.
type SubPart struct {
    Index uint32            `json:"index"`
    Bytes cmtbytes.HexBytes `json:"bytes"`
}

// SubParts breaks a block part into smaller, equally sized subparts.
func (p *Part) SubParts() []SubPart {
    sps := make([]SubPart, SubPartsPerPart)
    for i := uint32(0); i < SubPartsPerPart; i++ {
        sps[i] = SubPart{
            Index: i,
            Bytes: p.Bytes[i*SubPartSize : (i+1)*SubPartSize],
        }
    }
    return sps
}

// PartFromSubParts reassembles a block part from its subparts. It panics if
// the wrong number of subparts is provided.
func PartFromSubParts(index uint32, sps []SubPart) *Part {
    if len(sps) != int(SubPartsPerPart) {
        panic(fmt.Sprintf("invalid number of subparts: %d", len(sps)))
    }
    b := make([]byte, 0, BlockPartSizeBytes)
    for _, sp := range sps {
        b = append(b, sp.Bytes...)
    }
    return &Part{
        Index: index,
        Bytes: b,
    }
}
```
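
To show where the sub-parts fit, a sketch of feeding them into an erasure coder. The use of `github.com/klauspost/reedsolomon` and the 1:1 data-to-parity ratio are assumptions for illustration, not decisions made by this ADR.

```go
import (
    "github.com/klauspost/reedsolomon"
)

// encodeSubParts is a sketch: it treats every sub-part of the original parts
// as a data shard and produces an equal number of parity shards. It assumes
// every part is exactly BlockPartSizeBytes long (the last part of a real
// block would need padding first).
func encodeSubParts(parts []*Part) ([][]byte, error) {
    shards := make([][]byte, 0, len(parts)*int(SubPartsPerPart)*2)
    for _, p := range parts {
        for _, sp := range p.SubParts() {
            shards = append(shards, sp.Bytes)
        }
    }
    dataShards := len(shards)

    // Allocate empty parity shards for the encoder to fill in.
    for i := 0; i < dataShards; i++ {
        shards = append(shards, make([]byte, SubPartSize))
    }

    enc, err := reedsolomon.New(dataShards, dataShards)
    if err != nil {
        return nil, err
    }
    if err := enc.Encode(shards); err != nil {
        return nil, err
    }
    return shards[dataShards:], nil
}
```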