Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] New Block Reconstruction Protocol Proposal #4051

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

walldiss
Copy link
Member

This PR introduces a draft proposal for a new block reconstruction protocol aimed at improving network efficiency and scalability.

Key improvements:

  • Reduces duplicate requests and bandwidth usage
  • Structured bitmap sharing using Roaring Bitmaps
  • Clear separation of client/server responsibilities
  • Robust failure handling with peer penalties
  • Optimized proof packaging and sharing

The document serves as a starting point for discussion and will be later split into CIP and ADR.

@walldiss walldiss self-assigned this Jan 20, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 44.85%. Comparing base (2469e7a) to head (e506dc6).
Report is 433 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4051      +/-   ##
==========================================
+ Coverage   44.83%   44.85%   +0.02%     
==========================================
  Files         265      308      +43     
  Lines       14620    22488    +7868     
==========================================
+ Hits         6555    10088    +3533     
- Misses       7313    11316    +4003     
- Partials      752     1084     +332     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@musalbas
Copy link
Member

Looks generally fine to me directionally. Networking protocol should probably be in a CIP as it's relevant to other clients like lumina etc.

Comment on lines 94 to 98

Handshakes, Node Type Identification
- Node type must be declared during connection handshake by each of connection nodes
- Active connection state maintenance on full nodes. List should be maintained by node type
- Connection state should allow subscription for updated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the rationale behind sharing your node type? Is it possible to avoid sharing this information with peers?

There is a privacy concern we discussed a while ago with @musalbas and @adlerjohn. FNs ideally should be indistinguishable from LNs on p2p network, so they validators don't detect them.

If we don't think that's important anymore or sharing type is unavoidable, I also propose to cut the handshake roundtrip and rely on libp2p toolset for that.

Copy link
Member

@musalbas musalbas Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is indeed a goal, but the problem here is that it's implicit in this protocol that only full nodes request bitmaps from others anyway, regardless of how you identify in the handshake.

FNs pretending to be LNs (to reconstruct) would only work if the FNs spam sample requests to the block producer, which isn't something implemented yet afaik. (See https://forum.celestia.org/t/security-levels-for-data-availability-for-light-nodes/919#level-3-data-availability-sampling-without-an-honest-minority-of-light-nodes-6)

Copy link
Member

@Wondertan Wondertan Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is indeed a goal

if the FNs spam sample requests to the block producer, which isn't something implemented yet afaik.

Do you think this should be in scope of current iteration of reconstruction protocol?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not in this iteration, because it seems we're doing pretty good on increasing number of light nodes, so we could just focus directly on having a robust construction protocol and achieving Level 4 light nodes. It might be nice to have later as an option, though, but not high priority right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, handshake is not necessary for initial basic implementation. It will be needed in peer scoring optimisation. When FN would need to know the type of peer for prioritisation of requests. I'll mark handshake protocol as future optimisation.

Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this proposal also require some modifications in discovery + peer management as it currently stands? If so, can you please outline what that would look like?

Comment on lines 123 to 124
rows []peers // Peers with row data
cols []peers // Peers with column data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this actually going to be structured?

you keep a []peer.ID per which row they claim to have? So index 0 []peer.ID will be peers that claim to have row 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The idea is to have efficient look up for peers who have full row or full col.


Local state:
- InProgress() -> bitmap. Tracks ongoing fetch sessions to prevent requesting duplicates
- Have() -> bitmap. Tracks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Have() -> bitmap. Tracks
- Have() -> bitmap. Tracks node's locally available samples

- Pre-confirmation threshold
- Combination of conditions

4.1.1 what and were (to request) biggest question.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what and where ?

4.2.1. Request

- Update InProgress bitmap before request initiation
- Use shrex for data retrieval. Send bitmap for data request. Shrex has built-in support for requesting rows and samplies, however bitmap-based data retrieval is more general and would be easier to support in future, because it allows requesting multiples pieces of data with single request instead of multiple requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Use shrex for data retrieval. Send bitmap for data request. Shrex has built-in support for requesting rows and samplies, however bitmap-based data retrieval is more general and would be easier to support in future, because it allows requesting multiples pieces of data with single request instead of multiple requests
- Use shrex for data retrieval. Send bitmap for data request. Shrex has built-in support for requesting rows and samples, however bitmap-based data retrieval is more general and would be easier to support in future, because it allows requesting multiples pieces of data with single request instead of multiple requests

Optiomisations:
- Server may split response into multiple parts
Each part contains packed samples in Range format.
- Each part of shoold have specified message prefix. It would allow client to identify sample container type, which would help to maintain backwards compatibility in case of breaking changes in packing algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Each part of shoold have specified message prefix. It would allow client to identify sample container type, which would help to maintain backwards compatibility in case of breaking changes in packing algorithm
- Each part of should have specified message prefix. It would allow client to identify sample container type, which would help to maintain backwards compatibility in case of breaking changes in packing algorithm

4.2.2. Response
- (base version) Server should respond with samples with proofs

Optiomisations:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Optiomisations:
Optimisations:

### Server-Side Implementation

#### 1. Storage Interface
New storage format needs to be implemented for efficient storage of sampels proofs. The format will be used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
New storage format needs to be implemented for efficient storage of sampels proofs. The format will be used
New storage format needs to be implemented for efficient storage of sample proofs. The format will be used


#### 1. Storage Interface
New storage format needs to be implemented for efficient storage of sampels proofs. The format will be used
initially used for storing ongoing reconstruction process and later can be used for light node storage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
initially used for storing ongoing reconstruction process and later can be used for light node storage.
initially for storing ongoing reconstruction process and later can be used for light node storage.

@renaynay
Copy link
Member

Also generally speaking, would be nice to have a diagram at the end sketching out a reconstruction process e2e.

@walldiss
Copy link
Member Author

Would this proposal also require some modifications in discovery + peer management as it currently stands? If so, can you please outline what that would look like?

First implementation is simplified and does need to have any additional peer management. We would still need to have at least some(5) full nodes connected, which is ensured by current implementation of discovery. Requests will be send to random peers, so the only thing that needs to be tracked is all connected peers, which is available from libp2p. It would be easier to see once I add diagrams for FN reconstruction steps.

Copy link
Member Author

@walldiss walldiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback notes from sync call

```

#### Response
Server should respond with samples with proofs defined in shwap CIP [past link].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server should validate returned samples matches requested bitmap.

- Block size scaling
- Node count scaling

## Protocol Flow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add general components overview from current node perspective:

  • daser
  • availability
  • shrex getter/ shrex seub

```
message BitmapUpdate {
Bitmap bitmap = 1;
bool is_end = 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename is_end -> completed

Comment on lines 126 to 131
Remote state:
```go
func (s *RemoteState) GetPeersWithSample(coords []Coord) []peer.ID
func (s *RemoteState) GetPeersWithAxis(axisIdx int, axisType AxisType) []peer.ID
func (s *RemoteState) Available() bitmap
```
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add comments for method usacases.
  • Add that some methods are not necessary for init implementation.

func (s *ProgressState) Have() bitmap
```

### 3. Bitmap Protocol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary for FN-FN, can be more optimised version for FN-LN. We will start with only subsciption to have working implementation faster.

3. Fallback mechanisms



Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add testing plan / requirements

- return handshake based on user agent.
- Do not subscribe for samples from LN
- Request from LN in batches
- return handshake based on user agent.
- Do not subscribe for samples from LN
- Request from LN in batches
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants