Stabilize the consensus on node restart #425

Closed
Tracked by #326
ppca opened this issue Jan 19, 2024 · 5 comments
Labels
Emerging Tech (Emerging Tech flying formation at Pagoda), Near BOS (NEAR BOS team at Pagoda)

Comments

@ppca
Contributor

ppca commented Jan 19, 2024

Occasionally some nodes may restart. Currently, a restarted node comes back with no knowledge of any triples or presignatures it previously generated. When a certain presignature_id is requested and it is not available locally, the protocol gets stuck waiting on that presignature_id forever.

To fix this, there are 3 major points:

  1. Save the last indexed block in persistent storage so that the indexer does not have to start from the very beginning every time.
  2. Make triples and presignatures persistent.
  3. Communicate the absence of a specific triple/presignature to other nodes so that they can abort protocols proactively and choose one that all of them have.
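A minimal sketch of point 1, assuming the indexer can keep a small checkpoint file on local or mounted disk. The file path, the JSON layout, and the names `BlockCheckpoint` and `start_height` are hypothetical, not taken from the project. The checkpoint is written atomically so that a crash mid-write cannot leave a corrupted file behind:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


class BlockCheckpoint:
    """Hypothetical helper that persists the height of the last block the
    indexer fully processed. File-based storage is an assumption."""

    def __init__(self, path: Path):
        self.path = Path(path)

    def load(self) -> Optional[int]:
        # No checkpoint yet means the indexer has never run on this node.
        if not self.path.exists():
            return None
        with self.path.open() as f:
            return int(json.load(f)["last_indexed_block"])

    def save(self, height: int) -> None:
        # Write to a temp file in the same directory, fsync it, then
        # atomically replace the old checkpoint with os.replace.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump({"last_indexed_block": height}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def start_height(checkpoint: BlockCheckpoint, genesis: int) -> int:
    # Resume right after the last processed block instead of from genesis.
    last = checkpoint.load()
    return genesis if last is None else last + 1
```

If `save` is called only after a block has been fully handled, a restart re-processes at most the block that was in flight, which is safer than skipping one.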

For the persistent storage options: (1) could be stored almost anywhere on GCP, since it is not sensitive. (2) is very sensitive but also comes in large volume, so GCP Secret Manager is not ideal; we need another solution.
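To illustrate the shape of point 2, here is a sketch that uses SQLite (Python's built-in `sqlite3`) purely as a stand-in backend; the thread has not chosen one, and `TripleStore` with its methods is hypothetical. Because triples and presignatures are sensitive, a real deployment would also need to encrypt the blobs at rest, which this sketch does not do:

```python
import sqlite3
from typing import Optional


class TripleStore:
    """Hypothetical durable store for triples/presignatures keyed by id.
    SQLite is only a stand-in; `blob` is assumed to be serialized (and,
    in a real system, encrypted) bytes."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            " id INTEGER PRIMARY KEY,"
            " blob BLOB NOT NULL)"
        )
        self.conn.commit()

    def insert(self, triple_id: int, blob: bytes) -> None:
        # Commit before the triple is used, so a restart cannot
        # "forget" a triple that was already generated.
        with self.conn:
            self.conn.execute(
                "INSERT INTO triples (id, blob) VALUES (?, ?)", (triple_id, blob)
            )

    def take(self, triple_id: int) -> Optional[bytes]:
        # Fetch and delete in one transaction, since each triple is consumed once.
        with self.conn:
            row = self.conn.execute(
                "SELECT blob FROM triples WHERE id = ?", (triple_id,)
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM triples WHERE id = ?", (triple_id,))
            return row[0]

    def ids(self) -> set[int]:
        # The ids this node can still offer to a protocol.
        return {r[0] for r in self.conn.execute("SELECT id FROM triples")}
```

Deleting on `take` reflects that a triple or presignature must be used only once; persisting that deletion matters as much as persisting the insert.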

ppca added the Near BOS and Emerging Tech labels on Jan 19, 2024
@volovyks
Collaborator

The second one is mentioned in the Multichain epic: #326 (Pick a persistent storage solution (let's not use Google Datastore this time though)). Let's prioritize converting that list into issues and keeping everything in the epic. We may go ahead and reconsider this approach later.

The third one can be fixed by #424. I'm unsure if we need to introduce new types of messages about the absence of the triple. @ChaoticTempest what do you think?

@ppca
Contributor Author

ppca commented Jan 22, 2024

> The second one is mentioned in the Multichain epic: #326 (Pick a persistent storage solution (let's not use Google Datastore this time though)). Let's prioritize converting that list into issues and keeping everything in the epic. We may go ahead and reconsider this approach later.

Cool.

> The third one can be fixed by #424. I'm unsure if we need to introduce new types of messages about the absence of the triple. @ChaoticTempest what do you think?

The PR, I think, refreshes on error. If we use a similar approach, maybe we could refresh after a delay that exceeds some threshold? For example, if a node goes offline and is not yet back up, the other nodes could refresh any protocol they haven't received an update on for some time. But would the other nodes know to choose a different set of triples, though?
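One hedged way to address the last question, building on point 3 of the issue description: if each node advertised the set of triple ids it still holds, every node could apply the same deterministic rule to pick one that all participants have. `pick_common_triple` is a hypothetical helper, and it only produces agreement if all nodes see the same advertised sets:

```python
from typing import Iterable, Optional


def pick_common_triple(holdings: Iterable[set[int]]) -> Optional[int]:
    """Hypothetical: each element of `holdings` is the set of triple ids
    one participant reports having in storage."""
    sets = list(holdings)
    if not sets:
        return None
    common = set.intersection(*sets)
    # Every node runs the same rule on the same inputs, so they converge
    # on the same choice without an extra round of messages.
    return min(common) if common else None
```

If the intersection is empty, the participants would have to generate a fresh triple instead of waiting on one that some node no longer has.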

@ChaoticTempest
Member

The issue with waiting for a delay is that the protocol is an opaque type, which doesn't reveal any information about its internal state. This means we can't tell whether it has advanced its state, but we can see whether any message is being sent or received for a particular generation protocol, based on the triple id.
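A rough sketch of the signal described here, using message traffic as a proxy for progress since the protocol state itself is opaque. `ProtocolActivity` and the idle threshold are hypothetical; the clock is injectable so the logic can be checked without sleeping:

```python
import time
from typing import Callable, Dict, List


class ProtocolActivity:
    """Hypothetical tracker of the last time any message was sent or
    received for each triple generation, keyed by triple id."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.last_seen: Dict[int, float] = {}

    def on_message(self, triple_id: int) -> None:
        # Call for every outgoing or incoming message of this generation.
        self.last_seen[triple_id] = self.clock()

    def stalled(self, idle_secs: float) -> List[int]:
        # Generations with no message traffic for longer than idle_secs.
        now = self.clock()
        return [tid for tid, t in self.last_seen.items() if now - t > idle_secs]

    def finish(self, triple_id: int) -> None:
        # Stop tracking a generation once it completes or is abandoned.
        self.last_seen.pop(triple_id, None)
```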

On the other hand, we could simply have a timeout for a protocol: if it hasn't finished within the timeout, we just start a new one. If a node is offline, its messages wait in a queue until it comes back online. The triple protocol does not need any particular node to be online, since otherwise it wouldn't be a threshold system. So the offline node can come back online at any point and start generating triples if it needs to.
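The timeout approach, together with the per-peer message queue for offline nodes, might look roughly like this. All names are hypothetical and `send` stands in for the real transport; the sketch shows only the bookkeeping, not the triple protocol itself:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class GenerationManager:
    """Hypothetical sketch: replace triple generations that exceed a timeout
    and buffer outgoing messages for peers that are currently offline."""

    def __init__(
        self,
        timeout_secs: float,
        send: Callable[[str, bytes], None],
        clock: Callable[[], float] = time.monotonic,
    ):
        self.timeout = timeout_secs
        self.send = send  # stand-in for the real network transport
        self.clock = clock
        self.started: Dict[int, float] = {}  # generation id -> start time
        self.outbox: Dict[str, Deque[bytes]] = defaultdict(deque)
        self.next_id = 0

    def start(self) -> int:
        # Begin a new generation under a fresh id and record its start time.
        gen_id = self.next_id
        self.next_id += 1
        self.started[gen_id] = self.clock()
        return gen_id

    def complete(self, gen_id: int) -> None:
        self.started.pop(gen_id, None)

    def reap_timed_out(self) -> List[Tuple[int, int]]:
        # Abandon every generation older than the timeout and start a
        # replacement; returns (abandoned_id, replacement_id) pairs.
        now = self.clock()
        expired = [g for g, t0 in self.started.items() if now - t0 > self.timeout]
        replaced = []
        for gen_id in expired:
            del self.started[gen_id]
            replaced.append((gen_id, self.start()))
        return replaced

    def enqueue(self, peer: str, msg: bytes) -> None:
        # Messages are queued per peer, whether or not the peer is up.
        self.outbox[peer].append(msg)

    def flush(self, peer: str, online: bool) -> int:
        # Deliver queued messages in order once the peer is reachable again.
        if not online:
            return 0
        queue = self.outbox[peer]
        delivered = 0
        while queue:
            self.send(peer, queue.popleft())
            delivered += 1
        return delivered
```

In this sketch a timed-out generation is replaced under a fresh id rather than retried under the old one, so the remaining nodes never wait on a single slow or offline peer.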

@volovyks
Collaborator

Timeout seems like a good and straightforward way of doing this.

volovyks moved this from Backlog to Selected in Emerging Technologies on Jan 23, 2024
@volovyks
Collaborator

volovyks commented Feb 1, 2024

Blocked by #430

volovyks moved this from Selected to Blocked in Emerging Technologies on Feb 1, 2024
volovyks moved this from Blocked to Selected in Emerging Technologies on Feb 13, 2024
github-project-automation bot moved this from Selected to Done in Emerging Technologies on Mar 12, 2024
Projects
Status: Done
Development

No branches or pull requests

3 participants