Stabilize the consensus on node restart #425

ppca · 2024-01-19T00:16:12Z

Occasionally some nodes restart. Our current system simply restarts without any knowledge of previously generated triples and presignatures. When a certain presignature_id is requested and it is not available locally, the protocol gets stuck checking for that presignature_id forever.

To fix this, there are 3 major points:

1. Save the last indexed block in persistent storage so that the indexer does not have to start from the very beginning every time
2. Make triples and presignatures persistent
3. Communicate the lack of a specific triple/presignature to other nodes so that they can abort protocols proactively and choose something that all of them have
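To illustrate the third point, here is a minimal sketch of how nodes could settle on a presignature that all of them hold once each has reported what it has locally. The `PresignatureId` type and the `pick_common_presignature` function are hypothetical names for illustration only; how availability is actually exchanged between nodes is not specified in this issue.

```rust
use std::collections::BTreeSet;

/// Hypothetical id type; the issue only refers to a `presignature_id`.
type PresignatureId = u64;

/// Returns the smallest id that every participant reports having, if any.
/// `available` holds one set of locally stored ids per participant.
fn pick_common_presignature(available: &[BTreeSet<PresignatureId>]) -> Option<PresignatureId> {
    // With no participants there is nothing to agree on.
    let (first, rest) = available.split_first()?;
    // BTreeSet iterates in ascending order, so the first match is the
    // smallest common id, which makes the choice deterministic on every node.
    first
        .iter()
        .copied()
        .find(|id| rest.iter().all(|set| set.contains(id)))
}
```

Picking the smallest common id is just one deterministic rule; any rule works as long as every node applies the same one to the same input.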

For the persistent storage option: the data in point 1 could potentially be stored anywhere on GCP, as it's not sensitive. The data in point 2 is very sensitive but also comes in large volume, so GCP Secret Manager is not ideal; we need another solution.
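As a rough sketch of point 1, the last indexed block height could be checkpointed and reloaded on startup. This example uses a local file purely as a stand-in for whatever GCP storage ends up being chosen; the function names and the file-based backend are assumptions, not part of the existing code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes the height to a temporary file and renames it over the target,
/// so a crash mid-write is unlikely to leave a truncated checkpoint behind.
/// (Assumption: a local file stands in for the real storage backend.)
fn save_last_indexed_block(path: &Path, height: u64) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, height.to_string())?;
    fs::rename(&tmp, path)
}

/// Returns `Ok(None)` when no checkpoint exists yet (first start),
/// in which case the indexer would begin from its default height.
fn load_last_indexed_block(path: &Path) -> io::Result<Option<u64>> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<u64>()
            .map(Some)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

On restart the indexer would call `load_last_indexed_block` once and resume from the returned height instead of starting from the very beginning.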

volovyks · 2024-01-19T12:38:10Z

The second one is mentioned in the Multichain epic: #326 (Pick a persistent storage solution (let's not use Google Datastore this time though)). Let's prioritize converting that list into issues and keeping everything in the epic. We may go ahead and reconsider this approach later.

The third one can be fixed by #424. I'm unsure whether we need to introduce new types of messages about the absence of a triple. @ChaoticTempest, what do you think?

ppca · 2024-01-22T20:12:34Z

> The second one is mentioned in the Multichain epic: #326 (Pick a persistent storage solution (let's not use Google Datastore this time though)). Let's prioritize converting that list into issues and keeping everything in the epic. We may go ahead and reconsider this approach later.

cool.

> The third one can be fixed by #424. I'm unsure whether we need to introduce new types of messages about the absence of a triple. @ChaoticTempest, what do you think?

I think the PR refreshes on error. If we use a similar approach, could we refresh once the delay exceeds a certain threshold? For example, if a node goes offline and is not yet back up, the other nodes can refresh when they haven't received an update on a protocol for some time. But will the other nodes know to choose a different set of triples?

ChaoticTempest · 2024-01-22T20:30:49Z

The issue with waiting for a delay is that the protocol is an opaque type that doesn't reveal any information about its internal state, so we can't tell whether or not it has advanced. What we can do is check whether a message is being sent or received for a particular generation protocol, based on the triple id.
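A minimal sketch of that idea: record the time of the last message seen per triple id, and treat a protocol as stalled when nothing has been sent or received for longer than some threshold. `TripleId`, `ActivityTracker`, and its methods are hypothetical names, and the caller is assumed to invoke `record_message` from the message send/receive paths.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical id type for a triple generation protocol.
type TripleId = u64;

/// Tracks when a message was last sent or received for each triple id.
#[derive(Default)]
struct ActivityTracker {
    last_seen: HashMap<TripleId, Instant>,
}

impl ActivityTracker {
    /// Call whenever a message for `id` is sent or received.
    fn record_message(&mut self, id: TripleId, now: Instant) {
        self.last_seen.insert(id, now);
    }

    /// Returns, in ascending order, the ids with no message activity
    /// for at least `threshold`.
    fn stalled(&self, now: Instant, threshold: Duration) -> Vec<TripleId> {
        let mut ids: Vec<TripleId> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.duration_since(**seen) >= threshold)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

Passing `now` in explicitly keeps the tracker independent of the system clock, which makes the stall logic easy to exercise in tests.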

On the other hand, we could simply have a timeout for a protocol: if it hasn't finished within the timeout, we just start a new one. If a node is offline, we have a queue of messages that will wait for the node to come back online. The triple protocol does not need any particular node to be online, since otherwise it wouldn't be a threshold system. So the offline node can hop online at any point and start generating triples if it needs to.
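The timeout approach could look roughly like the sketch below, which wraps an opaque protocol instance and replaces it with a fresh one once a deadline passes. `TimedProtocol` and its fields are hypothetical and not taken from the codebase; the protocol type `P` is deliberately generic because the real one is opaque.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper around one opaque protocol instance.
struct TimedProtocol<P> {
    protocol: P,
    started_at: Instant,
    attempts: u32,
}

impl<P> TimedProtocol<P> {
    fn start(protocol: P, now: Instant) -> Self {
        Self { protocol, started_at: now, attempts: 1 }
    }

    /// If `timeout` has elapsed since the current instance started, discards
    /// it and installs a fresh instance built by `make_new`.
    /// Returns `true` when a restart happened.
    fn restart_if_timed_out(
        &mut self,
        now: Instant,
        timeout: Duration,
        make_new: impl FnOnce() -> P,
    ) -> bool {
        if now.duration_since(self.started_at) < timeout {
            return false;
        }
        self.protocol = make_new();
        self.started_at = now;
        self.attempts += 1;
        true
    }
}
```

The caller would remove a `TimedProtocol` from its set of running protocols as soon as it completes, so only unfinished instances are ever checked against the timeout.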

volovyks · 2024-01-23T10:05:13Z

Timeout seems like a good and straightforward way of doing this.

volovyks · 2024-02-01T17:53:05Z

Blocked by #430

ppca added the Near BOS (NEAR BOS team at Pagoda) and Emerging Tech (Emerging Tech flying formation at Pagoda) labels Jan 19, 2024

github-project-automation bot added this to Emerging Technologies Jan 19, 2024

github-project-automation bot moved this to Backlog in Emerging Technologies Jan 19, 2024

ppca mentioned this issue Jan 19, 2024

🔷 [Epic] Multichain #326

Closed

volovyks moved this from Backlog to Selected in Emerging Technologies Jan 23, 2024

volovyks moved this from Selected to Blocked in Emerging Technologies Feb 1, 2024

volovyks moved this from Blocked to Selected in Emerging Technologies Feb 13, 2024

ChaoticTempest closed this as completed Mar 12, 2024

github-project-automation bot moved this from Selected to Done in Emerging Technologies Mar 12, 2024
