Stabilize the consensus on node restart #425
Comments
The second one is mentioned in the Multichain epic: #326 (Pick a persistent storage solution (let's not use Google Datastore this time though)). Let's prioritize converting that list into issues and keeping everything in the epic. We may go ahead and reconsider this approach later. The third one can be fixed by #424. I'm unsure if we need to introduce new types of messages about the absence of the triple. @ChaoticTempest what do you think?
cool.
I think the PR refreshes on error. If we use a similar approach, could we refresh once a delay exceeds a certain threshold? For example, a node goes offline and isn't back up yet, and the other nodes refresh if they haven't received an update on a protocol for some time. But will the other nodes know to choose a different set of triples?
The issue with waiting for a delay is that the protocol is an opaque type, which doesn't reveal any info about its internal state. That means we can't tell whether or not it has advanced its state. What we can do is see whether or not a message is being sent or received for a particular generation protocol, based on the triple id. On the other hand, we could simply have a timeout for a protocol: if it hasn't finished within the timeout, we just start a new one. In the case where a node is offline, we have a queue of messages that will wait for the node to be online again. The triple protocol does not need a particular node to be online, since otherwise it wouldn't be a threshold system. So the offline node can hop online at any point and start generating triples if it needs to.
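To make the timeout idea concrete, here is a minimal sketch in Python. It is not the project's code: `TripleManager`, `TrackedProtocol`, and all method names are hypothetical, and the protocol itself is treated as opaque, so the only things tracked are when it started and when a message for its triple id was last seen. It also does not address how the other nodes would agree on the replacement triple id.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TrackedProtocol:
    """Bookkeeping for one opaque generation protocol (hypothetical type)."""
    triple_id: int
    started_at: float
    last_message_at: float


class TripleManager:
    """Tracks ongoing protocols by triple id and restarts the ones that time out."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.ongoing: Dict[int, TrackedProtocol] = {}
        self.next_id = 0

    def start(self) -> int:
        """Register a new protocol and return its triple id."""
        now = self.clock()
        triple_id = self.next_id
        self.next_id += 1
        self.ongoing[triple_id] = TrackedProtocol(triple_id, now, now)
        return triple_id

    def on_message(self, triple_id: int) -> None:
        """Record that a message was sent or received for this triple id."""
        proto = self.ongoing.get(triple_id)
        if proto is not None:
            proto.last_message_at = self.clock()

    def idle_secs(self, triple_id: int) -> float:
        """Seconds since the last observed message for this triple id."""
        return self.clock() - self.ongoing[triple_id].last_message_at

    def finish(self, triple_id: int) -> None:
        """Drop a protocol that completed successfully."""
        self.ongoing.pop(triple_id, None)

    def restart_timed_out(self) -> List[Tuple[int, int]]:
        """Replace every protocol older than the timeout.

        Returns (old_id, new_id) pairs for the protocols that were restarted.
        """
        now = self.clock()
        expired = [tid for tid, p in self.ongoing.items()
                   if now - p.started_at >= self.timeout_secs]
        restarted = []
        for tid in expired:
            del self.ongoing[tid]
            restarted.append((tid, self.start()))
        return restarted
```

A node's main loop could call `restart_timed_out()` on every tick. `idle_secs` is there in case we'd rather time out on message inactivity than on total elapsed time.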
Timeout seems like a good and straightforward way of doing this.
Blocked by #430
Occasionally we may have some nodes restarting. Our current system would simply restart without knowledge of any previously generated triples or presignatures. When a certain presignature_id is requested and it's not available locally, the protocol gets stuck checking for that presignature_id forever.
To fix this, there are 3 major points:
For the persistent storage option, item 1 could potentially be stored anywhere on GCP, as it's not sensitive. Item 2 is very sensitive but also comes in large volume, so GCP Secret Manager is not ideal; we need another solution.