This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

reconfiguration mess #64

Open
xiang90 opened this issue Jun 9, 2016 · 2 comments

Comments

xiang90 commented Jun 9, 2016

The "reconfiguration mess" doc mentions a few things that are not exactly true:

etcd/raft requires that a new replica being added must know the exact state of the cluster at the moment it is added. Similarly, replicas that are not yet aware of recent reconfigurations are not able to receive commands from the new nodes: this means that a new node serving as a leader cannot help those replicas catch up. Nodes not in the cluster cannot catch up with the cluster before being added -- and adding them would reduce availability.

etcd/raft does not have this requirement. You can just add a node and start that node with no configuration at all. Replicas can also receive commands from the leader even if they do not know the most recent configuration.

The truth is that in etcd we add additional, stricter checks that are necessary for our use case. You do not have to do any of that checking if you do not want to.

xiang90 commented Jun 9, 2016

Basically, this doc describes what etcd does, not what etcd/raft does. As far as I know, coname depends on etcd/raft, not etcd. So the decisions we made in etcd should not affect coname at all.

andres-erbsen (Contributor) commented
Okay. I agree that the "reconfiguration mess" document is inaccurate. My apologies for confusing limitations of my understanding of etcd/raft with limitations of the implementation itself. And the availability failure referenced in the doc was indeed fixed a while ago.

As for how to resolve this, I think the best solution would be for the etcd/raft documentation to include a precise specification of what one needs to ensure for cluster membership changes to be safe. I would particularly like to see explicit promises (or disclaimers) about the following scenarios:

  • the configuration a node is started with is out of date
  • the configuration a node is started with is arbitrarily inaccurate
  • most nodes commit and apply multiple cluster membership changes in a row, but one node is significantly behind in applying entries (see also raft: rework comment for advance interface etcd-io/etcd#4049 (comment))
  • that, plus a node that is removed from the cluster does not know it was removed and continues to communicate with the lagging node
  • that, plus there is a snapshot between some of the entries that have not propagated to all nodes yet
  • all that, plus the cluster would later need the participation of the lagging node to make progress
  • there are no snapshots, ever
