This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

reconfiguration mess #64

Open
xiang90 opened this issue Jun 9, 2016 · 2 comments

Comments

xiang90 commented Jun 9, 2016

The "reconfiguration mess" doc mentions a few things that are not exactly true:

etcd/raft requires that a new replica being added must know the exact state of the cluster at the moment it is added. Similarly, replicas that are not yet aware of recent reconfigurations are not able to receive commands from the new nodes: this means that a new node serving as a leader cannot help those replicas catch up. Nodes not in the cluster cannot catch up with the cluster before being added -- and adding them would reduce availability.

etcd/raft does not have this requirement. You can just add a node and start that node with no configuration at all. Replicas can also receive commands from the leader even if they do not know the most recent configuration.

The truth is that in etcd we add additional, stricter checks that are necessary for our use case. You do not have to do any of that checking if you do not want to.

xiang90 commented Jun 9, 2016

Basically, this doc describes what etcd does, not what etcd/raft does. As far as I know, coname depends on etcd/raft, not etcd. So the decisions we made in etcd should not affect coname at all.

andres-erbsen (Contributor) commented
Okay. I agree that the "reconfiguration mess" document is inaccurate. My apologies for confusing limitations of my understanding of etcd/raft with limitations of the implementation itself. And the availability failure referenced in the doc was indeed fixed a while ago.

As for how to resolve this, I think the best solution would be for the etcd/raft documentation to include a precise specification of what one needs to ensure for cluster membership changes to be safe. I would particularly like to see explicit promises (or disclaimers) about the following scenarios:

  • the configuration a node is started with is out of date
  • the configuration a node is started with is arbitrarily inaccurate
  • most nodes commit and apply multiple cluster membership changes in a row, but one node is significantly behind in applying entries (see also raft: rework comment for advance interface etcd-io/etcd#4049 (comment))
  • that, plus a node that is removed from the cluster does not know it was removed and continues to communicate with the lagging node
  • that, plus there is a snapshot between some of the entries that have not propagated to all nodes yet
  • all that, plus the cluster would later need the participation of the lagging node to make progress
  • there are no snapshots, ever
