Support for semisynchronous replication #84

misterbisson · 2017-06-05T07:15:34Z

https://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html

> In addition to the built-in asynchronous replication, MySQL 5.7 supports an interface to semisynchronous replication[...]. With asynchronous replication, if the master crashes, transactions that it has committed might not have been transmitted to any slave. Compared to asynchronous replication, semisynchronous replication provides improved data integrity because when a commit returns successfully, it is known that the data exists in at least two places.

> Semisynchronous replication falls between asynchronous and fully synchronous replication. The master waits only until at least one slave has received and logged the events. It does not wait for all slaves to acknowledge receipt, and it requires only receipt, not that the events have been fully executed and committed on the slave side.

> If semisynchronous replication is enabled on the master side and there is at least one semisynchronous slave, a thread that performs a transaction commit on the master blocks and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction, or until a timeout occurs.
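To make the quoted commit path concrete, here is a toy model of it in Python. It is only a sketch of the documented behavior, not MySQL's implementation: `commit_wait` and its arguments are hypothetical names, and it assumes at least one semisynchronous replica is connected.

```python
from typing import Optional, Sequence, Tuple


def commit_wait(ack_latencies_ms: Sequence[Optional[float]],
                timeout_ms: float) -> Tuple[str, float]:
    """Model how long a commit on the primary blocks under semisync.

    ack_latencies_ms holds, per connected semisync replica, the time it
    takes to acknowledge receipt of the transaction (None = never acks).
    The commit returns on the FIRST ack, or when the timeout expires.
    """
    in_time = [t for t in ack_latencies_ms if t is not None and t <= timeout_ms]
    if in_time:
        # One ack is enough; slower replicas are not waited for.
        return ("acked", min(in_time))
    # Nobody acknowledged in time: the commit still returns to the client.
    return ("timeout", timeout_ms)
```

For example, `commit_wait([50, 5, None], 10000)` resolves as soon as the fastest replica answers, while `commit_wait([None, 20000], 10000)` blocks for the full timeout.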

Spitballing:

If semisynchronous replication is implemented, it may improve the consistency that may be expected of Autopilot Pattern MySQL in case of a failure of the primary.
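As a rough idea of what "implemented" might involve, the sketch below builds the SQL that the MySQL 5.7 docs describe for turning on the semisync plugins. The helper name `semisync_setup_statements` is hypothetical, the `.so` suffix assumes Linux, and nothing here reflects how Autopilot Pattern MySQL actually issues statements.

```python
from typing import List


def semisync_setup_statements(role: str, timeout_ms: int = 10000) -> List[str]:
    """Return the statements that would enable semisync for one role."""
    if role == "primary":
        return [
            "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'",
            "SET GLOBAL rpl_semi_sync_master_enabled = 1",
            # How long a commit waits for an ack before giving up (ms).
            f"SET GLOBAL rpl_semi_sync_master_timeout = {int(timeout_ms)}",
        ]
    if role == "replica":
        return [
            "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'",
            "SET GLOBAL rpl_semi_sync_slave_enabled = 1",
            # The I/O thread must reconnect to register as a semisync replica.
            "STOP SLAVE IO_THREAD",
            "START SLAVE IO_THREAD",
        ]
    raise ValueError(f"unknown role: {role!r}")
```

Only replicas that run the replica-side statements count toward acknowledgments; the others keep replicating asynchronously.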

It does not appear that all replicas need to be semisynchronous. Indeed, it may not be desirable for them to be. That suggests it might be ideal to designate one replica as semisynchronous for this purpose. That one semisynchronous replica might report itself to Consul as a different service name, just as the mysql-primary is a different service name from mysql. The semisynchronous replica would have dibs on promotion to primary in case of any primary failure.
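The "dibs on promotion" idea could look roughly like the following. This is a sketch under assumptions: `mysql-semisync` is a made-up service name, the instance dictionaries stand in for whatever a Consul health query would return, and the tie-break by `id` is arbitrary.

```python
from typing import Dict, List, Optional

REPLICA_SERVICE = "mysql"            # existing replica service name
SEMISYNC_SERVICE = "mysql-semisync"  # hypothetical name for the semisync replica


def pick_promotion_candidate(
        healthy: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Choose which healthy instance to promote after a primary failure.

    The semisync replica is preferred because it has acknowledged receipt
    of every transaction the primary reported as committed (while semisync
    was in effect). Ordinary replicas are only a fallback.
    """
    for service in (SEMISYNC_SERVICE, REPLICA_SERVICE):
        pool = [inst for inst in healthy if inst["service"] == service]
        if pool:
            # Deterministic tie-break so every node picks the same winner.
            return min(pool, key=lambda inst: inst["id"])
    return None
```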


tgross · 2017-06-05T12:35:49Z

I'll keep this in mind. Right now we're missing any kind of modeling that outlines what the consistency guarantees actually are and then tells us whether our implementation of those guarantees is correct. I've looked into semi-synchronous replication and it's not yet clear to me how it affects consistency guarantees; on a first pass it looks like it only gives you a bit more reliability on the (extremely minimal) consistency guarantees that async replication gives you, at the cost of some availability.

misterbisson · 2017-06-07T00:28:58Z

There's no expectation of immediate action; I just wanted to log the detail.

The consistency guarantee appears much better than async replication:

> The master waits only until at least one slave has received and logged the events. It does not wait for all slaves to acknowledge receipt, and it requires only receipt, not that the events have been fully executed and committed on the slave side.

This appears to promise that it requires a failure in both the primary and the semisync replica for data loss to occur. That's a lot better than async replication.

However, the failure mode favors write availability over multi-host consistency:

> If a timeout occurs without any slave having acknowledged the transaction, the master reverts to asynchronous replication. When at least one semisynchronous slave catches up, the master returns to semisynchronous replication.

Both quotes are from https://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
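Since the fallback to async is silent, anything relying on semisync would probably want to watch for it. A minimal sketch, assuming the rows come from `SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%'` on the primary; the function name `semisync_mode` is hypothetical.

```python
from typing import Iterable, Tuple


def semisync_mode(status_rows: Iterable[Tuple[str, str]]) -> Tuple[str, int]:
    """Report whether the primary is currently semisync or has fallen back.

    Returns (mode, unacked_commits). Rpl_semi_sync_master_status is OFF
    when the plugin is disabled or the primary has reverted to async after
    a timeout; Rpl_semi_sync_master_no_tx counts commits that were not
    acknowledged by any replica.
    """
    status = dict(status_rows)
    operational = status.get("Rpl_semi_sync_master_status", "OFF").upper() == "ON"
    unacked = int(status.get("Rpl_semi_sync_master_no_tx", "0"))
    return ("semisync" if operational else "async", unacked)
```

A nonzero unacknowledged count is the window in which losing only the primary could still lose client-acknowledged writes.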

tgross · 2017-06-07T12:35:22Z

> This appears to promise that it requires a failure in both the primary and the semisync replica for data loss to occur. That's a lot better than async replication.

It is better in terms of data loss (i.e. client-acknowledged writes are less likely to have been lost) but I'm not as certain that it's better from the standpoint of consistency in the face of implicit non-fatal failures. My primary concern with this mode of operation comes from this section of the docs:

> It does not wait for all slaves to acknowledge receipt, and it requires only receipt, not that the events have been fully executed and committed on the slave side.

Replicas only acknowledge receipt, not a completed write. Hypothetically that isn't any worse than async replication (where we don't even ack receipt), but the docs don't explicitly describe what happens in this scenario, or how a replica catches up to the primary if it has dropped a write whose receipt it acknowledged. It could very well be -- and I'd expect it to be -- that a replica that has a temporary netsplit just catches back up using the last GTID. If not, it's possible to get not just lost data but inconsistent data, which is much worse. But this doesn't appear in the docs, so I want to make sure we genuinely understand the behavior.
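One way to check the catch-up assumption would be to compare executed GTID sets on the primary and the replica after a netsplit. The sketch below mimics what MySQL's `GTID_SUBTRACT()` function computes, in plain Python; the helper names are hypothetical and it expands intervals into sets, so it only suits small examples. It shows how a gap could be detected, not that MySQL actually closes it.

```python
from typing import Dict, List, Set


def parse_gtid_set(text: str) -> Dict[str, Set[int]]:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: {transaction numbers}}."""
    result: Dict[str, Set[int]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        seqs = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            lo, _, hi = interval.partition("-")
            seqs.update(range(int(lo), int(hi or lo) + 1))
    return result


def missing_on_replica(primary_set: str, replica_set: str) -> Dict[str, List[int]]:
    """Transactions executed on the primary but absent on the replica."""
    primary, replica = parse_gtid_set(primary_set), parse_gtid_set(replica_set)
    gaps = {}
    for uuid, seqs in primary.items():
        diff = sorted(seqs - replica.get(uuid, set()))
        if diff:
            gaps[uuid] = diff
    return gaps
```

An empty result after the replica reconnects would support the "just catches back up using the last GTID" expectation; a persistent gap would be the inconsistent-data case.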

> If a timeout occurs without any slave having acknowledged the transaction, the master reverts to asynchronous replication. When at least one semisynchronous slave catches up, the master returns to semisynchronous replication.

This implicit degradation of behavior seems potentially dangerous. My overall feeling about this feature is that semi-synchronous replication is going to encourage application developers to try to read their writes from the replicas, which is incorrect. Semi-synchronous seems like a bad compromise between async and sync.

tgross added the proposal label Jun 5, 2017
