Failover lost master #89

Open
dfredell opened this issue Sep 29, 2017 · 1 comment

Comments


dfredell commented Sep 29, 2017

I found a scenario where the cluster loses its master.

It occurred when:

  1. I had 3 nodes running healthily, remote consul, static root password
  2. I killed the master
  3. Failover started on node 37
  4. mysqlrpladmin on 37 decided that 36 should be the master
  5. 36 detected that it is the new master
  6. 36 created a new containerpilot.json with the service 'mysql-primary'
  7. 36 then ran containerpilot -reload
  8. This caused mysql to stop and start
  9. When mysql came back up, it had no record of a primary
  10. Reading /v1/kv/mysql-primary also returned no result
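
For anyone trying to reproduce step 10, here is a rough Python sketch of checking the key through Consul's KV HTTP endpoint. It is not part of this project: the `CONSUL_URL` address and the `decode_kv_response` / `read_primary` helper names are assumptions for illustration, and it makes no claim about what the stored value looks like. It relies only on Consul answering a KV GET with 404 for a missing key and with a JSON array holding a base64-encoded `Value` otherwise.

```python
import base64
import json
import urllib.error
import urllib.request

# Assumption: Consul's HTTP API is reachable at this address; adjust for a remote consul.
CONSUL_URL = "http://localhost:8500"


def decode_kv_response(status, body):
    """Decode a Consul KV GET response; return None when the key is missing or empty."""
    if status == 404 or not body:
        return None
    entries = json.loads(body)
    if not entries:
        return None
    raw = entries[0].get("Value")
    if raw is None:  # Consul reports an empty value as null
        return None
    return base64.b64decode(raw).decode("utf-8")


def read_primary(key="mysql-primary", base_url=CONSUL_URL):
    """Fetch the key from Consul; a 404 is treated as 'no primary recorded'."""
    url = f"{base_url}/v1/kv/{key}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return decode_kv_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

In the failure described above, `read_primary()` would be expected to return `None` once mysql on 36 has restarted.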

failover.log
Servers:

  1. docker compose name: mysql_4 hostname: mysql-37f99a0a7a84 IP:192.168.128.236
  2. docker compose name: mysql_5 hostname: mysql-363deb257281 IP:192.168.128.235

Failover works fine when the node that gets the failover lock is also the node that mysqlrpladmin picks as the new master.

@dfredell

Node 36 is assigned service/mysql-primary, then restarts because of containerpilot -reload. After the restart it no longer remembers who the master is supposed to be or where its peers are.
