Implement: CLUSTER REPLICATE NO ONE #1674
base: unstable
Conversation
Force-pushed from 91589d1 to ff96c0f.
Codecov Report. Attention: patch coverage is …

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##           unstable    #1674      +/- ##
    ============================================
    + Coverage     71.04%   71.10%   +0.06%
    ============================================
      Files           121      123        +2
      Lines         65254    65555      +301
    ============================================
    + Hits          46357    46612      +255
    - Misses        18897    18943       +46
The cluster-replicate.json file should be updated, and commands.def will get regenerated as part of the build. Or if it was accidentally not staged, please add that.
Also, could you run clang-format on your end to fix some of the formatting issues.
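As a hedged example, the formatting fix is typically just running clang-format in-place on the touched file (the file name is the one from this review; -i is a standard clang-format flag):

    clang-format -i src/cluster_legacy.c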
src/cluster_legacy.c (outdated)
    if (server.primary != NULL) {
        replicationUnsetPrimary();
    }
All other invocations of replicationUnsetPrimary() are not wrapped in this condition. Is it unnecessary to invoke it if server.primary is NULL?
Also unsure why clusterPromoteSelfToPrimary was introduced; it seems to be the same behavior at this point, but it's good to have this abstraction if additional steps will be introduced for cluster mode in the future.
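For reference, a minimal sketch of what such a wrapper could look like today, assuming it currently only delegates, which is what the comment above suggests (this illustrates the abstraction being discussed, not the actual patch):

    /* Promote this node from replica to primary in cluster mode.
     * Today this only delegates; cluster-specific steps can be added here later. */
    static void clusterPromoteSelfToPrimary(void) {
        replicationUnsetPrimary();
    }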
> Also unsure why clusterPromoteSelfToPrimary was introduced; it seems to be the same behavior at this point, but it's good to have this abstraction if additional steps will be introduced for cluster mode in the future.

Ok.
> All other invocations of replicationUnsetPrimary() are not wrapped in this condition. Is it unnecessary to invoke it if server.primary is NULL?

Probably not needed. It looks like a leftover from KeyDB's implementation, where the primary needed to be passed to replicationUnsetMaster.
Force-pushed from fde1ab6 to 4abbae8.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature makes sense to me.
@valkey-io/core-team New arguments = major decision. Please approve or vote if you agree.
Force-pushed from 4abbae8 to 85238e6.
The CI job "DCO" is failing. You need to sign off your commits (e.g. git commit -s). Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin. Thanks!
Force-pushed from 85238e6 to 3789227.
Force-pushed from 3789227 to e4e8b24.
Done.
src/cluster_legacy.c (outdated)
    /* Reset manual failover state. */
    resetManualFailover();
It becomes a primary, so do we need this reset? It has no way to enter the failover check logic. Or at least we could remove the comment, since this line is self-explanatory.
Suggested change:

    -    /* Reset manual failover state. */
    -    resetManualFailover();
    +    resetManualFailover();
I'm not sure what you mean by "no way to enter the failover check logic". This is to abort any in-progress failover (which can happen, right?).
The comment is removed.
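For context, a rough sketch of what "resetting the manual failover state" amounts to; the struct and field names below are illustrative assumptions, not Valkey's exact ones:

    /* Illustrative only: clear any in-progress manual failover so a
     * freshly promoted primary doesn't act on stale failover state. */
    typedef struct {
        long long mf_end;     /* Manual failover deadline, 0 = none in progress. */
        int mf_can_start;     /* Replica is allowed to start the failover. */
        void *mf_replica;     /* Replica performing the failover (primary side). */
        long long mf_offset;  /* Replication offset to catch up to. */
    } mfStateSketch;

    void resetManualFailoverSketch(mfStateSketch *mf) {
        mf->mf_end = 0;
        mf->mf_can_start = 0;
        mf->mf_replica = NULL;
        mf->mf_offset = -1;
    }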
I mean, when the node becomes a primary, it has no way to enter this manual failover handling logic:

    if (nodeIsReplica(myself)) {
        clusterHandleManualFailover();
    }
Sorry, I'm not sure I got you right. If you mean that there is no way for a manual failover to be in progress by the time we come to resetManualFailover() (line 7050 in bed392b), then that's not quite so: a manual failover can be started before CLUSTER REPLICATE NO ONE. Also, CLUSTER RESET (which also may change a replica to a primary) does the same: it resets any in-progress failover (resetManualFailover(), line 1298 in bed392b).
I know it could start before the NO ONE. I mean, if a manual failover is in progress, will it be a problem if it is not reset? What will happen? I know CLUSTER RESET does the same thing; that is old stuff. I am OK with resetting it, I was just thinking about this: we definitely have other places where we haven't reset it when the role changed.
> we definitely have other places where we haven't reset it when the role changed.

Interesting. Is this a problem?
What is the reason for not resetting it?
I am not against it; this is just a question: after it becomes a primary, will there be any problems with the uncleaned manual failover state? As far as I know there is no problem (only a log issue, like the replica (now empty primary) printing the manual failover timeout, something like that). The original primary has a known issue with pause timeout.
I am OK with resetting it. I would probably prefer to refuse to execute NO ONE if the replica is in a failover, or at least in a manual failover. The admin should do it in a stable state, which means the replica should not be in the failover state (or at least not in a manual failover). But there is no harm I guess (I don't see a problem here), so I am also OK with it.
Force-pushed from e4e8b24 to bed392b.
Any objection to merging it?
We're busy making the 8.1.0 release candidate just now. This one will need to wait and get merged after that.
We should also have some tests validating that this new behavior works as intended: have a cluster, disconnect the replica, and make sure slots/shards and all are still consistent and the rest of the cluster agrees on the state.
    {
        "name": "no-one",
        "type": "block",
        "arguments": [
We should also add a since field, and a history field to mention these args.
Updated.
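As a hedged illustration of that request, the metadata in cluster-replicate.json could look roughly like this; the version strings below are assumptions, not the final values:

    {
        "name": "no-one",
        "type": "block",
        "since": "9.0.0",
        "arguments": [ ... ]
    }

plus a command-level history entry such as:

    "history": [
        ["9.0.0", "Added the NO ONE variant."]
    ]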
src/cluster_legacy.c (outdated)
    -    } else if (!strcasecmp(c->argv[1]->ptr, "replicate") && c->argc == 3) {
    -        /* CLUSTER REPLICATE <NODE ID> */
    +    } else if (!strcasecmp(c->argv[1]->ptr, "replicate") && (c->argc == 3 || c->argc == 4)) {
    +        /* CLUSTER REPLICATE (<NODE ID> | NO ONE) */
             /* Lookup the specified node in our table. */
Please also move this comment below, near the clusterLookupNode one.
Updated.
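For illustration, the NO ONE variant can be detected with a token check along these lines; this is a sketch consistent with the diff above, not the exact final code:

    /* Sketch: CLUSTER REPLICATE (<NODE ID> | NO ONE). With four args the
     * last two must literally be the tokens NO and ONE. */
    int no_one = (c->argc == 4 &&
                  !strcasecmp(c->argv[2]->ptr, "no") &&
                  !strcasecmp(c->argv[3]->ptr, "one"));
    if (no_one) {
        /* Become an empty primary while staying in the cluster. */
        clusterPromoteSelfToPrimary();
        addReply(c, shared.ok);
        return;
    }
    /* Otherwise c->argc == 3: look up the specified node in our table. */
    clusterNode *n = clusterLookupNode(c->argv[2]->ptr, sdslen(c->argv[2]->ptr));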
Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are still discussing a form of reset, so it seems to me that the solution is too tactical. BTW, I just noticed that the forget path is not always working: the reset node joined back to the cluster quickly.
I don't see the implementation moving the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.
Good catch!
IMHO that is just a syntactical question. Whatever command name we came up with, the behavior would be the same: turn the replica into a primary while leaving it in the cluster. If you think the name is misleading, I'm open to alternatives.

What do you mean by "forget path is not always working"? What forget path? We are not doing any forgetting here.

AFAIU, if a node is reset it will not be added back to the cluster automatically, but only when somebody does it explicitly. So if you want to remove a replica manually (i.e. the node is scheduled for maintenance), the only option for you is to reset it, affecting all clients (they start seeing errors).

What is a shard? Would you please shed some light on this term? Is it something special to Valkey? AFAIU, a shard is a primary with replicas attached to it. So, when we switched the role of the node from replica to empty primary, doesn't that mean we moved it to a separate new shard?
Force-pushed from bed392b to b419cfb.
I believe I figured it out. There is an internal …
Force-pushed from b9ad4fd to 95c0b42.
I agree with @skolosov-snap that CLUSTER REPLICATE NO ONE is better. The node is not leaving the cluster, so RESET seems less intuitive.
To make the cluster forget a node, you need to send CLUSTER FORGET <node-id> to all the other nodes.
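As a hedged illustration (the ports and the node-id placeholder are made up), that means addressing every remaining node:

    # Send CLUSTER FORGET to each remaining node in the cluster.
    for port in 7000 7001 7002; do
        valkey-cli -p "$port" CLUSTER FORGET <node-id>
    done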
Looking at the doc, it should behave differently. So, I believe it may potentially be fixed in the future. @PingXie, @zuiderkwast: you are right about the current behavior.
Signed-off-by: Sergey Kolosov <[email protected]>
Force-pushed from 95c0b42 to 445e078.
@skolosov-snap @zuiderkwast you have a good point. BTW, a follow-up thought: should we consider a single token instead of the two-token NO ONE?
I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆
Hmm, the comment link seems broken. What convinced you? :)
Not a big deal to me, but IMHO consistency/similarity between commands is better. What benefit will we get if we replace it with a single token?
The probability will be practically 0. We have:

    sds human_nodename; /* The known human readable nodename for this node */
Interesting, the link works for me. It's one of the resolved comments above. Quoting @skolosov-snap's comment, which made sense to me:
> Currently, Valkey doesn't allow detaching a replica from its primary node. So, if you want to change the cluster topology, the only way to do it is to reset the node (the CLUSTER RESET command). However, this removes the node from the cluster, which affects clients: all clients will keep sending traffic to this node (and get inaccurate responses) until they refresh their topology.
>
> In this change we implement support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When calling this command, the node is converted from a replica to an empty primary node while still staying in the cluster. Thus, all traffic coming from clients to this node can be redirected to the correct node.
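A hedged usage sketch of the new command (the port is illustrative; ROLE and CLUSTER NODES are standard introspection commands):

    # Detach the replica on port 7001, turning it into an empty primary
    # that remains a cluster member:
    valkey-cli -p 7001 CLUSTER REPLICATE NO ONE

    # Verify: the node should now report itself as a primary and still
    # appear in the cluster topology.
    valkey-cli -p 7001 ROLE
    valkey-cli -p 7001 CLUSTER NODES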