The memory store allows multiple segments to run per node for the same repair #1519

Closed
adejanovski opened this issue Aug 23, 2024 · 3 comments · Fixed by #1533

adejanovski (Contributor) commented Aug 23, 2024

It looks like the memory store doesn't honor the guarantee of scheduling a single segment per node per repair.
Instead, the maxParallelRepair concurrency setting seems to be the only limit applied, as two segments appear to run at once on the same node for the same repair.

At any time, one replica should run at most one segment per allowed repair, and never multiple segments for the same repair.
In the Cassandra storage implementation, we use LWTs to guarantee that.
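
For illustration, here's a rough sketch of what enforcing that same guarantee in-process could look like for the memory backend (class and method names are hypothetical, not Reaper's actual API):

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: an in-memory guard that mirrors what the LWT gives the
// Cassandra backend. SegmentLeaseRegistry, tryLease and release are
// hypothetical names, not Reaper's API.
public class SegmentLeaseRegistry {

  // repair run id -> nodes that currently have a running segment in that run
  private final Map<UUID, Set<String>> busyNodesPerRun = new ConcurrentHashMap<>();

  /** Atomically claim all replicas of a segment; fails if any replica is already busy for this run. */
  public synchronized boolean tryLease(UUID runId, Set<String> replicas) {
    Set<String> busy = busyNodesPerRun.computeIfAbsent(runId, id -> ConcurrentHashMap.newKeySet());
    for (String node : replicas) {
      if (busy.contains(node)) {
        return false; // another segment of the same run is already repairing this node
      }
    }
    busy.addAll(replicas);
    return true;
  }

  /** Release the replicas once the segment finishes or is aborted. */
  public synchronized void release(UUID runId, Set<String> replicas) {
    Set<String> busy = busyNodesPerRun.get(runId);
    if (busy != null) {
      busy.removeAll(replicas);
    }
  }
}
```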

Definition of Done

  • The memory storage implementation should allow no more than one segment per node to be scheduled for a given repair run

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: REAP-2

@vikingUnet

Hello, Alexander! Sometimes it would be great to repair more than one segment per node for a single repair - it would be faster if the node has enough resources. This would be useful when we need to run a full segmented repair as fast as we can in an emergency situation (for example, when consistency in a multi-DC cluster has just been broken and we know about it). Can we add a configurable parameter to set the maximum number of segments allowed to repair in parallel for the current repair? Of course we must understand the risk of affecting a node if this value is very big, but if it is less than or equal to 4, it should not cause any problems for powerful nodes. I mean, why is this limitation a dogma that can't be slightly increased, with the possibility of flexible adjustment if necessary?

@adejanovski (Contributor, Author)


With the current state of the implementation, what you can do is allow n concurrent repair runs, and if you create a run per table, that'll give you some concurrency. To reduce the overhead of segments and increase the pressure on your nodes, you can lower the number of segments, which will make them bigger and lead to shorter execution times.
You also have the repair threads setting, which you can raise up to 4, and which will concurrently process the token ranges that are grouped together in a segment.
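
To make the repair-threads point concrete, here's a rough, self-contained sketch (plain Java, not Reaper code; TokenRange and repairRange are placeholders) of processing the token ranges of one segment with up to 4 threads:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: repairing the token ranges grouped into one segment with
// up to 4 threads. TokenRange and repairRange() are placeholders, not Reaper
// or Cassandra driver types.
public class SegmentRepairThreadsExample {

  record TokenRange(long start, long end) {}

  static void repairRange(TokenRange range) {
    // stand-in for triggering a repair of a single token range
    System.out.println("repairing " + range);
  }

  public static void main(String[] args) throws InterruptedException {
    List<TokenRange> rangesInSegment = List.of(
        new TokenRange(0, 100), new TokenRange(100, 200),
        new TokenRange(200, 300), new TokenRange(300, 400));

    int repairThreads = Math.min(4, rangesInSegment.size()); // capped at 4
    ExecutorService pool = Executors.newFixedThreadPool(repairThreads);
    rangesInSegment.forEach(range -> pool.submit(() -> repairRange(range)));

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
```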

@vikingUnet


Yes, that's an interesting way to reduce the number of segments. But sometimes big segments are a big problem for repairs on a big node or in a multi-DC cluster. Repairing one big node with 1 TB of data in a single segment will be riskier and take longer, even with 4 threads, than 4 segments running in parallel. It's a shame this feature won't be added in the future - it would be great to have on some production clusters.
