The memory store allows multiple segments to run per node for the same repair #1519

Closed
adejanovski opened this issue Aug 23, 2024 · 3 comments · Fixed by #1533

adejanovski (Contributor) commented Aug 23, 2024

It looks like the memory store doesn't honor the guarantee of scheduling a single segment per node per repair.
Instead, the maxParallelRepair concurrency setting seems to be the only limit applied, as two segments appear to run at once on the same node for the same repair.

At any time, one replica should run at most one segment per allowed repair, and never multiple segments for the same repair.
In the Cassandra storage implementation, we use LWTs to guarantee that.
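
For illustration, here's a rough sketch of what enforcing that same guarantee in-process could look like for the memory backend (class and method names are hypothetical, not Reaper's actual API):

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: an in-memory guard that mirrors what the LWT gives the
// Cassandra backend. SegmentLeaseRegistry, tryLease and release are
// hypothetical names, not Reaper's API.
public class SegmentLeaseRegistry {

  // repair run id -> nodes that currently have a running segment in that run
  private final Map<UUID, Set<String>> busyNodesPerRun = new ConcurrentHashMap<>();

  /** Atomically claim all replicas of a segment; fails if any replica is already busy for this run. */
  public synchronized boolean tryLease(UUID runId, Set<String> replicas) {
    Set<String> busy = busyNodesPerRun.computeIfAbsent(runId, id -> ConcurrentHashMap.newKeySet());
    for (String node : replicas) {
      if (busy.contains(node)) {
        return false; // another segment of the same run is already repairing this node
      }
    }
    busy.addAll(replicas);
    return true;
  }

  /** Release the replicas once the segment finishes or is aborted. */
  public synchronized void release(UUID runId, Set<String> replicas) {
    Set<String> busy = busyNodesPerRun.get(runId);
    if (busy != null) {
      busy.removeAll(replicas);
    }
  }
}
```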

Definition of Done

  • The memory storage implementation should allow no more than one segment per node to be scheduled for a given repair run

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: REAP-2

@vikingUnet

Hello, Alexander! Sometimes it would be great to repair more than one segment per node for a single repair - it would be faster if the node has enough resources. This would be useful when we need to run a full segmented repair as fast as we can in an emergency situation (for example, when consistency in a multi-DC cluster has just been broken and we know about it). Can we add a configurable parameter to set the maximum number of segments allowed to repair in parallel for the current repair? Of course we must understand the risk of affecting a node if this value is very big, but if it is less than or equal to 4, it should not cause any problems for powerful nodes. I mean, why is this limitation a dogma that can't be slightly increased, with the possibility of flexible adjustment if necessary?

@adejanovski (Contributor, Author)


With the current state of the implementation, what you can do is allow n concurrent repair runs, and if you create a run per table, that'll give you some concurrency. To reduce the overhead of segments and increase the pressure on your nodes, you can lower the number of segments, which will make them bigger and lead to shorter execution times.
You also have the repair threads setting, which you can raise up to 4, and which will concurrently process the token ranges that are grouped together in a segment.
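
To make the repair-threads point concrete, here's a rough, self-contained sketch (plain Java, not Reaper code; TokenRange and repairRange are placeholders) of processing the token ranges of one segment with up to 4 threads:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: repairing the token ranges grouped into one segment with
// up to 4 threads. TokenRange and repairRange() are placeholders, not Reaper
// or Cassandra driver types.
public class SegmentRepairThreadsExample {

  record TokenRange(long start, long end) {}

  static void repairRange(TokenRange range) {
    // stand-in for triggering a repair of a single token range
    System.out.println("repairing " + range);
  }

  public static void main(String[] args) throws InterruptedException {
    List<TokenRange> rangesInSegment = List.of(
        new TokenRange(0, 100), new TokenRange(100, 200),
        new TokenRange(200, 300), new TokenRange(300, 400));

    int repairThreads = Math.min(4, rangesInSegment.size()); // capped at 4
    ExecutorService pool = Executors.newFixedThreadPool(repairThreads);
    rangesInSegment.forEach(range -> pool.submit(() -> repairRange(range)));

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
```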

@vikingUnet


Yes, that's an interesting way to reduce the number of segments. But sometimes big segments are a big problem for repairs on a big node or in a multi-DC cluster. Repairing one big node with 1 TB of data in a single segment will be riskier and take longer, even with 4 threads, than 4 segments running in parallel. It's a shame this feature won't be added in the future - it would be great to have on some production clusters.
