-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… #14578
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14578 +/- ##
============================================
+ Coverage 61.75% 64.02% +2.27%
- Complexity 207 1576 +1369
============================================
Files 2436 2687 +251
Lines 133233 147892 +14659
Branches 20636 22672 +2036
============================================
+ Hits 82274 94692 +12418
- Misses 44911 46248 +1337
- Partials 6048 6952 +904
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Will update tests once approach seems good to go. |
realtimeToOfflineSegmentsTaskMetadata.setNumSubtasksPending(numSubtasksLeft); | ||
|
||
try { | ||
if (numSubtasksLeft == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a minion task fails before it decrementing this counter ? This state would never go down to zero right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, minion job will keep on retrying and will fail. Same subtasks will be picked in next iteration then.
This is the case in current scenario as well, If minion fails after segment is moved to offline but before watermark is updated, same segment gets picked again next time. But since segment name will be the same, already existing offline segments gets overwritten (Not a good approach though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current approach (Before this PR) is also not good, we might have to maintain a attribute in segment metadata marking that segment has been moved to offline.
RealtimeToOfflineSegmentsTaskMetadata realtimeToOfflineSegmentsTaskMetadata = | ||
getRTOTaskMetadata(realtimeTableName, completedSegmentsZKMetadata, bucketMs, realtimeToOfflineZNRecord); | ||
|
||
Preconditions.checkState(realtimeToOfflineSegmentsTaskMetadata.getNumSubtasksPending() == 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minions can fail and trip this invariant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes correct. I have removed this precondition.
_minionTaskZkMetadataManager.setTaskMetadataZNRecord(newMinionMetadata, RealtimeToOfflineSegmentsTask.TASK_TYPE, | ||
_expectedVersion); | ||
|
||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since realtimeToOffline is a period task to move segments from realtime to offline table. Its on a schedule (it should not be allowed to run adhoc). Why don't we have the task generator advance the watermark instead of doing it from the minion. The task generator anyway needs to handle the minion execution failure scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If task generator advances watermark, It would need to have some state to refer which segments has been moved to offline. That is what is proposed in solution 2 in description.
Fixes: #12857
(Issue description: Currently RealtimeToOfflineSegmentsTask that is used to move real time segments to offline segments, does not have the ability to tune maxNumRowsPerTask. This is the parameter that determines the input to a task. Without this configuration, we end up creating one minion task, which takes in all the input (i.e all segments that meet the criteria to be converted to offline segments) which prevents us from using other minions.
There's no parallelism for this task.)
Propose Solution (Rejected, check Alternate solution):
Update: Alternate solution