Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… #14578

Closed
wants to merge 13 commits into from

Conversation

noob-se7en
Contributor

@noob-se7en noob-se7en commented Dec 2, 2024

Fixes: #12857

(Issue description: Currently, RealtimeToOfflineSegmentsTask, which moves realtime segments to offline segments, has no way to tune maxNumRowsPerTask, the parameter that determines how much input a task receives. Without this configuration, we end up creating a single minion task that takes all of the input (i.e., all segments that meet the criteria for conversion to offline segments), which prevents us from using other minions.
There is no parallelism for this task.)

Proposed solution (rejected; see the alternate solution):

  1. Divide a task into multiple subtasks based on the maximum number of rows per subtask. Once the subtasks are generated, each subtask atomically updates the ZooKeeper state after its execution finishes. The last pending subtask updates the watermark after checking ZooKeeper to confirm that no other subtask is still pending.
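
The splitting step in the proposal above can be sketched as a greedy grouping of segments by row count. This is only an illustration under stated assumptions: `SubtaskPlanner`, `SegmentInfo`, and `plan` are hypothetical names, not classes or methods from this PR or from Pinot, and the actual generator may group segments differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SubtaskPlanner and SegmentInfo are illustrative names,
// not classes from the PR or from Pinot.
class SubtaskPlanner {

  /** Minimal stand-in for a completed realtime segment. */
  static final class SegmentInfo {
    final String name;
    final long numRows;

    SegmentInfo(String name, long numRows) {
      this.name = name;
      this.numRows = numRows;
    }
  }

  /**
   * Greedily packs segments, in order, into subtasks whose total row count
   * stays within maxNumRowsPerSubtask. A single segment larger than the limit
   * still gets a subtask of its own, because segments are not split here.
   */
  static List<List<SegmentInfo>> plan(List<SegmentInfo> segments, long maxNumRowsPerSubtask) {
    if (maxNumRowsPerSubtask <= 0) {
      throw new IllegalArgumentException("maxNumRowsPerSubtask must be positive");
    }
    List<List<SegmentInfo>> subtasks = new ArrayList<>();
    List<SegmentInfo> current = new ArrayList<>();
    long currentRows = 0;
    for (SegmentInfo segment : segments) {
      // Close the current subtask if adding this segment would exceed the limit.
      if (!current.isEmpty() && currentRows + segment.numRows > maxNumRowsPerSubtask) {
        subtasks.add(current);
        current = new ArrayList<>();
        currentRows = 0;
      }
      current.add(segment);
      currentRows += segment.numRows;
    }
    if (!current.isEmpty()) {
      subtasks.add(current);
    }
    return subtasks;
  }

  public static void main(String[] args) {
    List<SegmentInfo> segments = List.of(
        new SegmentInfo("seg_0", 400),
        new SegmentInfo("seg_1", 300),
        new SegmentInfo("seg_2", 500));
    List<List<SegmentInfo>> subtasks = plan(segments, 1000);
    for (int i = 0; i < subtasks.size(); i++) {
      System.out.println("subtask " + i + " has " + subtasks.get(i).size() + " segment(s)");
    }
  }
}
```

Each resulting group could then be handed to a different minion, which is the parallelism the issue asks for.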

Update: Alternate solution

@noob-se7en noob-se7en changed the title Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… [WIP] Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… Dec 2, 2024
@codecov-commenter

codecov-commenter commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 64.42308% with 37 lines in your changes missing coverage. Please review.

Project coverage is 64.02%. Comparing base (59551e4) to head (58eb51c).
Report is 1444 commits behind head on master.

Files with missing lines Patch % Lines
...egments/RealtimeToOfflineSegmentsTaskExecutor.java 0.00% 16 Missing ⚠️
...gments/RealtimeToOfflineSegmentsTaskGenerator.java 78.66% 11 Missing and 5 partials ⚠️
.../minion/RealtimeToOfflineSegmentsTaskMetadata.java 61.53% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14578      +/-   ##
============================================
+ Coverage     61.75%   64.02%   +2.27%     
- Complexity      207     1576    +1369     
============================================
  Files          2436     2687     +251     
  Lines        133233   147892   +14659     
  Branches      20636    22672    +2036     
============================================
+ Hits          82274    94692   +12418     
- Misses        44911    46248    +1337     
- Partials       6048     6952     +904     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 ?
java-11 63.99% <64.42%> (+2.28%) ⬆️
java-21 63.92% <64.42%> (+2.29%) ⬆️
skip-bytebuffers-false 64.02% <64.42%> (+2.27%) ⬆️
skip-bytebuffers-true 63.87% <64.42%> (+36.15%) ⬆️
temurin 64.02% <64.42%> (+2.27%) ⬆️
unittests 64.02% <64.42%> (+2.27%) ⬆️
unittests1 56.17% <61.53%> (+9.27%) ⬆️
unittests2 34.57% <56.73%> (+6.84%) ⬆️


@yashmayya yashmayya added Configuration Config changes (addition/deletion/change in behavior) ingestion minion labels Dec 3, 2024
@noob-se7en noob-se7en changed the title [WIP] Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… Adds Support of maxNumRowsPerTask in RealtimeToOfflineSegmentsTasksGe… Dec 9, 2024
@noob-se7en
Contributor Author

Will update the tests once the approach looks good to go.

@noob-se7en noob-se7en marked this pull request as ready for review December 9, 2024 07:12
realtimeToOfflineSegmentsTaskMetadata.setNumSubtasksPending(numSubtasksLeft);

try {
if (numSubtasksLeft == 0) {
Contributor

What happens when a minion task fails before decrementing this counter? This state would never go down to zero, right?

Contributor Author

@noob-se7en noob-se7en Dec 9, 2024

In that case, the minion job will keep retrying and will eventually fail. The same subtasks will then be picked up in the next iteration.
This is also the case today: if a minion fails after a segment is moved to offline but before the watermark is updated, the same segment gets picked up again next time. But since the segment name will be the same, the already existing offline segment gets overwritten (not a good approach, though).

Contributor Author

The current approach (before this PR) is also not good; we might have to maintain an attribute in the segment metadata marking that the segment has been moved to offline.
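
The failure mode raised in this thread can be illustrated with a small in-memory model of the pending-subtask counter. It is a sketch under loud assumptions: `PendingSubtaskTracker` and its methods are hypothetical, and an `AtomicReference` compare-and-set stands in for a version-checked ZooKeeper write; it is not the code under review.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model; the real state would live in ZooKeeper.
class PendingSubtaskTracker {

  /** Immutable snapshot of the task metadata relevant to this discussion. */
  static final class State {
    final int numSubtasksPending;
    final long watermarkMs;

    State(int numSubtasksPending, long watermarkMs) {
      this.numSubtasksPending = numSubtasksPending;
      this.watermarkMs = watermarkMs;
    }
  }

  private final AtomicReference<State> _state;

  PendingSubtaskTracker(int numSubtasks, long watermarkMs) {
    _state = new AtomicReference<>(new State(numSubtasks, watermarkMs));
  }

  /**
   * Called by a subtask only after it succeeds. Decrements the counter with a
   * compare-and-set retry loop; the subtask that brings it to zero also
   * advances the watermark. Returns true if this call advanced the watermark.
   */
  boolean onSubtaskSuccess(long nextWatermarkMs) {
    while (true) {
      State current = _state.get();
      if (current.numSubtasksPending <= 0) {
        throw new IllegalStateException("no subtasks pending");
      }
      int left = current.numSubtasksPending - 1;
      long watermark = (left == 0) ? nextWatermarkMs : current.watermarkMs;
      // Retry if another subtask updated the state in the meantime.
      if (_state.compareAndSet(current, new State(left, watermark))) {
        return left == 0;
      }
    }
  }

  State state() {
    return _state.get();
  }

  public static void main(String[] args) {
    PendingSubtaskTracker tracker = new PendingSubtaskTracker(3, 1_000L);
    tracker.onSubtaskSuccess(2_000L);
    tracker.onSubtaskSuccess(2_000L);
    // The third subtask fails and never reports in: the counter stays at 1.
    System.out.println("pending=" + tracker.state().numSubtasksPending
        + " watermarkMs=" + tracker.state().watermarkMs);
  }
}
```

In this model, a subtask that dies before reporting leaves the counter above zero and the watermark unchanged until something reruns that subtask or resets the state.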

RealtimeToOfflineSegmentsTaskMetadata realtimeToOfflineSegmentsTaskMetadata =
getRTOTaskMetadata(realtimeTableName, completedSegmentsZKMetadata, bucketMs, realtimeToOfflineZNRecord);

Preconditions.checkState(realtimeToOfflineSegmentsTaskMetadata.getNumSubtasksPending() == 0,
Contributor

Minions can fail and trip this invariant

Contributor Author

Yes, correct. I have removed this precondition.

_minionTaskZkMetadataManager.setTaskMetadataZNRecord(newMinionMetadata, RealtimeToOfflineSegmentsTask.TASK_TYPE,
_expectedVersion);

while (true) {
Contributor

Since realtimeToOffline is a periodic task that moves segments from the realtime table to the offline table, it runs on a schedule (it should not be allowed to run ad hoc). Why don't we have the task generator advance the watermark instead of doing it from the minion? The task generator needs to handle the minion execution failure scenarios anyway.

Contributor Author

If the task generator advances the watermark, it would need some state recording which segments have been moved to offline. That is what is proposed in solution 2 in the description.
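
As a rough illustration of the trade-off in this thread, here is one way a generator-side decision could look if such state existed. Everything here is an assumption: `GeneratorWatermarkSketch`, `Decision`, and `decide` are hypothetical names, the "moved segments" set is a stand-in for whatever state would be persisted, and this is not the alternate solution referenced in the description, which is not shown on this page.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a generator-driven watermark decision.
class GeneratorWatermarkSketch {

  /** Outcome of one generator run for a single time window. */
  static final class Decision {
    final long watermarkMs;
    final Set<String> segmentsToReschedule;

    Decision(long watermarkMs, Set<String> segmentsToReschedule) {
      this.watermarkMs = watermarkMs;
      this.segmentsToReschedule = segmentsToReschedule;
    }
  }

  /**
   * Advances the watermark by one bucket only when every segment in the
   * window is recorded as moved; otherwise keeps the watermark and returns
   * the segments that still need a subtask.
   */
  static Decision decide(long watermarkMs, long bucketMs,
      Set<String> windowSegments, Set<String> movedSegments) {
    Set<String> remaining = new TreeSet<>(windowSegments);
    remaining.removeAll(movedSegments);
    if (remaining.isEmpty()) {
      return new Decision(watermarkMs + bucketMs, Collections.emptySet());
    }
    return new Decision(watermarkMs, remaining);
  }

  public static void main(String[] args) {
    Decision d = decide(0L, 86_400_000L, Set.of("seg_0", "seg_1"), Set.of("seg_0"));
    System.out.println("watermarkMs=" + d.watermarkMs + " reschedule=" + d.segmentsToReschedule);
  }
}
```

With this shape, a failed minion subtask simply shows up as a segment missing from the moved set on the next scheduled run, so no counter has to reach zero.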

Labels
Configuration Config changes (addition/deletion/change in behavior) ingestion minion
Development

Successfully merging this pull request may close these issues.

Support maxNumRowsPerTask in RealtimeToOfflineSegmentsTask
4 participants