ENH: Fix cuda storage transfer deadlock on multiple GPUs #788

Merged on Nov 16, 2024 (48 commits)

Conversation

luweizheng
Collaborator

Fix the storage transfer deadlock of CUDA storage on multiple GPUs. This work builds on #488.

The current implementation of the transfer function deadlocks when Xorbits runs on multiple GPUs. The call chain starts in StorageHandlerActor.fetch_batch, which invokes SenderManagerActor.send_batch_data and, further down the chain, StorageHandlerActor.request_quota_with_spill. Because StorageHandlerActor method calls are guarded by a lock, the second call into the handler waits on the first one, which is itself waiting for the chain to finish, so a deadlock arises.
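The re-entrant call pattern described above can be reproduced in a few lines. This is only a minimal sketch, not the Xorbits code: `ToyHandler`, `ToySender`, and `run_demo` are hypothetical names, the per-actor lock is modeled with a plain `asyncio.Lock`, and the sketch assumes the sender is the one that calls back into the handler. A timeout is used so the demo reports the hang instead of blocking forever.

```python
import asyncio


class ToyHandler:
    """Hypothetical stand-in for an actor that serves one locked call at a time."""

    def __init__(self):
        self._lock = asyncio.Lock()  # models the lock around handler method calls
        self.sender = None

    async def fetch_batch(self):
        async with self._lock:
            # The lock stays held while we wait for the sender to finish.
            return await self.sender.send_batch_data()

    async def request_quota_with_spill(self):
        # asyncio.Lock is not re-entrant, so this waits for fetch_batch to release it.
        async with self._lock:
            return "quota granted"


class ToySender:
    """Hypothetical sender that calls back into the handler (an assumption)."""

    def __init__(self, handler):
        self.handler = handler

    async def send_batch_data(self):
        return await self.handler.request_quota_with_spill()


async def run_demo(timeout=0.2):
    handler = ToyHandler()
    handler.sender = ToySender(handler)
    try:
        await asyncio.wait_for(handler.fetch_batch(), timeout)
        return "completed"
    except asyncio.TimeoutError:
        # fetch_batch never returned: the nested call is stuck on the lock.
        return "deadlocked"


result = asyncio.run(run_demo())
print(result)
```

With more than one handler available, the nested call could be served by a different handler and the chain would complete; with a single handler it cannot.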

Notes:

  • Fixes the deadlock on GPU.
  • Fixes the GPU buffer size 0 issue.
  • Tested on TPC-H SF10: it works and is much faster than Dask-cuDF.
  • Known issue: test_transfer_gpu.py has bugs with the ucx channel, while the socket channel works.
  • Known issue: this implementation makes too many actor calls, which may degrade performance.

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass

@XprobeBot added the enhancement and gpu labels Jul 6, 2024
@XprobeBot added this to the v0.7.3 milestone Jul 6, 2024

codecov bot commented Jul 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.46%. Comparing base (5bb0211) to head (b8ddb25).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #788      +/-   ##
==========================================
- Coverage   82.49%   82.46%   -0.03%     
==========================================
  Files        1071     1071              
  Lines       80119    80167      +48     
  Branches    12202    12207       +5     
==========================================
+ Hits        66094    66110      +16     
- Misses      12478    12496      +18     
- Partials     1547     1561      +14     
Flag Coverage Δ
unittests 82.36% <ø> (-0.04%) ⬇️


@XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@hucorz
Collaborator

hucorz commented Nov 15, 2024

The original transfer process is illustrated below. Each StorageHandler corresponds to a Receiver. During a transfer, one handler asks the receiver to create a writer, and the receiver in turn asks a handler in the same pool to create it. In the GPU case, however, there is only one handler, so the receiver's request goes back to the handler that is already waiting on it, and the transfer deadlocks.

[Image: xorbits1]

The solution is to move part of the receiver's logic into the handler. When opening a writer, the process checks whether the current handler and the receiver are in the same pool. If they are, the handler opens the writer itself and passes it to the receiver. Otherwise, the receiver asks another handler in its own pool to open the writer. The updated process is shown below.

[Image: Xorbits2]
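The same-pool check can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `ToyHandler`, `ToyReceiver`, `open_writer_for_transfer`, and the pool names are all hypothetical, and writers are plain strings so the two paths are easy to tell apart.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHandler:
    """Hypothetical storage handler living in a named pool."""
    pool: str
    name: str

    def open_writer(self, key):
        return f"writer({key}) opened by {self.name}"


@dataclass
class ToyReceiver:
    """Hypothetical receiver that knows the handlers in its own pool."""
    pool: str
    handlers_in_pool: list = field(default_factory=list)
    writers: dict = field(default_factory=dict)

    def accept_writer(self, key, writer):
        # Same-pool path: the writer arrives ready-made from the handler.
        self.writers[key] = writer

    def open_writer_via_pool(self, key):
        # Cross-pool path: ask a handler in the receiver's own pool.
        self.writers[key] = self.handlers_in_pool[0].open_writer(key)


def open_writer_for_transfer(handler, receiver, key):
    """Return (path_taken, writer) for the given handler/receiver pair."""
    if handler.pool == receiver.pool:
        # The handler opens the writer itself, so the receiver never
        # has to call back into a handler that is waiting on it.
        receiver.accept_writer(key, handler.open_writer(key))
        return "handler-opened", receiver.writers[key]
    receiver.open_writer_via_pool(key)
    return "receiver-requested", receiver.writers[key]


gpu0_handler = ToyHandler(pool="gpu-0", name="gpu-0-handler")
gpu0_receiver = ToyReceiver(pool="gpu-0", handlers_in_pool=[gpu0_handler])
gpu1_handler = ToyHandler(pool="gpu-1", name="gpu-1-handler")

same_pool = open_writer_for_transfer(gpu0_handler, gpu0_receiver, "chunk-1")
cross_pool = open_writer_for_transfer(gpu1_handler, gpu0_receiver, "chunk-2")
print(same_pool)
print(cross_pool)
```

In the same-pool case the receiver only stores the writer it is given, which is what removes the call back into the single GPU handler.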

Collaborator

@hucorz left a comment

LGTM

@luweizheng merged commit 573ee79 into xorbitsai:main Nov 16, 2024
38 of 40 checks passed
@luweizheng deleted the feat/cudf branch December 16, 2024 02:12
Labels
enhancement, gpu
5 participants