Add device bridge support for HIP and CUDA. #3

Merged
stellaraccident merged 7 commits into main from eager_hip_cuda on Apr 25, 2024
Conversation

stellaraccident (Collaborator) commented on Apr 23, 2024

  • Adds a custom IREE "_test_add" op which exercises actual code generation.
  • Reworks the device management layer to:
    • Initializes our Device wrapper to maintain the correspondence between the IREE HalDevice and the PyTorch device.
    • Annotates Device with a dlpack_device_type_code.
    • Detects whether the Torch device is real CUDA or AMDGPU/HIP presenting as CUDA, and interfaces with the outside world accordingly (see the sketch after this list).
    • Auto-detects the AMDGPU chip or the CUDA SM version and arranges for the JIT compiler to target it.
    • Dynamically enables custom kernel registration for CUDA if it is available.
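To make the detection concrete, here is a minimal sketch of the idea, not the code in this PR: the `DeviceInfo` dataclass and its field names are illustrative stand-ins, and only public torch APIs are used.

```python
# Sketch only: classify a torch "cuda" device as real CUDA vs. AMDGPU/HIP,
# annotate it with a DLPack device type code, and pick a JIT compile target.
from dataclasses import dataclass
import torch

# DLPack device type codes from dlpack.h.
K_DL_CUDA = 2
K_DL_ROCM = 10

@dataclass
class DeviceInfo:  # hypothetical stand-in for the Device wrapper in this PR
    torch_device: torch.device
    dlpack_device_type_code: int
    compile_target: str  # e.g. "gfx942" or "sm_80"

def classify(torch_device: torch.device) -> DeviceInfo:
    props = torch.cuda.get_device_properties(torch_device.index or 0)
    if torch.version.hip is not None:
        # ROCm build of torch: the "cuda" device is actually an AMDGPU/HIP device.
        # Recent ROCm builds expose the gfx target via gcnArchName (e.g. "gfx90a:sramecc+:xnack-").
        target = getattr(props, "gcnArchName", props.name).split(":")[0]
        return DeviceInfo(torch_device, K_DL_ROCM, target)
    # Real CUDA build: derive the SM version from the compute capability.
    return DeviceInfo(torch_device, K_DL_CUDA, f"sm_{props.major}{props.minor}")

if torch.cuda.is_available():
    print(classify(torch.device("cuda", 0)))
```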

Limitations (for now):

  • IREE's dlpack interop only supports contiguous tensors. Lifting this requires further interfacing with the compiler to specialize on different strided layouts.
  • Device synchronization relies on dlpack's implicit default-stream synchronization. Since we are currently still JIT compiling in synchronous mode, we are skating by on this. Some additional hooks and APIs are needed to place stream events properly and do it for real (a caller-side sketch of working within these two limitations follows this list).
  • ROCm builds of torch do not seem to have an easy way to map a device back to its UUID, and I didn't have a CUDA device handy to test how this is done on CUDA, so both currently rely on enumeration order on multi-device systems. I kept the correspondence logic in a single place in the code and think I know how to fix this, but it will require some poking.
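As a caller-side illustration of the first two limitations (the `run_jit_kernel` callable is hypothetical): force inputs contiguous before the dlpack hand-off and synchronize the current stream while the JIT path is still synchronous.

```python
# Sketch only: prepare tensors for a DLPack-consuming kernel under the
# contiguity and default-stream-synchronization limitations described above.
import torch

def call_via_dlpack(run_jit_kernel, *tensors):
    # DLPack interop here only supports contiguous tensors, so materialize a
    # contiguous copy for any strided/transposed input.
    prepared = [t if t.is_contiguous() else t.contiguous() for t in tensors]
    if prepared and prepared[0].is_cuda:
        # Lean on implicit default-stream synchronization: make sure producer
        # work on the current stream finishes before the kernel reads the buffers.
        torch.cuda.current_stream(prepared[0].device).synchronize()
    return run_jit_kernel(*prepared)
```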

Depends on an IREE bump that includes iree-org/iree#17131 (but this can land without it, since that change only unblocks HIP/CUDA, which were not supported yet anyway).

stellaraccident added a commit to nod-ai/shark-ai that referenced this pull request Apr 25, 2024
* Threads explicit device through models.
* Implements functional InferenceTensor, Theta and Dataset transformations and uses them to implement `to(device=)` (a generic sketch of the idea follows below).
* Adds `--device foo` to the example runner.
* With iree-org/iree-turbine#3 and supporting patches, this allows custom ops and kernels to be used transparently on CUDA/ROCM devices (instead of just CPU).
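A generic sketch of that functional-transformation idea (not the actual InferenceTensor/Theta/Dataset API; the helper names are illustrative): walk a nested mapping of tensors and return a new tree with every leaf moved via `to(device=...)`.

```python
# Sketch only: a functional tree transformation used to implement to(device=).
import torch

def transform_tree(tree, fn):
    # Return a new tree; never mutate the input in place.
    if isinstance(tree, torch.Tensor):
        return fn(tree)
    if isinstance(tree, dict):
        return {k: transform_tree(v, fn) for k, v in tree.items()}
    return tree

def to_device(tree, device):
    return transform_tree(tree, lambda t: t.to(device=device))

theta = {"attn": {"wq": torch.randn(4, 4)}, "norm": {"weight": torch.ones(4)}}
device = "cuda:0" if torch.cuda.is_available() else "cpu"
theta_moved = to_device(theta, device)
```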
stellaraccident merged commit cafc812 into main on Apr 25, 2024
3 checks passed
stellaraccident deleted the eager_hip_cuda branch on April 25, 2024 at 02:36
harsh-nod added a commit to harsh-nod/iree-turbine that referenced this pull request Oct 30, 2024
Signed-off-by: Harsh Menon <[email protected]>