
[Draft] | GPUAI-3720 - Integrate Universal GEMM into Grouped GEMM - Pt 1 #1800

Draft · wants to merge 4 commits into develop from rimadduri/universal_gemm_into_grouped_gemm
Conversation

rtmadduri (Contributor)

Proposed changes

This PR integrates Universal GEMM into Device Grouped GEMM. Specifically, it replaces GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4r2 with GridwiseGemm_xdl_cshuffle_v3 in device_grouped_gemm_xdl_splitk_cshuffle.hpp, and makes the corresponding changes to struct Argument and struct Invoker.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose this solution and what alternatives you considered.

@rtmadduri force-pushed the rimadduri/universal_gemm_into_grouped_gemm branch from 469a083 to 15a21fc on January 7, 2025 at 19:25