vulkan: linux builds + small subgroup size fixes #11767

Open
wants to merge 2 commits into
base: master

Conversation

netrunnereve
Collaborator

Vulkan requires either the SDK or additional packages to build on Linux, so let's release official binaries so people can easily try it out.

Meanwhile our mat mul shaders don't work with subgroup sizes smaller than 8. With this fix all tests are passing even with device->subgroup_size forced to 1.
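
The name `subgroup_size_8` used in the diff suggests that the fix reports a subgroup size of at least 8 to the mat mul shaders, even when the hardware value is smaller. Its definition is not shown in this conversation, so the following is only a minimal sketch under that assumption; `clamp_subgroup_size` and `make_l_warptile` are hypothetical helpers, not functions from the backend.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: never report fewer than `floor` invocations per subgroup.
// ASSUMPTION: this mirrors what `subgroup_size_8` means in the diff.
constexpr uint32_t clamp_subgroup_size(uint32_t hw_subgroup_size, uint32_t floor) {
    return std::max(hw_subgroup_size, floor);
}

// Hypothetical helper: builds the large warptile with the same 11-entry layout
// as the `l_warptile` line in the diff, using the clamped subgroup size.
std::array<uint32_t, 11> make_l_warptile(uint32_t hw_subgroup_size,
                                         uint32_t tm_l, uint32_t tn_l, uint32_t tk_l) {
    assert(hw_subgroup_size > 0);  // a subgroup always has at least one invocation
    const uint32_t subgroup_size_8 = clamp_subgroup_size(hw_subgroup_size, 8u);
    return { 128, 128, 128, 16, subgroup_size_8 * 2, 64, 2, tm_l, tn_l, tk_l, subgroup_size_8 };
}
```

With a forced hardware subgroup size of 1, this sketch yields the same warptile as a device whose real subgroup size is 8, which matches the behaviour described above.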

@github-actions bot added labels on Feb 8, 2025: Vulkan (Issues specific to the Vulkan backend), devops (improvements to build systems and github actions), ggml (changes relating to the ggml tensor library for machine learning)
l_warptile = { 128, 128, 128, 16, device->subgroup_size * 2, 64, 2, tm_l, tn_l, tk_l, device->subgroup_size };
m_warptile = { 128, 64, 64, 16, device->subgroup_size, 32, 2, tm_m, tn_m, tk_m, device->subgroup_size };
s_warptile = { subgroup_size_16, 32, 32, 16, 32, 32, 2, tm_s, tn_s, tk_s, device->subgroup_size };
l_warptile = { 128, 128, 128, 16, subgroup_size_8 * 2, 64, 2, tm_l, tn_l, tk_l, subgroup_size_8 };
Collaborator

I imagine the coopmat path doesn't handle faking the subgroup size, maybe add an assert to that effect? Coopmat implementations probably have at least 8 invocations per subgroup, so this seems fine.
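
A guard of the kind suggested here could look roughly like the sketch below. The minimum of 8 is only the guess from this comment, not a verified limit, and `kAssumedCoopmatMinSubgroupSize`, `coopmat_subgroup_size_ok`, and `check_coopmat_subgroup_size` are hypothetical names that do not come from the backend.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: placeholder minimum taken from the guess in the review comment.
// The real limit for the coopmat shaders has not been established.
constexpr uint32_t kAssumedCoopmatMinSubgroupSize = 8;

// Returns true when the hardware subgroup size meets the assumed minimum.
constexpr bool coopmat_subgroup_size_ok(uint32_t hw_subgroup_size) {
    return hw_subgroup_size >= kAssumedCoopmatMinSubgroupSize;
}

// Hypothetical guard: asserts only when the coopmat path is enabled,
// since the non-coopmat path can fall back to a faked subgroup size.
inline void check_coopmat_subgroup_size(bool coopmat_enabled, uint32_t hw_subgroup_size) {
    assert(!coopmat_enabled || coopmat_subgroup_size_ok(hw_subgroup_size));
    (void)coopmat_enabled;   // silence unused warnings when NDEBUG disables assert
    (void)hw_subgroup_size;
}
```

Keeping the threshold in a single named constant means it can be corrected in one place once the actual limit is known.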

Collaborator Author

Does our coopmat shader even work with a subgroup size of 8? We should probably find the actual limit and set up the assert based on that.

Honestly I don't know exactly why the regular mul_mat shaders break down with a subgroup size less than 8, but with the Vulkan backend becoming more and more popular I'd rather have it run slowly than fail mysteriously on those devices.

Collaborator

The warptile parameters are not independent. There is probably a minimum there, coming from the hardcoded values.

3 participants