fix(library): Propagate upstream Marlin kernel fix #366

ahadnagy · 2025-01-06T22:19:18Z

What does this PR do?

Fixes #332

TL;DR: There was a data race bug in the Marlin kernel. This fix adds a separate shared memory region for the final reduction tree. Unfortunately, this raises the minimum hardware requirements for the kernel: it won't work on GPUs with compute capability < 8.0.
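A caller could enforce the new hardware floor by checking the device's compute capability before dispatching to the fixed kernel. The sketch below is not code from this PR, and the helper name `marlin_kernel_supported` is hypothetical. It takes a `(major, minor)` tuple, which is the shape `torch.cuda.get_device_capability()` returns.

```python
# Hypothetical helper, not part of the library: the patched Marlin kernel
# needs compute capability >= 8.0 (Ampere or newer).
MIN_CAPABILITY = (8, 0)


def marlin_kernel_supported(capability: tuple[int, int]) -> bool:
    """Return True if a device with this (major, minor) capability can run the kernel."""
    # Tuple comparison is lexicographic, so (8, 6) >= (8, 0) and (7, 5) < (8, 0).
    return tuple(capability) >= MIN_CAPABILITY


for cap in [(7, 5), (8, 0), (8, 6), (9, 0)]:
    print(cap, marlin_kernel_supported(cap))
```

On a CUDA machine, `marlin_kernel_supported(torch.cuda.get_device_capability())` would be the natural call. A Turing card such as a T4 (7.5) would be rejected.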

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you run all tests locally and make sure they pass?
Did you write any new necessary tests?


ahadnagy · 2025-01-07T12:20:42Z

@dacorvo It seems like it's not going to work on A10s due to their lower shared memory size. Is that a hard requirement for the library?

dacorvo · 2025-01-07T13:30:59Z

> @dacorvo It seems like it's not going to work on A10s due to their lower shared memory size. Is that a hard requirement for the library?

Pretty much, yes, since one of the main use cases for quantization is being able to run bigger models on smaller devices.

ahadnagy · 2025-01-07T13:55:23Z

Okay, in that case it'll be necessary to reduce the tile size. I'll check what vLLM does on this front, and whether that works on A10s at all. IIRC, their CI runs on L40s.
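The trade-off discussed above can be sketched as choosing the largest tile configuration that fits a per-block shared memory budget. A larger tile combined with the new reduction buffer can exceed what smaller GPUs provide. Everything here is an illustrative assumption rather than Marlin's real accounting: the `TileConfig` fields, the candidate list, and the byte formula (fp16 activations, packed int4 weights, and an fp32 reduction buffer kept separate from the pipeline stages).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TileConfig:
    # Hypothetical tile shape and pipeline depth, not Marlin's actual parameters.
    tile_m: int
    tile_n: int
    tile_k: int
    stages: int

    def shared_bytes(self) -> int:
        a_bytes = self.tile_m * self.tile_k * 2      # fp16 activation tile
        b_bytes = self.tile_k * self.tile_n // 2     # int4 weights, two per byte
        red_bytes = self.tile_m * self.tile_n * 4    # separate fp32 reduction buffer
        return self.stages * (a_bytes + b_bytes) + red_bytes


def pick_config(candidates: Sequence[TileConfig], budget: int) -> Optional[TileConfig]:
    """Return the first (most preferred) config whose footprint fits within `budget` bytes."""
    for cfg in candidates:
        if cfg.shared_bytes() <= budget:
            return cfg
    return None


# Ordered from most to least preferred (largest tile first).
CANDIDATES = [
    TileConfig(64, 256, 64, 4),  # 131072 bytes (128 KiB)
    TileConfig(64, 128, 64, 4),  # 81920 bytes (80 KiB)
    TileConfig(64, 64, 64, 4),   # 57344 bytes (56 KiB)
]
```

With a budget of `99 * 1024` bytes, `pick_config` falls back to the 64×128 tile. With `163 * 1024` bytes, it keeps the largest one. Those two figures are the per-block opt-in limits the CUDA C++ Programming Guide lists for compute capability 8.6 (e.g. A10) and 8.0 (e.g. A100). Confirm them for the actual target device.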

ahadnagy requested a review from dacorvo as a code owner January 6, 2025 22:19

ahadnagy marked this pull request as draft January 6, 2025 22:19

ahadnagy marked this pull request as ready for review January 6, 2025 22:29

ahadnagy changed the title ~~fix(library): Propagate upstream Marlin kernel fix (WIP)~~ fix(library): Propagate upstream Marlin kernel fix Jan 6, 2025

ahadnagy force-pushed the marlin-kernel-fix branch from d2eba7f to 74298e3 January 6, 2025 22:40

fix(library): Propagate upstream Marlin kernel fix

ecd85fc

Increase shared mem. size
Fix shared mem. size, re-activate test
Remove debugging-related stuff

ahadnagy force-pushed the marlin-kernel-fix branch from 74298e3 to ecd85fc January 6, 2025 22:52

ahadnagy added 2 commits January 7, 2025 09:37

Apply ruff FMT

4493482

Move reduce shared memory to the back, query max. shmem size dynamically

ec2bc69

Fix build error

ahadnagy force-pushed the marlin-kernel-fix branch from ae10ca1 to ec2bc69 January 7, 2025 12:01
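Commit ec2bc69 ("Move reduce shared memory to the back, query max. shmem size dynamically") suggests a layout in which the pipeline buffers come first and the reduction region is appended after them, so the two never overlap. The sketch below models only that idea on the host side. The region names, the 16-byte alignment, and the sizes are assumptions. In CUDA, the device limit would typically be read with `cudaDeviceGetAttribute(..., cudaDevAttrMaxSharedMemoryPerBlockOptin, device)`, but the PR summary does not show exactly which call it uses.

```python
from typing import Dict, Tuple


def align_up(n: int, alignment: int = 16) -> int:
    """Round n up to the next multiple of `alignment`."""
    return (n + alignment - 1) // alignment * alignment


def shared_memory_layout(
    stages: int, a_tile_bytes: int, b_tile_bytes: int, red_bytes: int, alignment: int = 16
) -> Tuple[Dict[str, int], int]:
    """Assign byte offsets: pipeline A/B buffers first, reduction region last.

    Returns (offsets, total_bytes). Region names are hypothetical.
    """
    offsets: Dict[str, int] = {}
    cursor = 0
    for s in range(stages):
        offsets[f"a{s}"] = cursor
        cursor = align_up(cursor + a_tile_bytes, alignment)
    for s in range(stages):
        offsets[f"b{s}"] = cursor
        cursor = align_up(cursor + b_tile_bytes, alignment)
    # Dedicated reduction region at the back: it starts after every pipeline
    # buffer ends, so the final reduction does not reuse their memory.
    offsets["reduce"] = cursor
    cursor = align_up(cursor + red_bytes, alignment)
    return offsets, cursor


def fits(total_bytes: int, max_shared_per_block: int) -> bool:
    """Compare the layout against a limit queried at runtime rather than hard-coded."""
    return total_bytes <= max_shared_per_block
```

For example, `shared_memory_layout(2, 100, 50, 64)` places the reduction region at byte 352 and needs 416 bytes in total.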
