[Example] Backward function of RGMS kernel #77

yzh119 · 2022-12-10T09:08:11Z

The forward function of the RGMS kernel is (relation-related information is ignored for simplicity):

$$ Y = AXW $$

We already have its implementation written in SparseTIR, using composable formats and tensor cores.
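A plain-Python reference of the forward pass may be handy for checking correctness. This sketch is not the SparseTIR implementation. It assumes $A$ is stored in CSR form (`indptr`, `indices`, `data`) and that $X$ and $W$ are dense lists of lists. The helper names (`matmul`, `csr_spmm`, `rgms_forward`) are hypothetical. It computes $XW$ first and then multiplies by $A$. Composable formats and tensor cores are not modeled.

```python
# Hypothetical reference forward pass Y = A X W for a single relation.
# Assumption: A is n x m in CSR form (indptr, indices, data);
# X is m x k and W is k x d, both dense lists of lists.

def matmul(P, Q):
    """Dense product P @ Q for lists of lists."""
    inner = len(Q)
    cols = len(Q[0]) if Q else 0
    return [[sum(row[t] * Q[t][c] for t in range(inner)) for c in range(cols)]
            for row in P]

def csr_spmm(indptr, indices, data, B):
    """Sparse-dense product A @ B, with A given by its CSR arrays."""
    cols = len(B[0]) if B else 0
    out = []
    for i in range(len(indptr) - 1):
        acc = [0.0] * cols
        # Gather the rows of B selected by the nonzeros of row i of A.
        for p in range(indptr[i], indptr[i + 1]):
            j, a = indices[p], data[p]
            for c in range(cols):
                acc[c] += a * B[j][c]
        out.append(acc)
    return out

def rgms_forward(indptr, indices, data, X, W):
    # Dense transform first, then aggregate through the sparse A.
    XW = matmul(X, W)
    return csr_spmm(indptr, indices, data, XW)
```

For example, with $A = [[1, 0], [2, 3]]$ (CSR: `[0, 1, 3]`, `[0, 0, 1]`, `[1.0, 2.0, 3.0]`), $X = [[1, 2], [3, 4]]$ and $W = [[1], [1]]$, `rgms_forward` returns `[[3.0], [27.0]]`.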

The backward function of the RGMS kernel needs to compute the gradients of both $X$ and $W$:
$$\nabla (XW) = A^T \nabla Y$$
$$\nabla X = \nabla (XW) W^T $$
$$\nabla W = X^T \nabla (XW) $$
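A matching plain-Python reference for the three backward formulas follows. It makes the same assumptions: $A$ is in CSR form, the dense operands are lists of lists, and the helper names are hypothetical. In this sketch, $A^T \nabla Y$ is computed directly from $A$'s CSR arrays by scattering each nonzero $A_{ij}$ into row $j$, so no transposed copy of $A$ is built.

```python
# Hypothetical reference backward pass for Y = A X W (single relation).
# Assumption: A is n x m in CSR form; X is m x k, W is k x d, dY is n x d.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(P, Q):
    """Dense product P @ Q for lists of lists."""
    inner = len(Q)
    cols = len(Q[0]) if Q else 0
    return [[sum(row[t] * Q[t][c] for t in range(inner)) for c in range(cols)]
            for row in P]

def csr_t_spmm(n_cols, indptr, indices, data, B):
    """A^T @ B from A's CSR arrays: nonzero A[i, j] scatters B[i] into row j."""
    cols = len(B[0]) if B else 0
    out = [[0.0] * cols for _ in range(n_cols)]
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j, a = indices[p], data[p]
            for c in range(cols):
                out[j][c] += a * B[i][c]
    return out

def rgms_backward(indptr, indices, data, X, W, dY):
    dXW = csr_t_spmm(len(X), indptr, indices, data, dY)  # grad(XW) = A^T dY
    dX = matmul(dXW, transpose(W))                        # grad X  = grad(XW) W^T
    dW = matmul(transpose(X), dXW)                        # grad W  = X^T grad(XW)
    return dX, dW
```

For $A = [[1, 0], [2, 3]]$, $X = [[1, 2], [3, 4]]$, $W = [[1], [1]]$ and $\nabla Y$ all ones, this gives $\nabla X = [[3, 3], [3, 3]]$ and $\nabla W = [[12], [18]]$. Both agree with differentiating $\sum Y = (A^T \mathbf{1})^T X W = [3, 3]\, X W$ by hand.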

All three formulas could be computed inside the same kernel, with $\nabla (XW)$ kept in shared memory so that it does not have to be written to and re-read from global memory. The same optimizations (composable formats + tensorization) could be applied to the backward kernel as well.
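The fused data flow can be modeled sequentially. Each tile of rows of $\nabla (XW)$ is computed into a small buffer, which stands in for shared memory. The buffer is consumed immediately for the matching rows of $\nabla X$ and for a partial sum of $\nabla W$, so the full $\nabla (XW)$ is never materialized.

This is a sketch under stated assumptions, not SparseTIR code. To let each tile produce its rows of $A^T \nabla Y$ independently, it assumes $A^T$ is available in CSR form (equivalently, $A$ in CSC). The names `fused_backward` and `tile` are hypothetical.

```python
# Hypothetical sequential model of a fused backward kernel for Y = A X W.
# Assumption: A^T is given in CSR form (at_indptr, at_indices, at_data), so
# each row i of grad(XW) = A^T dY can be computed without scattering.
# `tile` models how many rows of grad(XW) one thread block would own.

def fused_backward(at_indptr, at_indices, at_data, X, W, dY, tile=2):
    m = len(at_indptr) - 1            # rows of X (= columns of A)
    k, d = len(W), len(W[0])
    dX = [[0.0] * k for _ in range(m)]
    dW = [[0.0] * d for _ in range(k)]
    for start in range(0, m, tile):
        rows = range(start, min(start + tile, m))
        # Stage 1: one tile of grad(XW) = A^T dY, kept in a local buffer
        # (the stand-in for shared memory).
        buf = []
        for i in rows:
            acc = [0.0] * d
            for p in range(at_indptr[i], at_indptr[i + 1]):
                j, a = at_indices[p], at_data[p]
                for c in range(d):
                    acc[c] += a * dY[j][c]
            buf.append(acc)
        # Stage 2: rows of grad X for this tile, grad(XW)[i] @ W^T.
        # Each row belongs to exactly one tile, so plain assignment is enough.
        for r, i in enumerate(rows):
            for t in range(k):
                dX[i][t] = sum(buf[r][c] * W[t][c] for c in range(d))
        # Stage 3: this tile's contribution X[rows]^T @ grad(XW)[rows] to grad W.
        # Every tile updates all of grad W, so a parallel version would need
        # atomics or a cross-block reduction here.
        for r, i in enumerate(rows):
            for t in range(k):
                x = X[i][t]
                for c in range(d):
                    dW[t][c] += x * buf[r][c]
    return dX, dW
```

Because every tile contributes to the whole of $\nabla W$, a parallel GPU version would need atomics or a cross-block reduction for the $\nabla W$ step. The $\nabla X$ rows, by contrast, are owned by a single tile and can be written directly.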

yzh119 self-assigned this Dec 11, 2022

yzh119 added this to SparseTIR Dec 11, 2022

yzh119 added the priority-high label Dec 11, 2022
