[cker] Storage order testing #14370

tomdol · 2024-11-26T11:43:26Z

I've done some research to verify how things work and here are my most important findings:

MatrixParams Order field

The order field in the MatrixParams class is set by default to "column major" https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/Types.h#L442. It is then set to different values in particular op contexts, for example here in the FullyConnected op https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/operation/FullyConnected.h#L80

However, when those objects are passed to the optimized Gemm implementation, the storage order field is ignored and the implementation always uses the following Eigen configuration: https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/eigen/eigen_gemm_eigen.h#L62-L64. There, lhs is parametrized with the "row major" order, while rhs and the output use "column major".
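To make the fixed configuration concrete, here is a minimal sketch of what "lhs row major, rhs and output column major" means in terms of flat-buffer offsets. It is plain C++ without Eigen, the helper names (`rowMajorOffset`, `colMajorOffset`, `gemmRowColCol`) are hypothetical and do not exist in cker, and the naive triple loop is only a reference for the layout, not how the optimized kernel computes the product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; these names do not exist in cker or Eigen.
// Offset of element (r, c) in a row-major matrix with `cols` columns.
inline std::size_t rowMajorOffset(std::size_t r, std::size_t c, std::size_t cols)
{
  return r * cols + c;
}

// Offset of element (r, c) in a column-major matrix with `rows` rows.
inline std::size_t colMajorOffset(std::size_t r, std::size_t c, std::size_t rows)
{
  return c * rows + r;
}

// Naive reference product:
// out (m x n, column-major) = lhs (m x k, row-major) * rhs (k x n, column-major).
std::vector<float> gemmRowColCol(const std::vector<float> &lhs, const std::vector<float> &rhs,
                                 std::size_t m, std::size_t k, std::size_t n)
{
  assert(lhs.size() == m * k);
  assert(rhs.size() == k * n);

  std::vector<float> out(m * n, 0.0f);
  for (std::size_t j = 0; j < n; ++j)
  {
    for (std::size_t i = 0; i < m; ++i)
    {
      float acc = 0.0f;
      // Row i of lhs and column j of rhs are both contiguous in memory here.
      for (std::size_t p = 0; p < k; ++p)
        acc += lhs[rowMajorOffset(i, p, k)] * rhs[colMajorOffset(p, j, k)];
      out[colMajorOffset(i, j, m)] = acc;
    }
  }
  return out;
}
```

For example, lhs = [[1, 2], [3, 4]] stored row-major as {1, 2, 3, 4} and rhs = [[5, 6], [7, 8]] stored column-major as {5, 7, 6, 8} give the column-major output {19, 43, 22, 50}. The caller's order field plays no role in this sketch, since the layouts are baked into the offset helpers.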

I think the MatrixParams order field could be left as default when using this class with the Gemm optimized kernel.

Performance considerations

I've done some microbenchmarking to see whether the storage order changes anything performance-wise, and this is in fact how I discovered that the Eigen-based implementation of Gemm ignores this setting.
I've also found information that the storage order of an Eigen matrix does not affect performance when the matrix acts as a "view" over existing data in a C++ array/container. C++ stores array data in row-major order by default, and I expected that traversing it in column-major order would hurt performance, but apparently with Eigen it does not (I did not delve into the details or the implementation to figure out why).
I've also experimented with pure-Eigen matmuls to see how they perform, and the results seem to be the same whether both matrices are row-major or both are column-major. However, a row-major x column-major matmul is about 2x slower than the uniform case (both matrices with the same storage order).
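On the "view" point above, one thing that is easy to check by hand: choosing a storage order for a view over an existing buffer only changes how a (row, col) pair is translated into an offset, and no data is moved. The sketch below is plain C++ with hypothetical helper names (not Eigen or cker API). It shows that the same buffer read as a rows x cols row-major view is exactly the cols x rows column-major view, transposed. It does not by itself explain the timings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; not Eigen or cker API.
// Read element (r, c) of a rows x cols row-major view over `data`.
inline float viewRowMajor(const float *data, std::size_t rows, std::size_t cols,
                          std::size_t r, std::size_t c)
{
  assert(r < rows && c < cols);
  return data[r * cols + c];
}

// Read element (r, c) of a rows x cols column-major view over `data`.
inline float viewColMajor(const float *data, std::size_t rows, std::size_t cols,
                          std::size_t r, std::size_t c)
{
  assert(r < rows && c < cols);
  return data[c * rows + r];
}

// True when the buffer read as a rows x cols row-major view equals the same
// buffer read as a cols x rows column-major view with indices swapped.
inline bool sameBufferIsTranspose(const float *data, std::size_t rows, std::size_t cols)
{
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      if (viewRowMajor(data, rows, cols, r, c) != viewColMajor(data, cols, rows, c, r))
        return false;
  return true;
}
```

With the buffer {1, 2, 3, 4, 5, 6}, element (1, 0) of the 2 x 3 row-major view and element (0, 1) of the 3 x 2 column-major view both land on offset 3, i.e. the value 4.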

tomdol · 2024-11-26T11:56:53Z

Benchmark results in release mode

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
cker RowMajor/0                 5.13 ms         5.13 ms          135
cker ColMajor/1                 5.13 ms         5.13 ms          137
eigen ColMajor                 0.214 ns        0.214 ns   3274521968
eigen RowMajor                 0.214 ns        0.214 ns   3275491262
eigen RowMajor + ColMajor      0.427 ns        0.427 ns   1639070538

Testing the storage orders

4df7e4a

tomdol added the PR/NO TEST, PR/NO MERGE, and type/lab labels Nov 26, 2024

tomdol mentioned this pull request Nov 26, 2024

[compute/cker] Optimize BMM for X86 #14238

Closed

Corrections

08da09f

tomdol mentioned this pull request Nov 26, 2024

[compute/cker] Remove the storage order parametrization from BatchMatMul #14371

Merged

tomdol closed this Nov 26, 2024

tomdol deleted the bmm_tests branch January 9, 2025 11:24
