[cker] Storage order testing #14370

Closed
wants to merge 2 commits into from

tomdol (Contributor) commented Nov 26, 2024

Some benchmarking for #14238 (comment)

I've done some research to verify how things work and here are my most important findings:

MatrixParams Order field

The order field in the MatrixParams class defaults to "column major": https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/Types.h#L442. Particular ops then set it to other values; for example, the FullyConnected op does so here: https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/operation/FullyConnected.h#L80
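To make the two settings concrete, here is a minimal C++ sketch of what a storage-order flag changes: the mapping from a (row, column) index to a flat buffer offset. The Order enum, MatrixShape struct and Offset function are hypothetical stand-ins written for this illustration; they are not the actual definitions in cker's Types.h.

```cpp
#include <cstddef>

// Hypothetical storage-order flag, for illustration only (not cker's Types.h).
enum class Order { kColMajor, kRowMajor };

// Hypothetical matrix shape; column major by default, as described above.
struct MatrixShape {
  int rows = 0;
  int cols = 0;
  Order order = Order::kColMajor;
};

// Returns the flat buffer offset of element (r, c) under the shape's storage order.
inline std::size_t Offset(const MatrixShape& s, int r, int c) {
  if (s.order == Order::kColMajor) {
    // Column major: each column is contiguous, so moving down one row adds 1.
    return static_cast<std::size_t>(c) * s.rows + r;
  }
  // Row major: each row is contiguous, so moving right one column adds 1.
  return static_cast<std::size_t>(r) * s.cols + c;
}
```

For a 2 x 3 matrix, element (1, 0) sits at offset 1 in column-major order but at offset 3 in row-major order, so the flag only matters if the kernel actually reads it.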

However, when these objects are passed to the optimized Gemm implementation, the storage order field is ignored, and the implementation always uses the following Eigen configuration: https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/eigen/eigen_gemm_eigen.h#L62-L64. The lhs is parametrized with row-major order, while the rhs and the output use column-major order.
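As an illustration of the fixed layout described above, here is a naive reference GEMM in C++ that hard-codes the same convention (lhs row major, rhs and output column major) and therefore has no order parameter at all. The function name ReferenceGemm and its signature are hypothetical; this is a plain triple loop, not the Eigen kernel that cker actually calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM with a fixed layout, mirroring the configuration
// described above. It takes no storage-order argument, which is effectively
// what happens when a caller's order field is ignored.
//   lhs: M x K, row major
//   rhs: K x N, column major
//   dst: M x N, column major (overwritten)
inline void ReferenceGemm(const std::vector<float>& lhs,
                          const std::vector<float>& rhs,
                          std::vector<float>& dst, int M, int K, int N) {
  dst.assign(static_cast<std::size_t>(M) * N, 0.0f);
  for (int n = 0; n < N; ++n) {
    for (int m = 0; m < M; ++m) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        // Row m of the row-major lhs and column n of the column-major rhs
        // are both contiguous, so this inner loop reads memory sequentially.
        acc += lhs[static_cast<std::size_t>(m) * K + k] *
               rhs[static_cast<std::size_t>(n) * K + k];
      }
      dst[static_cast<std::size_t>(n) * M + m] = acc;  // column-major store
    }
  }
}
```

Assuming the usual FullyConnected shapes (weights as [output, input] and activations as [batch, input], both stored row major), this convention needs no transposes: the activation buffer read as column major is exactly an input x batch matrix, and a column-major output x batch result occupies the same memory as a row-major [batch, output] tensor.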

I think the MatrixParams order field could be left at its default value when this class is used with the optimized Gemm kernel, since that kernel does not read it.

Performance considerations

I've done some microbenchmarking to see whether the storage order affects performance, and this is in fact how I discovered that the Eigen-based Gemm implementation ignores this setting.
I also came across information that the storage order of Eigen matrices does not affect performance when the matrix acts as a "view" over existing data in a C++ array or container. Multidimensional C++ arrays are stored in row-major order, so I expected that traversing the data in column-major order would hurt performance, but apparently with Eigen it does not (I did not delve into the details or the implementation to figure out why).
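A "view" in this sense only reinterprets an existing buffer through strides; nothing is copied. The C++ sketch below uses a hypothetical StridedView type (it is not Eigen::Map and does not reproduce Eigen's evaluation strategy) to show that a row-major 2 x 3 buffer viewed as column major with swapped dimensions is simply its 3 x 2 transpose. It also includes the column-wise traversal of a row-major buffer, whose strided access is what one would naively expect to be slower.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over existing data, addressed through strides.
struct StridedView {
  const float* data;
  int rows, cols;
  std::ptrdiff_t row_stride;  // elements to skip when moving down one row
  std::ptrdiff_t col_stride;  // elements to skip when moving right one column
  float operator()(int r, int c) const {
    return data[r * row_stride + c * col_stride];
  }
};

// View a buffer as a row-major rows x cols matrix (no copy).
inline StridedView RowMajorView(const float* p, int rows, int cols) {
  return {p, rows, cols, cols, 1};
}

// View the same kind of buffer as a column-major rows x cols matrix (no copy).
inline StridedView ColMajorView(const float* p, int rows, int cols) {
  return {p, rows, cols, 1, rows};
}

// Sums the view with columns in the outer loop. On a row-major view the inner
// loop jumps `cols` elements at a time instead of reading sequentially.
inline float SumColumnWise(const StridedView& v) {
  float s = 0.0f;
  for (int c = 0; c < v.cols; ++c)
    for (int r = 0; r < v.rows; ++r) s += v(r, c);
  return s;
}
```

With buf = {1, 2, 3, 4, 5, 6}, RowMajorView(buf, 2, 3) reads [[1, 2, 3], [4, 5, 6]] and ColMajorView(buf, 3, 2) reads its transpose, using the same memory.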
I also experimented with pure-Eigen matmuls to see how they perform, and the results appear to be the same whether both matrices are row-major or both are column-major. However, a row-major x column-major matmul is about 2x slower than a uniform one (both matrices with the same storage order).
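The storage order of the operands changes only which elements are contiguous in memory, not the arithmetic. The following C++ sketch (the Layout enum, Idx and MatMul are hypothetical names) makes both operand layouts compile-time parameters of a naive matmul, so uniform and mixed combinations can be compared on identical logical data. It is a plain triple loop rather than Eigen's blocked kernels, so its relative timings should not be expected to reproduce the 2x ratio measured with Eigen.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout tag, for illustration only.
enum class Layout { kRowMajor, kColMajor };

// Flat index of element (r, c) in a rows x cols matrix stored with layout L.
template <Layout L>
inline std::size_t Idx(int r, int c, int rows, int cols) {
  return L == Layout::kRowMajor ? static_cast<std::size_t>(r) * cols + c
                                : static_cast<std::size_t>(c) * rows + r;
}

// Naive C = A * B, where A is M x K stored with layout LA, B is K x N stored
// with layout LB, and the result C is returned in row-major order. Only the
// memory access pattern depends on LA and LB.
template <Layout LA, Layout LB>
std::vector<float> MatMul(const std::vector<float>& a,
                          const std::vector<float>& b, int M, int K, int N) {
  std::vector<float> c(static_cast<std::size_t>(M) * N, 0.0f);
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k)
        acc += a[Idx<LA>(i, k, M, K)] * b[Idx<LB>(k, j, K, N)];
      c[static_cast<std::size_t>(i) * N + j] = acc;
    }
  }
  return c;
}
```

Feeding the same logical matrices in each layout yields identical results, so any timing difference between the instantiations comes purely from memory access.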

tomdol added the labels PR/NO TEST ("Tell CI to not run test"), PR/NO MERGE ("Please don't merge. I'm still working on this :)") and type/lab ("Doing some experiments!") on Nov 26, 2024
tomdol (Contributor, Author) commented Nov 26, 2024

Benchmark results in release mode

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
cker RowMajor/0                 5.13 ms         5.13 ms          135
cker ColMajor/1                 5.13 ms         5.13 ms          137
eigen ColMajor                 0.214 ns        0.214 ns   3274521968
eigen RowMajor                 0.214 ns        0.214 ns   3275491262
eigen RowMajor + ColMajor      0.427 ns        0.427 ns   1639070538
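The pure-Eigen rows report about 0.2 ns per iteration, which is on the order of a single CPU clock cycle and seems too fast for any matrix multiplication of meaningful size. One possible explanation, not confirmed here, is that the compiler eliminated the computation because its result was never used; Google Benchmark provides benchmark::DoNotOptimize for exactly this situation. The minimal C++ sketch below shows the same idea using only std::chrono: each result is written to a volatile sink so it stays observable. MeanNanosPerCall is a hypothetical helper, not part of any benchmark library.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <ratio>

// Hypothetical helper: runs `work` `iters` times and returns the mean wall
// time per call in nanoseconds. Every result is folded into a volatile sink,
// so the compiler cannot treat the computation as dead code and remove it.
inline double MeanNanosPerCall(const std::function<float()>& work,
                               std::size_t iters) {
  volatile float sink = 0.0f;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    sink = sink + work();  // the volatile write keeps each result observable
  }
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return iters == 0 ? 0.0 : elapsed.count() / static_cast<double>(iters);
}
```

The std::function call adds a small fixed overhead per iteration, which is negligible next to millisecond-scale kernels such as the cker rows but would distort measurements of very small workloads.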
