Some benchmarking for #14238 (comment)
I've done some research to verify how things work and here are my most important findings:
MatrixParams Order field
The "order" field in the MatrixParams class is set by default to "column major" (https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/Types.h#L442). It is then set to different values in particular op contexts, for example in the FullyConnected op (https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/operation/FullyConnected.h#L80). However, when those objects are passed to the optimized Gemm implementation, the storage order field is ignored and the implementation always uses the following Eigen configuration: https://github.com/Samsung/ONE/blob/master/compute/cker/include/cker/eigen/eigen_gemm_eigen.h#L62-L64 - lhs is parametrized with the "row major" order while rhs and the output use "column major".
I think the MatrixParams order field could be left at its default when using this class with the optimized Gemm kernel.
Performance considerations
I've done some microbenchmarking to see whether the storage order changes anything performance-wise, and this is in fact how I discovered that the Eigen-based implementation of Gemm ignores this setting.
I've also found information that the storage order of Eigen matrices does not affect performance when the matrix acts as a "view" over existing data in a C++ array/container. C++ stores multi-dimensional array data in row-major order, and I expected that traversing it in column-major order would hurt performance, but with Eigen it apparently does not (I did not delve into the implementation details to figure out why).
I've also experimented with pure-Eigen matmuls to see how they perform, and the results seem to be the same whether you multiply in row-major or column-major order. However, when you perform a row-major x column-major matmul, the performance is about 2x worse than doing it in a uniform way (both matrices with the same storage order).