Documentation updates #2056
base: develop
Changes from 15 commits
d686b9d
0a1b7c8
67aa8aa
a825fb8
549a6a8
99afe42
d4234ba
3bdd7e7
d81d105
c8f8c60
c4c08b1
727233d
b74c7d5
1b010cf
153c937
a468228
be6f971
b73c6d7
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -4,79 +4,85 @@ | |||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. _solution-selection-catalogs: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
*************************** | ||||||||||||||||||||||||||
Solution selection catalogs | ||||||||||||||||||||||||||
*************************** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime. | ||||||||||||||||||||||||||
This is necessary to avoid the overhead associated with loading code object files including initialization time and the | ||||||||||||||||||||||||||
memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load. | ||||||||||||||||||||||||||
Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn, | ||||||||||||||||||||||||||
what code object file contains the selected kernel. This process uses a hierarchical structure | ||||||||||||||||||||||||||
to efficiently search for kernels based on hardware, problem size, and transpose, among others. | ||||||||||||||||||||||||||
This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical | ||||||||||||||||||||||||||
schema to organize kernel metadata for efficient lookup at runtime. | ||||||||||||||||||||||||||
************************************** | ||||||||||||||||||||||||||
Tensile solution selection catalogs | ||||||||||||||||||||||||||
************************************** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
To avoid the overhead associated with loading code object files including initialization time and the memory footprint of the loaded code object files, | ||||||||||||||||||||||||||
Tensile provides a mechanism to load only a subset of the code object files produced during a build, at runtime. | ||||||||||||||||||||||||||
To achieve this, it must be determined which code object file to load. | ||||||||||||||||||||||||||
To determine the preferred kernel and the code object file containing the selected kernel, | ||||||||||||||||||||||||||
the ``TensileHost`` library utilizes a process named `Solution selection`. | ||||||||||||||||||||||||||
This process uses a hierarchical structure to efficiently search for kernels based on hardware, problem size, and transpose, among others. | ||||||||||||||||||||||||||
Comment on lines +11 to +16
Suggested change
The search for kernel is based not only on hardware, problem size, and transpose, hence I think we should mention "among others".
I prefer the former of the two options here, and agree with @SwRaw that, since there are other parameters that are involved, we keep "among others". |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
For efficient lookup at runtime, the kernel metadata must be organized in a hierarchical schema in a serialized file named `solution selection catalog` [1]_. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. note:: | ||||||||||||||||||||||||||
Throughout this document we will refer to catalog files with the .yaml extension. In practice, | ||||||||||||||||||||||||||
This document refers to the catalog files with .yaml extension. In practice, | ||||||||||||||||||||||||||
SwRaw marked this conversation as resolved.
|
||||||||||||||||||||||||||
solution selection catalogs are usually serialized with `MessagePack <https://msgpack.org/>`_, which uses the .dat extension. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Catalog hierarchy | ||||||||||||||||||||||||||
================= | ||||||||||||||||||||||||||
Solution selection catalog hierarchy | ||||||||||||||||||||||||||
===================================== | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. figure:: ../../assets/msl.svg | ||||||||||||||||||||||||||
:alt: Master Solution Library hierarchy | ||||||||||||||||||||||||||
:align: center | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Solution selection catalog hierarchy for gfx900 and gfx90a | ||||||||||||||||||||||||||
Suggested change
The solution selection catalog is general irrespective of architecture. As an example, gfx900 and 90a are used in the image. Hence, I don't think it's a great idea to add the architectures to the heading.
This is just for the figure caption, and since this figure is demonstrating the hierarchy for these two architectures, it seems appropriate to keep "gfx900" and "gfx90a" in the caption.
Yes, we already have "gfx900" and "gfx90a" in the image caption. I too prefer not to mention it in the heading. |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. note:: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
In the figure, the levels are numbered from top to bottom. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
**Level 1: Hardware** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
At runtime, only kernels compatible with the device can execute. As such, the top level of the hierarchy involves hardware comparisons using GFX architecture. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
**Level 2: Operation** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This layer is a mapping from a GEMM transpose setting, defined using | ||||||||||||||||||||||||||
Einstein tensor notation (e.g. *Contraction_l_Alik_Bjlk_Cijk_Dijk*) to a list of problem properties. | ||||||||||||||||||||||||||
This layer is a mapping between a GEMM transpose setting defined using | ||||||||||||||||||||||||||
Einstein tensor notation (such as *Contraction_l_Alik_Bjlk_Cijk_Dijk*) and a list of problem properties. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
**Level 3: Problem** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This layer matches against specific problem properties such as input and output types, and features like high precision accumulation and stochastic rounding. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
**Level 4: Exact solution** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Finally, exact solutions contain fine-grained details about each solution that can be used during solution selection to locate the best kernel and to assert | ||||||||||||||||||||||||||
that the requested problem predicates are satisfied. Each kernel will have an index and a performance ranking. During solution selection, the highest ranked | ||||||||||||||||||||||||||
kernel from this pool will be selected. | ||||||||||||||||||||||||||
Finally, exact solutions contain fine-grained details about each solution that can be used during solution selection to locate the best kernel and ensure the fulfillment of the requested problem predicates. Each kernel has an index and a performance ranking. During solution selection, the highest-ranked | ||||||||||||||||||||||||||
kernel from this pool is selected. | ||||||||||||||||||||||||||
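
The four levels described above can be pictured as a chain of nested lookups. The following is a minimal Python sketch and not Tensile's implementation: the dictionary layout, the ``select_solution`` helper, the ``rank`` field, and the sample entries are all hypothetical. It also assumes that a larger ``rank`` value means a better kernel, which the documentation does not specify.

```python
from typing import Optional

# Hypothetical in-memory catalog: hardware -> operation -> problem -> solutions.
CATALOG = {
    "gfx900": {
        "Contraction_l_Alik_Bjlk_Cijk_Dijk": {
            ("Half", "Half"): [
                {"index": 0, "rank": 1},
                {"index": 1, "rank": 3},
                {"index": 2, "rank": 2},
            ],
        },
    },
}


def select_solution(catalog: dict, gfx: str, operation: str, problem: tuple) -> Optional[dict]:
    """Walk the four levels and return the best-ranked solution, or None."""
    solutions = (
        catalog.get(gfx, {})     # Level 1: hardware
        .get(operation, {})      # Level 2: operation
        .get(problem, [])        # Level 3: problem
    )
    if not solutions:
        return None
    # Level 4: assumption for this sketch, a larger "rank" means a better kernel.
    return max(solutions, key=lambda s: s["rank"])
```

A lookup that fails at any level returns ``None`` here; how the real library handles a miss (for example, through fallback kernels) is outside the scope of this sketch.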
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Build modes | ||||||||||||||||||||||||||
=========== | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Tensile comes equipped with multiple build modes, which affect the way solution selection catalogs are generated. | ||||||||||||||||||||||||||
Tensile is equipped with multiple build modes that affect how solution selection catalogs are generated. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Mode 1: Lazy library loading | ||||||||||||||||||||||||||
---------------------------- | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
If ``--lazy-library-loading`` is enabled, then a "parent" catalog is generated for each architecture, named | ||||||||||||||||||||||||||
If ``--lazy-library-loading`` is enabled, then the following `parent` catalog is generated for each architecture: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. centered:: TensileLibrary_lazy_<gfx>.yaml | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This file contains a | ||||||||||||||||||||||||||
reference to each of it's "child" catalogs, but doesn't have details about the exact solutions. These settings are instead | ||||||||||||||||||||||||||
held in the "child" catalogs, which use the naming convention | ||||||||||||||||||||||||||
This parent catalog contains a reference to each of its `child` catalogs without the details about the exact solutions. | ||||||||||||||||||||||||||
These settings are present in the `child` catalogs, which use the following naming convention: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. centered:: TensileLibrary_Type_<precision>_<problem type>_<gfx>.yaml | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Here, *precision* is the data type, *problem type* is the GEMM type, including transpose and accumulate settings, and *gfx* is the hardware GFX archiecture. | ||||||||||||||||||||||||||
In the preceding naming convention: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- <precision>: Data type | ||||||||||||||||||||||||||
- <problem type>: GEMM type, including transpose and accumulate settings | ||||||||||||||||||||||||||
- <gfx>: Hardware GFX architecture | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
For example, *TensileLibrary_Type_HH_Contraction_l_Alik_Bjlk_Cijk_Dijk_<gfx>.yaml* identifies a code object library for half precision | ||||||||||||||||||||||||||
For example, *TensileLibrary_Type_HH_Contraction_l_Alik_Bjlk_Cijk_Dijk_<gfx>.yaml* identifies a code object library for half-precision | ||||||||||||||||||||||||||
contractions on two transpose matrices, otherwise known as HGEMM TT. | ||||||||||||||||||||||||||
In this way, the child catalogs contain the solution metadata, while the parent catalog is responsible for organizing the child catalogs | ||||||||||||||||||||||||||
In this way, the child catalogs contain the solution metadata, while the parent catalog organizes the child catalogs | ||||||||||||||||||||||||||
by hardware, problem type, transpose, precision, and other predicates. | ||||||||||||||||||||||||||
This has the benefit of reducing the memory footprint of the calling application, as code object libraries are compiled separately and loaded only when required. | ||||||||||||||||||||||||||
This reduces the memory footprint of the calling application, as code object libraries are compiled separately and loaded only when required. | ||||||||||||||||||||||||||
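
The parent and child naming conventions above can be expressed as simple string templates. This is an illustrative Python sketch only: the helper names ``parent_catalog_name`` and ``child_catalog_name`` are hypothetical and are not part of Tensile. The templates themselves are taken from the conventions stated in this section.

```python
def parent_catalog_name(gfx: str) -> str:
    """Build the parent catalog file name for one architecture."""
    return f"TensileLibrary_lazy_{gfx}.yaml"


def child_catalog_name(precision: str, problem_type: str, gfx: str) -> str:
    """Build a child catalog file name from data type, GEMM type, and architecture."""
    return f"TensileLibrary_Type_{precision}_{problem_type}_{gfx}.yaml"


# Example: the HGEMM TT case from the text, for a gfx900 build.
print(parent_catalog_name("gfx900"))
print(child_catalog_name("HH", "Contraction_l_Alik_Bjlk_Cijk_Dijk", "gfx900"))
```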
|
||||||||||||||||||||||||||
**Example: Build outputs** | ||||||||||||||||||||||||||
**Example: Build output** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. code-block:: bash | ||||||||||||||||||||||||||
:caption: Lazy library loading build outputs for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk* | ||||||||||||||||||||||||||
|
@@ -87,8 +93,8 @@ This has the benefit of reducing the memory footprint of the calling application | |||||||||||||||||||||||||
├── Kernels.so-000-gfx900.hsaco | ||||||||||||||||||||||||||
├── Kernels.so-000-gfx906.hsaco | ||||||||||||||||||||||||||
├── TensileLibrary_lazy_gfx1030.yaml # [A] | ||||||||||||||||||||||||||
├── TensileLibrary_lazy_gfx900.yaml | ||||||||||||||||||||||||||
├── TensileLibrary_lazy_gfx906.yaml | ||||||||||||||||||||||||||
├── TensileLibrary_lazy_gfx900.yaml | ||||||||||||||||||||||||||
├── TensileLibrary_lazy_gfx906.yaml | ||||||||||||||||||||||||||
├... | ||||||||||||||||||||||||||
├── TensileLibrary_Type_..._fallback_gfx1030.hsaco | ||||||||||||||||||||||||||
├── TensileLibrary_Type_..._fallback_gfx900.hsaco | ||||||||||||||||||||||||||
|
@@ -100,16 +106,20 @@ This has the benefit of reducing the memory footprint of the calling application | |||||||||||||||||||||||||
├── TensileLibrary_Type_..._gfx906.co | ||||||||||||||||||||||||||
├── TensileLibrary_Type_..._gfx906.yaml # [D] | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Line **[A]** shows the parent catalog for gfx1030, the first of the three parent catalogs generated. | ||||||||||||||||||||||||||
Line **[B]** shows a fallback child catalog, which reference each of the archiecture specific fallback kernels | ||||||||||||||||||||||||||
in the associated .hsaco files. | ||||||||||||||||||||||||||
This means that at least some of the parameter/problem type combinations for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk* | ||||||||||||||||||||||||||
haven't been explicitly tuned for these architectures. | ||||||||||||||||||||||||||
Note that the matching .hsaco files (above **[B]**) are code object libraries for HIP source kernels. | ||||||||||||||||||||||||||
These files are referenced by the fallback catalog. | ||||||||||||||||||||||||||
Line **[C]** shows a child catalog for gfx900 that references both HIP source and assembly source kernels, found in the associated .hsaco and .co files, respectively. | ||||||||||||||||||||||||||
Line **[D]** shows a child catalog for gfx906, similar to the gfx900 catalog. However, notice that there is only one associated | ||||||||||||||||||||||||||
.co file. This means that there are only assembly source kernels in this catalog. | ||||||||||||||||||||||||||
Note that the lines in the build output are marked as A, B, C, and D for reference. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- Line [A]: Shows the parent catalog for gfx1030, the first of the three parent catalogs generated. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- Line [B]: Shows a fallback child catalog that references each of the architecture-specific fallback kernels in the associated ``.hsaco`` files. | ||||||||||||||||||||||||||
This implies that at least some of the parameter or problem type combinations for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk* | ||||||||||||||||||||||||||
haven't been explicitly tuned for these architectures. | ||||||||||||||||||||||||||
Note that the matching ``.hsaco`` files (above line [B]) are code object libraries for HIP source kernels. | ||||||||||||||||||||||||||
These files are referenced by the fallback catalog. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- Line [C]: Shows a child catalog for gfx900 that references both HIP source and assembly source kernels found in the associated ``.hsaco`` and ``.co`` files, respectively. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- Line [D]: Shows a child catalog for gfx906, similar to the gfx900 catalog. However, notice that there is only one associated | ||||||||||||||||||||||||||
``.co`` file. This implies that the catalog contains only assembly source kernels. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
**Example: Parent solution selection catalog** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
@@ -145,28 +155,30 @@ Line **[D]** shows a child catalog for gfx906, similar to the gfx900 catalog. Ho | |||||||||||||||||||||||||
type: Hardware # [_A] | ||||||||||||||||||||||||||
@bstefanuk Should I remove underscore from _A, _B ..etc.? The reference given below has no underscore.
I originally intended for these to identify open and close positions for a range of lines, but I'm not attached to this. Feel free to remove and update to your liking. |
||||||||||||||||||||||||||
solutions: [] | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture. | ||||||||||||||||||||||||||
Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. | ||||||||||||||||||||||||||
Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs. | ||||||||||||||||||||||||||
Note that the lines in the parent catalog are marked as A, B, and C for reference. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
- Line [A]: Shows the top level of the parent catalog, which contains a single row for each hardware architecture. | ||||||||||||||||||||||||||
- Line [B]: Shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. | ||||||||||||||||||||||||||
- Line [C]: Shows the problem type and predicates used to match against exact solutions present in the child catalogs. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Mode 2: Merge files | ||||||||||||||||||||||||||
------------------- | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. warning:: | ||||||||||||||||||||||||||
This feature is not recommended and is in the process of being deprecated. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
When ``--merge-files`` is enabled, one solution catalog is generated for each architecture, named | ||||||||||||||||||||||||||
This feature is not recommended as it is on the verge of deprecation. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
When ``--merge-files`` is enabled, one solution catalog is generated for each architecture, named: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. centered:: TensileLibrary_<gfx>.yaml | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The catalog contains information about supported GEMM types and | ||||||||||||||||||||||||||
solution metadata that is used to locate the optimal kernel for a requested GEMM. This pattern | ||||||||||||||||||||||||||
has the drawback that all code object libraries are loaded eagerly, | ||||||||||||||||||||||||||
thereby increasing both the initialization time and memory footprint of the calling application. | ||||||||||||||||||||||||||
The catalog contains information about supported GEMM types and | ||||||||||||||||||||||||||
solution metadata that is used to locate the optimal kernel for a requested GEMM. Note that this pattern increases both the initialization time and memory footprint of the calling application | ||||||||||||||||||||||||||
as all code object libraries are loaded eagerly. | ||||||||||||||||||||||||||
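
The difference between eager loading and the lazy loading mode described earlier can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not Tensile's loader: ``LazyLibraryCache``, ``load_eagerly``, and the ``loader`` callback are hypothetical names, and the callback stands in for whatever actually loads a code object library.

```python
from typing import Callable, Dict, Iterable


class LazyLibraryCache:
    """Load each library on first request and reuse it afterwards."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader
        self._loaded: Dict[str, object] = {}

    def get(self, name: str) -> object:
        # Only pay the load cost the first time a library is requested.
        if name not in self._loaded:
            self._loaded[name] = self._loader(name)
        return self._loaded[name]

    @property
    def loaded_count(self) -> int:
        return len(self._loaded)


def load_eagerly(names: Iterable[str], loader: Callable[[str], object]) -> Dict[str, object]:
    """Load every library up front, whether or not it is ever used."""
    return {name: loader(name) for name in names}
```

With the lazy cache, an application that only ever requests one library triggers one load. With ``load_eagerly``, every listed library is loaded at initialization, which is the cost the text attributes to ``--merge-files``.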
|
||||||||||||||||||||||||||
**Example** | ||||||||||||||||||||||||||
**Example: Build output** | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Say you're building libraries for gfx908 and gfx90a with ``--merge-files``. The build output directory would look like this | ||||||||||||||||||||||||||
Here is the build output directory when building libraries for gfx908 and gfx90a with ``--merge-files``: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. code-block:: bash | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
@@ -186,4 +198,4 @@ Say you're building libraries for gfx908 and gfx90a with ``--merge-files``. The | |||||||||||||||||||||||||
|
||||||||||||||||||||||||||
-------------------- | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
.. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. | ||||||||||||||||||||||||||
.. [1] Previously these files were named *master solution libraries* because they consisted of two top-level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
.. _installation: | ||
|
||
******************************************************************** | ||
Installation | ||
Tensile installation | ||
We are still wanting to redundantly add "Tensile" in front of all our headings?
@lpaoletti Please confirm. |
||
******************************************************************** | ||
|
||
Install ROCm | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove "only" above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have replaced it with: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me, the following wording is clear,
Developers will understand that the "current shell session" won't persist. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are two headings in a row. Either remove this heading. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Review for "we" in this document There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There is only one heading. The first one you see in red is a removed line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I found "we" in this statement: "For demonstration purposes, we use the sample tuning file available in Tensile/Configs/rocblas_sgemm_example.yaml" "For demonstration purposes, the sample tuning file available in Tensile/Configs/rocblas_sgemm_example.yaml is used" PS: I had read in google style guide, that we can use "we" in such cases. Let me know if I should refrain from "we". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No opinion on this. |
||
|
@@ -15,8 +15,8 @@ To begin, install ROCm for your platform. For installation instructions, refer t | |
|
||
.. tip:: | ||
|
||
If using Bash, we recommend you to set ``PATH=/opt/rocm/bin/:$PATH`` in your ``~/.bashrc`` and refresh your shell using ``source ~/.bashrc``. | ||
Alternatively, export the path for your current shell session only, using ``export PATH=/opt/rocm/bin/:$PATH``. | ||
If using Bash, set ``PATH=/opt/rocm/bin/:$PATH`` in your ``~/.bashrc`` and refresh your shell using ``source ~/.bashrc``. | ||
Alternatively, export the path exclusively for your current shell session, using ``export PATH=/opt/rocm/bin/:$PATH``. | ||
SwRaw marked this conversation as resolved.
|
||
|
||
Install OS dependencies | ||
========================= | ||
|
@@ -83,7 +83,7 @@ Running benchmark | |
|
||
To run a benchmark, pass a tuning config to the ``Tensile`` program located in ``Tensile/bin``. | ||
|
||
For demonstration purposes, we use the sample tuning file available in ``Tensile/Configs/rocblas_sgemm_example.yaml``. | ||
For demonstration purposes, the sample tuning file available in ``Tensile/Configs/rocblas_sgemm_example.yaml`` is used. | ||
The sample tuning file allows you to specify the target architecture for which the benchmark will generate a library. | ||
To find your device architecture, run: | ||
|
||
|
Reference section doesn't align:
Done