Documentation updates #2056

Status: Open; 18 commits to be merged into base branch develop; showing changes from 15 commits.
7 changes: 4 additions & 3 deletions docs/sphinx/_toc.yml.in
@@ -9,8 +9,8 @@ subtrees:

- caption: Conceptual
entries:
- file: src/conceptual/introduction
- file: src/conceptual/solution-selection-catalogs
title: Solution selection catalogs

- caption: Reference
entries:
@@ -25,8 +25,9 @@ subtrees:
- file: src/api-reference/embedded-data
- file: src/api-reference/tensile-create-library-api
- file: src/api-reference/utilities
- file: src/reference/environment-variables
- file: src/reference/nomenclature
- file: src/reference/environment-variables
title: Environment variables

Reference section doesn't align:
image

Contributor Author

Done

- file: src/reference/glossary

- caption: Contribution
entries:
15 changes: 0 additions & 15 deletions docs/src/conceptual/introduction.rst

This file was deleted.

120 changes: 66 additions & 54 deletions docs/src/conceptual/solution-selection-catalogs.rst
@@ -4,79 +4,85 @@

.. _solution-selection-catalogs:

***************************
Solution selection catalogs
***************************

Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime.
This is necessary to avoid the overhead associated with loading code object files including initialization time and the
memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load.
Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn,
what code object file contains the selected kernel. This process uses a hierarchical structure
to efficiently search for kernels based on hardware, problem size, and transpose, among others.
This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical
schema to organize kernel metadata for efficient lookup at runtime.
**************************************
Tensile solution selection catalogs
**************************************

To avoid the overhead associated with loading code object files including initialization time and the memory footprint of the loaded code object files,
Tensile provides a mechanism to load only a subset of the code object files produced during a build, at runtime.
To achieve this, it must be determined which code object file to load.
To determine the preferred kernel and the code object file containing the selected kernel,
the ``TensileHost`` library utilizes a process named `Solution selection`.
This process uses a hierarchical structure to efficiently search for kernels based on hardware, problem size, and transpose, among others.
Comment on lines +11 to +16

Suggested change
To avoid the overhead associated with loading code object files including initialization time and the memory footprint of the loaded code object files,
Tensile provides a mechanism to load only a subset of the code object files produced during a build, at runtime.
To achieve this, it must be determined which code object file to load.
To determine the preferred kernel and the code object file containing the selected kernel,
the ``TensileHost`` library utilizes a process named `Solution selection`.
This process uses a hierarchical structure to efficiently search for kernels based on hardware, problem size, and transpose, among others.
To avoid the overhead associated with loading code object files, including initialization time and the memory footprint of the loaded code object files,
Tensile provides a mechanism to load only a subset of the code object files produced during a build at runtime.
To achieve this, it must be determined which code object file to load.
To determine the preferred kernel and the code object file containing the selected kernel,
the ``TensileHost`` library utilizes `Solution selection` process.
This process uses a hierarchical structure to efficiently search for kernels based on hardware, problem size, and transpose.

Contributor Author

The search for a kernel is based on more than hardware, problem size, and transpose, so I think we should keep "among others".

Contributor

I prefer the former of the two options here, and agree with @SwRaw that, since there are other parameters that are involved, we keep "among others".


For efficient lookup at runtime, the kernel metadata must be organized in a hierarchical schema in a serialized file named `solution selection catalog` [1]_.

.. note::
Throughout this document we will refer to catalog files with the .yaml extension. In practice,
This document refers to the catalog files with .yaml extension. In practice,
SwRaw marked this conversation as resolved.
solution selection catalogs are usually serialized with `MessagePack <https://msgpack.org/>`_, which uses the .dat extension.

Catalog hierarchy
=================
Solution selection catalog hierarchy
=====================================

.. figure:: ../../assets/msl.svg
:alt: Master Solution Library hierarchy
:align: center

Solution selection catalog heirarchy for gfx900 and gfx90a

Suggested change
Solution selection catalog heirarchy for gfx900 and gfx90a
Solution selection catalog hierarchy for gfx900 and gfx90a

Contributor Author

The solution selection catalog is general, irrespective of architecture. gfx900 and gfx90a are only used as examples in the image. Hence, I don't think it's a great idea to add the architectures to the heading.

Contributor

This is just for the figure caption, and since this figure is demonstrating the hierarchy for these two architectures, it seems appropriate to keep "gfx900" and "gfx90a" in the caption.

Contributor Author

Yes, we already have "gfx900" and gfx90a in the image caption. I too prefer not to mention it in the heading.


.. note::

In the figure, the levels are numbered from top to bottom.

**Level 1: Hardware**

At runtime, only kernels compatible with the device can execute. As such, the top level of the hierarchy involves hardware comparisons using GFX architecture.

**Level 2: Operation**

This layer is a mapping from a GEMM transpose setting, defined using
Einstein tensor notation (e.g. *Contraction_l_Alik_Bjlk_Cijk_Dijk*) to a list of problem properties.
This layer is a mapping between a GEMM transpose setting defined using
Einstein tensor notation (such as *Contraction_l_Alik_Bjlk_Cijk_Dijk*) and a list of problem properties.

**Level 3: Problem**

This layer matches against specific problem properties such as input and output types, and features like high precision accumulation and stochastic rounding.

**Level 4: Exact solution**

Finally, exact solutions contain fine-grained details about each solution that can be used during solution selection to locate the best kernel and to assert
that the requested problem predicates are satisfied. Each kernel will have an index and a performance ranking. During solution selection, the highest ranked
kernel from this pool will be selected.
Finally, exact solutions contain fine-grained details about each solution that can be used during solution selection to locate the best kernel and ensure the fulfillment of the requested problem predicates. Each kernel has an index and a performance ranking. During solution selection, the highest-ranked
kernel from this pool is selected.
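
The four levels described above can be read as successive dictionary lookups followed by a ranking step. The following is a minimal Python sketch of that idea, not Tensile's implementation. The `CATALOG` layout, the `ExactSolution` fields, the kernel names, and the assumption that a larger `rank` value means a better kernel are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactSolution:
    index: int  # kernel index within the catalog
    rank: int   # performance ranking; ASSUMPTION: larger means better
    name: str   # hypothetical kernel name


# Hypothetical in-memory catalog: hardware -> operation -> problem -> solutions.
CATALOG = {
    "gfx900": {
        "Contraction_l_Alik_Bjlk_Cijk_Dijk": {
            # Problem key: (input type, output type, high-precision accumulate)
            ("Half", "Half", False): [
                ExactSolution(index=0, rank=1, name="kernel_a"),
                ExactSolution(index=1, rank=3, name="kernel_b"),
            ],
        },
    },
}


def select_solution(catalog, gfx, operation, problem):
    """Walk the four levels and return the best match, or None."""
    by_operation = catalog.get(gfx)           # Level 1: hardware
    if by_operation is None:
        return None
    by_problem = by_operation.get(operation)  # Level 2: operation
    if by_problem is None:
        return None
    candidates = by_problem.get(problem)      # Level 3: problem
    if not candidates:
        return None
    # Level 4: pick the highest-ranked exact solution from the pool.
    return max(candidates, key=lambda s: s.rank)
```

A lookup that fails at any level returns `None`, which mirrors the point that only kernels compatible with the device and the requested problem are eligible.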


Build modes
===========

Tensile comes equipped with multiple build modes, which affect the way solution selection catalogs are generated.
Tensile is equipped with multiple build modes that affect how solution selection catalogs are generated.

Mode 1: Lazy library loading
----------------------------

If ``--lazy-library-loading`` is enabled, then a "parent" catalog is generated for each architecture, named
If ``--lazy-library-loading`` is enabled, then the following `parent` catalog is generated for each architecture:

.. centered:: TensileLibrary_lazy_<gfx>.yaml

This file contains a
reference to each of it's "child" catalogs, but doesn't have details about the exact solutions. These settings are instead
held in the "child" catalogs, which use the naming convention
This parent catalog contains a reference to each of its `child` catalogs without the details about the exact solutions.
These settings are present in the `child` catalogs, which use the following naming convention:

.. centered:: TensileLibrary_Type_<precision>_<problem type>_<gfx>.yaml

Here, *precision* is the data type, *problem type* is the GEMM type, including transpose and accumulate settings, and *gfx* is the hardware GFX archiecture.
In the preceding naming convention:

- <precision>: Data type
- <problem type>: GEMM type, including transpose and accumulate settings
- <gfx>: Hardware GFX architecture

For example, *TensileLibrary_Type_HH_Contraction_l_Alik_Bjlk_Cijk_Dijk_<gfx>.yaml* identifies a code object library for half precision
For example, *TensileLibrary_Type_HH_Contraction_l_Alik_Bjlk_Cijk_Dijk_<gfx>.yaml* identifies a code object library for half-precision
contractions on two transpose matrices, otherwise known as HGEMM TT.
In this way, the child catalogs contain the solution metadata, while the parent catalog is responsible for organizing the child catalogs
In this way, the child catalogs contain the solution metadata, while the parent catalog organizes the child catalogs
by hardware, problem type, transpose, precision, and other predicates.
This has the benefit of reducing the memory footprint of the calling application, as code object libraries are compiled separately and loaded only when required.
This reduces the memory footprint of the calling application, as code object libraries are compiled separately and loaded only when required.
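
The naming conventions for parent and child catalogs can be expressed as simple string templates. This is an illustrative Python sketch only: the helper names `parent_catalog_name` and `child_catalog_name` are hypothetical and are not part of Tensile.

```python
def parent_catalog_name(gfx: str) -> str:
    """Parent catalog generated per architecture with --lazy-library-loading."""
    return f"TensileLibrary_lazy_{gfx}.yaml"


def child_catalog_name(precision: str, problem_type: str, gfx: str) -> str:
    """Child catalog: data type, GEMM type, and hardware GFX architecture."""
    return f"TensileLibrary_Type_{precision}_{problem_type}_{gfx}.yaml"


if __name__ == "__main__":
    # Half-precision contraction on two transposed matrices (HGEMM TT).
    print(parent_catalog_name("gfx900"))
    print(child_catalog_name("HH", "Contraction_l_Alik_Bjlk_Cijk_Dijk", "gfx900"))
```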

**Example: Build outputs**
**Example: Build output**

.. code-block:: bash
:caption: Lazy library loading build outputs for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk*
@@ -87,8 +93,8 @@ This has the benefit of reducing the memory footprint of the calling application
├── Kernels.so-000-gfx900.hsaco
├── Kernels.so-000-gfx906.hsaco
├── TensileLibrary_lazy_gfx1030.yaml # [A]
├── TensileLibrary_lazy_gfx900.yaml
├── TensileLibrary_lazy_gfx906.yaml
├── TensileLibrary_lazy_gfx900.yaml
├── TensileLibrary_lazy_gfx906.yaml
├...
├── TensileLibrary_Type_..._fallback_gfx1030.hsaco
├── TensileLibrary_Type_..._fallback_gfx900.hsaco
@@ -100,16 +106,20 @@ This has the benefit of reducing the memory footprint of the calling application
├── TensileLibrary_Type_..._gfx906.co
├── TensileLibrary_Type_..._gfx906.yaml # [D]

Line **[A]** shows the parent catalog for gfx1030, the first of the three parent catalogs generated.
Line **[B]** shows a fallback child catalog, which reference each of the archiecture specific fallback kernels
in the associated .hsaco files.
This means that at least some of the parameter/problem type combinations for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk*
haven't been explicitly tuned for these architectures.
Note that the matching .hsaco files (above **[B]**) are code object libraries for HIP source kernels.
These files are referenced by the fallback catalog.
Line **[C]** shows a child catalog for gfx900 that references both HIP source and assembly source kernels, found in the associated .hsaco and .co files, respectively.
Line **[D]** shows a child catalog for gfx906, similar to the gfx900 catalog. However, notice that there is only one associated
.co file. This means that there are only assembly source kernels in this catalog.
Note that the lines in the build output are marked as A, B, C, and D for reference.

- Line [A]: Shows the parent catalog for gfx1030, the first of the three parent catalogs generated.

- Line [B]: Shows a fallback child catalog that references each of the architecture-specific fallback kernels in the associated ``.hsaco`` files.
This implies that at least some of the parameter or problem type combinations for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk*
haven't been explicitly tuned for these architectures.
Note that the matching ``.hsaco`` files (above line [B]) are code object libraries for HIP source kernels.
These files are referenced by the fallback catalog.

- Line [C]: Shows a child catalog for gfx900 that references both HIP source and assembly source kernels found in the associated ``.hsaco`` and ``.co`` files, respectively.

- Line [D]: Shows a child catalog for gfx906, similar to the gfx900 catalog. However, notice that there is only one associated
``.co`` file. This implies that the catalog contains only assembly source kernels.
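
To make the roles of the build output files concrete, the sketch below sorts file names by the extensions discussed above. It is a hedged illustration rather than a Tensile utility: the `classify` helper is hypothetical, and detecting fallback catalogs through a `_fallback_` marker in the file name is an assumption based on the listing.

```python
from pathlib import PurePath

# Mapping taken from the description: .hsaco files hold HIP source kernels,
# .co files hold assembly source kernels, and .yaml files are catalogs.
KIND_BY_SUFFIX = {
    ".yaml": "catalog",
    ".hsaco": "HIP source kernels",
    ".co": "assembly source kernels",
}


def classify(filename: str) -> str:
    """Return a coarse label for one build output file."""
    path = PurePath(filename)
    kind = KIND_BY_SUFFIX.get(path.suffix, "unknown")
    if kind != "catalog":
        return kind
    if path.name.startswith("TensileLibrary_lazy_"):
        return "parent catalog"
    if "_fallback_" in path.name:  # ASSUMPTION: naming marker for fallbacks
        return "fallback child catalog"
    return "child catalog"
```

Under this labeling, a child catalog accompanied only by a `.co` file would correspond to the gfx906 case on line [D].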

**Example: Parent solution selection catalog**

@@ -145,28 +155,30 @@ Line **[D]** shows a child catalog for gfx906, similar to the gfx900 catalog. Ho
type: Hardware # [_A]
Contributor Author

@bstefanuk Should I remove underscore from _A, _B ..etc.? The reference given below has no underscore.

Contributor

I originally intended for these to identify open and close positions for a range of lines, but I'm not attached to this. Feel free to remove and update to your liking.

solutions: []

Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture.
Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*.
Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs.
Note that the lines in the parent catalog are marked as A, B, and C for reference.

- Line [A]: Shows the top level of the parent catalog, which contains a single row for each hardware architecture.
- Line [B]: Shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*.
- Line [C]: Shows the problem type and predicates used to match against exact solutions present in the child catalogs.

Mode 2: Merge files
-------------------

.. warning::
This feature is not recommended and is in the process of being deprecated.

When ``--merge-files`` is enabled, one solution catalog is generated for each architecture, named
This feature is not recommended as it is on the verge of deprecation.

When ``--merge-files`` is enabled, one solution catalog is generated for each architecture, named:

.. centered:: TensileLibrary_<gfx>.yaml

The catalog contains information about supported GEMM types and
solution metadata that is used to locate the optimal kernel for a requested GEMM. This pattern
has the drawback that all code object libraries are loaded eagerly,
thereby increasing both the initialization time and memory footprint of the calling application.
The catalog contains information about supported GEMM types and
solution metadata that is used to locate the optimal kernel for a requested GEMM. Note that this pattern increases both the initialization time and memory footprint of the calling application
as all code object libraries are loaded eagerly.
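
The difference between eager loading (merge files) and loading on demand (lazy library loading) can be sketched in a few lines of Python. This is a conceptual illustration only: `LazyLibrary`, `load_eagerly`, and the `loader` callback are hypothetical names and do not reflect the Tensile C++ API.

```python
class LazyLibrary:
    """Loads each code object library the first time it is requested."""

    def __init__(self, names, loader):
        self._names = set(names)
        self._loader = loader  # hypothetical callback that loads one library
        self._loaded = {}

    def get(self, name):
        if name not in self._names:
            raise KeyError(name)
        if name not in self._loaded:
            # The initialization cost is paid here, on first use only.
            self._loaded[name] = self._loader(name)
        return self._loaded[name]


def load_eagerly(names, loader):
    """Loads every library up front, as in the merge files mode."""
    return {name: loader(name) for name in names}


if __name__ == "__main__":
    calls = []

    def fake_loader(name):
        calls.append(name)
        return f"<library {name}>"

    lazy = LazyLibrary(["a", "b", "c"], fake_loader)
    lazy.get("a")
    print(len(calls))  # one load so far

    load_eagerly(["a", "b", "c"], fake_loader)
    print(len(calls))  # three more loads, regardless of use
```

The eager variant pays the load cost for every library at startup, which is the initialization time and memory footprint drawback described above.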

**Example**
**Example: Build output**

Say you're building libraries for gfx908 and gfx90a with ``--merge-files``. The build output directory would look like this
Here is the build output directory when building libraries for gfx908 and gfx90a with ``--merge-files``:

.. code-block:: bash

@@ -186,4 +198,4 @@ Say you're building libraries for gfx908 and gfx90a with ``--merge-files``. The

--------------------

.. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API.
.. [1] Previously these files were named *master solution libraries* because they consisted of two top-level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API.
14 changes: 10 additions & 4 deletions docs/src/index.rst
@@ -10,6 +10,12 @@ Tensile documentation

Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU.

Tensile is written in Python for library and kernel generation and in C++ for client headers and library tests. It is a vital
project in the ROCm ecosystem, providing optimized kernels for downstream libraries such as :doc:`rocBLAS <rocblas:index>`.

The parts of Tensile that are written in Python consist of applications that are collectively responsible
for generating optimized kernels and library objects to access these kernels from client code.

The code is open source and hosted at https://github.com/ROCm/Tensile

.. grid:: 2
@@ -21,14 +27,14 @@ The code is open source and hosted at https://github.com/ROCm/Tensile

.. grid-item-card:: Conceptual

* :ref:`introduction`
* :ref:`solution-selection-catalogs`
* :ref:`Solution selection catalogs <solution-selection-catalogs>`

.. grid-item-card:: Reference

* :ref:`Environment variables <environment-variables>`
* :ref:`API reference <api-reference>`
* :ref:`CLI reference <cli-reference>`
* :ref:`API reference <api-reference>`
* :ref:`Environment variables <environment-variables>`
* :ref:`Glossary <glossary>`

.. grid-item-card:: Contribution

8 changes: 4 additions & 4 deletions docs/src/install/installation.rst
@@ -5,7 +5,7 @@
.. _installation:

********************************************************************
Installation
Tensile installation
Contributor

Do we still want to redundantly add "Tensile" in front of all our headings?

Contributor Author

@lpaoletti Please confirm.

********************************************************************

Install ROCm

image

remove "only" above

Contributor Author

I have replaced it with:
Alternatively, export the path exclusively for your current shell session, using export PATH=/opt/rocm/bin/:$PATH.

Contributor

To me, the following wording is clear,

Alternatively, export the path for your current shell session using `export PATH=/opt/rocm/bin/:$PATH`

Developers will understand that the "current shell session" won't persist.

Contributor Author

ok

There are two headings in a row. Either remove this heading.
Perhaps better to put an introduction on this page that explains what the purpose is.

Review for "we" in this document

Contributor Author

> There are two headings in a row. Either remove this heading. Perhaps better to put an introduction on this page that explains what the purpose is.

There is only one heading. The first one you see in red is a removed line.

Contributor Author

> Review for "we" in this document

I found "we" in this statement:

"For demonstration purposes, we use the sample tuning file available in Tensile/Configs/rocblas_sgemm_example.yaml"
Even though I somehow feel that this sounds appropriate, I have replaced it with the following (if we need to avoid "we" at all costs):

"For demonstration purposes, the sample tuning file available in Tensile/Configs/rocblas_sgemm_example.yaml is used"

PS: I read in the Google style guide that "we" can be used in such cases. Let me know if I should refrain from using "we".

Contributor

No opinion on this.

@@ -15,8 +15,8 @@ To begin, install ROCm for your platform. For installation instructions, refer t

.. tip::

If using Bash, we recommend you to set ``PATH=/opt/rocm/bin/:$PATH`` in your ``~/.bashrc`` and refresh your shell using ``source ~/.bashrc``.
Alternatively, export the path for your current shell session only, using ``export PATH=/opt/rocm/bin/:$PATH``.
If using Bash, set ``PATH=/opt/rocm/bin/:$PATH`` in your ``~/.bashrc`` and refresh your shell using ``source ~/.bashrc``.
Alternatively, export the path exclusively for your current shell session, using ``export PATH=/opt/rocm/bin/:$PATH``.
SwRaw marked this conversation as resolved.

Install OS dependencies
=========================
@@ -83,7 +83,7 @@ Running benchmark

To run a benchmark, pass a tuning config to the ``Tensile`` program located in ``Tensile/bin``.

For demonstration purposes, we use the sample tuning file available in ``Tensile/Configs/rocblas_sgemm_example.yaml``.
For demonstration purposes, the sample tuning file available in ``Tensile/Configs/rocblas_sgemm_example.yaml`` is used.
The sample tuning file allows you to specify the target architecture for which the benchmark will generate a library.
To find your device architecture, run:
