Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sphinx Documentation #121

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
36 changes: 36 additions & 0 deletions .github/workflows/sphinx-build-deploy.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
name: Deploy Sphinx Documentation

on:
push:
branches:
- sphinx_docs # Trigger the workflow on pushes to the sphinx_docs branch

jobs:
build-and-deploy:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v3
with:
ref: sphinx_docs # Ensure the sphinx_docs branch is checked out

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.x

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt

- name: Build Sphinx documentation
run: |
sphinx-build -b html docs/source docs/build/html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: docs/build/html
7 changes: 7 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,9 @@
**/__pycache__
venv/
.idea
arch-docs/
doc.md
img.png
docs/build
docs/autodocs
docs/source/apidocs
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ the clean and optimized source code for those examples in [`paper/`](paper).

## Getting started

Have a look at the [SLOTHY tutorial](tutorial/README.md) for a hands-on and example-based introduction to SLOTHY.
Have a look at the [SLOTHY tutorial](docs/source/tutorial/README.md) for a hands-on and example-based introduction to SLOTHY.

## Real world uses

Expand Down Expand Up @@ -119,7 +119,7 @@ INFO:aarch64_simple0_a55.slothy:Minimum number of stalls: 18

### Examples

The [SLOTHY Tutorial](tutorial/README.md) and the [examples](examples/naive) directory contain numerous exemplary
The [SLOTHY Tutorial](docs/source/tutorial/README.md) and the [examples](examples/naive) directory contain numerous exemplary
assembly snippets. To try them, use `python3 example.py --examples={YOUR_EXAMPLE}`. See `python3 example.py --help` for
the list of all available examples.

Expand Down
20 changes: 20 additions & 0 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
2 changes: 1 addition & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ instructions, and scheduling/allocation optimizations are tightly controlled thr
constraints, the developer keeps close control over the final assembly, while being freed from tedious
micro-optimizations.

See also [FAQ](faq.md)
See also [FAQ](source/faq.md)

#### Architecture/Microarchitecture support

Expand Down
35 changes: 35 additions & 0 deletions docs/make.bat
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
11 changes: 11 additions & 0 deletions docs/source/_static/css/style.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
.wy-nav-content {
max-width: 65% !important;
}

/*
Fix for horizontal stacking weirdness in the RTD theme with Python properties:
https://github.com/readthedocs/sphinx_rtd_theme/issues/1301
*/
.py.property {
display: block !important;
}
59 changes: 59 additions & 0 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

import os, sys

project = 'SLOTHY'
copyright = '2025, Hanno Becker, Amin Abdulrahman, Matthias Kannwischer, Justus Bergermann'
author = 'Hanno Becker, Amin Abdulrahman, Matthias Kannwischer, Justus Bergermann'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

sys.path.insert(0, os.path.abspath("../../slothy"))

suppress_warnings = [
'misc.highlighting_failure',
'myst.xref_missing',
'autodoc2.dup_item'
]

extensions = [
'sphinx.ext.imgmath',
'sphinx_rtd_theme',
'myst_parser',
'autodoc2'
]

templates_path = ['_templates']
exclude_patterns = []

source_suffix = {
'.rst': 'restructuredtext',
'.md': 'markdown',
}
master_doc = 'index'


autodoc2_packages = [
"../../slothy",
]


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_static_path = ['_static']
html_logo = "../slothy_logo.png"
html_theme_options = { 'logo_only': True, }
html_css_files = [
'css/style.css',
]


22 changes: 8 additions & 14 deletions docs/faq.md → docs/source/faq.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,6 @@
---
layout: default
---
# Frequently asked questions

## Frequently asked questions

[back](index.md)

#### Is SLOTHY a peephole optimizer?
## Is SLOTHY a peephole optimizer?

No. SLOTHY is a _fixed-instruction_ super-optimizer: It keeps instructions and optimizes
register allocation, instruction scheduling, and software pipelining. It is the developer's or another tool's
Expand All @@ -18,32 +12,32 @@ You may want to use SLOTHY on performance-critical workloads for which precise c
is beneficial (e.g. because other code-generation techniques do not find ideal instruction sequences) or needed
(e.g. because some instructions or instruction patterns have to be avoided for security). -->

#### Is SLOTHY better than {name your favourite superoptimizer}?
## Is SLOTHY better than {name your favourite superoptimizer}?

Most likely, they serve different purposes. SLOTHY aims to do one thing well: Optimization _after_ instruction selection.
It is thus independent of and potentially combinable with superoptimizers operating at earlier stages of the code-generation process, such as [souper](https://github.com/google/souper) and [CryptOpt](https://github.com/0xADE1A1DE/CryptOpt).

#### Does SLOTHY support x86?
## Does SLOTHY support x86?

The core of SLOTHY is architecture- and microarchitecture-agnostic and can accommodate x86. As it stands, however,
there is no model of the x86 architecture. Feel free to build one!

#### Does SLOTHY support RISC-V?
## Does SLOTHY support RISC-V?

As for x86.

#### Is SLOTHY formally verified?
## Is SLOTHY formally verified?

No. Arguably, that wouldn't be a good use of time. The more relevant question is the following:

#### Is SLOTHY-generated code formally verified to be equivalent to the input code?
## Is SLOTHY-generated code formally verified to be equivalent to the input code?

Not yet. SLOTHY runs a self-check confirming that input and output have isomorphic data flow graphs,
but pitfalls remain, such as bad user configurations allowing SLOTHY to clobber a register that's not
meant to be reserved. More work is needed for formal verification of the equivalence of input
and output.

#### Why is my question not here?
## Why is my question not here?

Ping us! ([GitHub](https://github.com/slothy-optimizer/slothy/issues), or see [paper](https://eprint.iacr.org/2022/1303.pdf) for
contact information).
122 changes: 122 additions & 0 deletions docs/source/general.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
# About SLOTHY

**SLOTHY** - **S**uper **L**azy **O**ptimization of **T**ricky **H**andwritten assembl**Y** - is an assembly-level superoptimizer
for:
1. Instruction scheduling
2. Register allocation
3. Software pipelining (= periodic loop interleaving)

SLOTHY is generic in the target architecture and microarchitecture. This repository provides instantiations for:
- Armv8.1-M+Helium: Cortex-M55, Cortex-M85
- AArch64: Cortex-A55, and experimentally Cortex-A72, Cortex-X/Neoverse-V, Apple M1 (Firestorm, Icestorm)

SLOTHY is discussed in [Fast and Clean: Auditable high-performance assembly via constraint solving](https://eprint.iacr.org/2022/1303).

## Goal

SLOTHY enables a development workflow where developers write 'clean' assembly by hand, emphasizing the logic of the computation, while SLOTHY automates microarchitecture-specific micro-optimizations. This accelerates development, keeps manually written code artifacts maintainable, and allows to split efforts for formal verification into the separate verification of the clean code and the micro-optimizations.

## How it works

SLOTHY is essentially a constraint solver frontend: It converts the input source into a data flow graph and
builds a constraint model capturing valid instruction schedulings, register renamings, and periodic loop
interleavings. The model is passed to an external constraint solver and, upon success,
a satisfying assignment converted back into the final code. Currently, SLOTHY uses
[Google OR-Tools](https://developers.google.com/optimization) as its constraint solver backend.

## Performance

As a rough rule of thumb, SLOTHY typically optimizes workloads of <50 instructions in seconds to minutes, workloads
up to 150 instructions in minutes to hours, while for larger kernels some heuristics are necessary.

## Applications

SLOTHY has been used to provide the fastest known implementations of various cryptographic and DSP primitives:
For example, the [SLOTHY paper](https://eprint.iacr.org/2022/1303) discusses the NTTs underlying ML-KEM and ML-DSA for
Cortex-{A55, A72, M55, M85}, the FFT for Cortex-{M55,M85}, and the X25519 scalar multiplication for Cortex-A55. You find
the clean and optimized source code for those examples in [`paper/`](https://github.com/slothy-optimizer/slothy/tree/main/paper).

# Getting started

Have a look at the [SLOTHY tutorial](tutorial/README.md) for a hands-on and example-based introduction to SLOTHY.

# Real world uses

* [AWS libcrypto (AWS-LC)](https://github.com/aws/aws-lc): SLOTHY-optimized X25519 code based on our un-interleaved form of the [original code by Emil
Lenngren](https://github.com/Emill/X25519-AArch64) has been [formally verified and
included](https://github.com/awslabs/s2n-bignum/pull/108) in
[s2n-bignum](https://github.com/awslabs/s2n-bignum/) (the bignum component of AWS-LC) and [merged](https://github.com/aws/aws-lc/pull/1469) into
AWS-LC. This was the topic of a [Real World Crypto 2024
talk](https://iacr.org/submit/files/slides/2024/rwc/rwc2024/38/slides.pdf).

* [s2n-bignum](https://github.com/awslabs/s2n-bignum/) routinely employs SLOTHY for finding
further highly optimized ECC implementations (e.g., [P256](https://github.com/awslabs/s2n-bignum/pull/118),
[P384](https://github.com/awslabs/s2n-bignum/pull/122), [P521](https://github.com/awslabs/s2n-bignum/pull/130) and
verifies them through automated equivalence-checking in [HOL-Light](https://hol-light.github.io/).

* [Arm EndpointAI](https://github.com/ARM-software/EndpointAI): SLOTHY-optimized code has been deployed to the CMSIS DSP Library for the radix-4 CFFT routines as part
of the Arm EndpointAI project in [this
commit](https://github.com/ARM-software/EndpointAI/commit/817bb57d8a4a604538a04627851f5e9adb5f08fc).

# Installation

## Requirements

SLOTHY has been successfully used on

- Ubuntu-21.10 and up (64-bit),
- macOS Monterey 12.6 and up.

SLOTHY requires Python >= 3.10. See [requirements.txt](https://github.com/slothy-optimizer/slothy/blob/main/requirements.txt) for package requirements, and install via `pip
install -r requirements.txt`.

**Note:** `requirements.txt` pins versions for reproducibility. If you already have newer versions of some dependencies
installed and don't want them downgraded, consider using a virtual environment:

```
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
```

Then, enter the virtual environment via `source venv/bin/activate` prior to running SLOTHY.

## Docker

A dockerfile for an Ubuntu-22.04 based Docker image with all dependencies of SLOTHY and the PQMX+PQAX test
environments setup can be found in [paper/artifact/slothy.dockerfile](https://github.com/slothy-optimizer/slothy/blob/main/paper/artifact/slothy.Dockerfile). See
[paper/artifact/README.md](https://github.com/slothy-optimizer/slothy/blob/main/paper/artifact/README.md) for instructions.

## Quick check

To check that your setup is complete, try the following from the base directory:

```
% python3 example.py --examples aarch64_simple0_a55
```

You should see something like the following:

```
* Example: aarch64_simple0_a55...
INFO:aarch64_simple0_a55:Instructions in body: 20
INFO:aarch64_simple0_a55.slothy:Perform internal binary search for minimal number of stalls...
INFO:aarch64_simple0_a55.slothy:Attempt optimization with max 32 stalls...
INFO:aarch64_simple0_a55.slothy:Objective: minimize number of stalls
INFO:aarch64_simple0_a55.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.7.2996) ...
INFO:aarch64_simple0_a55.slothy:[0.0721s]: Found 1 solutions so far... objective 19.0, bound 8.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:[0.0765s]: Found 2 solutions so far... objective 18.0, bound 12.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:OPTIMAL, wall time: 0.155224 s
INFO:aarch64_simple0_a55.slothy:Booleans in result: 509
INFO:aarch64_simple0_a55.slothy.selfcheck:OK!
INFO:aarch64_simple0_a55.slothy:Minimum number of stalls: 18
```

## Examples

The [SLOTHY Tutorial](tutorial/README.md) and the [examples](https://github.com/slothy-optimizer/slothy/tree/main/examples/naive) directory contain numerous exemplary
assembly snippets. To try them, use `python3 example.py --examples={YOUR_EXAMPLE}`. See `python3 example.py --help` for
the list of all available examples.

The use of SLOTHY from the command line is illustrated in [scripts/](scripts/) supporting the real-world optimizations
for the NTT, FFT and X25519 discussed in [Fast and Clean: Auditable high-performance assembly via constraint
solving](https://eprint.iacr.org/2022/1303).
Loading