Warning/disclaimer: this project is still "not announced". The features described below might not be implemented yet.
- A from-scratch, compiled Deep Learning framework.
- Implements backpropagation (i.e. first-order reverse mode autodiff) and shape inference.
- Tensor axes have optional labels and are split into kinds: batch, input and output.
- Has full support for the `einsum` notation, integrated with shape inference (see the loop-level sketch after this feature list). Dynamic indexing (using the last axis of one tensor as indices into another tensor) is also integrated with shape inference.
- Optionally, can deduce output axes from input axes (and vice versa, TODO), e.g. with scaling, to make expansion or bottleneck layers auto-adapt to the dimensionality of the data.
- Has a suite of tutorials doubling as tests with inline expectations: `dune runtest`, and `dune promote` if the diffs look OK.
- Does not (need to) use any external computation libraries.
  - Starts with a high-level representation, but can compile everything down to `for` loops.
  - Has multiple "backends": interpreted, compiled via OCaml, compiled via pure C, compiled via CUDA.
  - Currently, compiles all computation of a single step into a monolithic routine. But users can compile additional routines at any time (and run them at essentially any later point within a session).
- Offers only two levels of abstraction.
  - Differentiable computations, centered around the `%nn_op` syntax extension.
  - Plain computations, centered around the `%nn_cd` and `%nn_dt` syntax extensions.
  - Both abstraction levels share infrastructure. `Formula.t` values represent tensors and are usually potentially differentiable (we call them form formulas), but need not be (non-form formulas). Non-form (non-differentiable) formulas cannot be subformulas of differentiable formulas. The `%nn_cd` syntax can be used to build up non-form formulas, but also to express "primitive/glue" computations (`Code.t`) that do not introduce new tensors.
- Supports mixed-precision computations, e.g. higher-precision network components, or gradients at a higher precision than values.
- Should be easily extensible.
- Model surgery should be straightforward (not sure if we are there yet).
- It's a feature, not a bug!
  - To scale a tensor by a number, always use pointwise multiplication, e.g. `2*.m` or `m*.2`.
  - Matrix-multiplying a tensor `m` by a constant number, e.g. `m*2`, broadcasts the number to the shape of the input axes of the tensor. This results in an output-axes-only tensor (a multi-axis vector) that is the scaled sum over the input axes of the tensor `m`.
  - Matrix-multiplying a constant number by a tensor `m`, e.g. `2*m`, broadcasts the number to the shape of the output axes of the tensor. This results in a tensor whose input axes have the same shape as the inputs of `m`, and whose output shape is 1D (scalar): the scaled sum over the output axes of the tensor `m`.
  - The matrix-multiply operation behaves pointwise along the batch axes. (See the sketch after this feature list for a concrete illustration of these rules.)
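To make the "generalized `einsum` lowered to `for` loops" idea concrete, here is a plain-OCaml sketch (not OCANNL syntax) of the classic `ij,jk->ik` contraction, i.e. matrix multiplication, written directly as the kind of nested loops the high-level representation can be compiled down to. The shared axis `j` appears in both inputs but not in the output, so it is summed over.

```ocaml
(* Plain-OCaml illustration only; this is not OCANNL code. *)
let matmul a b =
  let n_i = Array.length a in
  let n_j = Array.length b in
  let n_k = Array.length b.(0) in
  let c = Array.make_matrix n_i n_k 0. in
  for i = 0 to n_i - 1 do
    for k = 0 to n_k - 1 do
      for j = 0 to n_j - 1 do
        (* [j] is the contracted axis: present in both inputs, absent from
           the output, hence accumulated. *)
        c.(i).(k) <- c.(i).(k) +. (a.(i).(j) *. b.(j).(k))
      done
    done
  done;
  c
```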
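And a concrete illustration of the scaling rules from the "It's a feature, not a bug!" list, using plain OCaml arrays standing in for a tensor `m` with one output axis (rows, size 2) and one input axis (columns, size 3). This only demonstrates the described semantics; it is not OCANNL code.

```ocaml
(* [m]: output axis = rows (size 2), input axis = columns (size 3). *)
let m = [| [| 1.; 2.; 3. |]; [| 4.; 5.; 6. |] |]

(* Analogue of [2*.m] / [m*.2]: pointwise scaling preserves the shape of [m]. *)
let scaled = Array.map (Array.map (fun x -> 2. *. x)) m

(* Analogue of [m*2]: the constant is broadcast over the input axes, which the
   matrix multiplication then reduces, leaving an output-axes-only tensor of
   scaled row sums. *)
let m_times_2 = Array.map (fun row -> 2. *. Array.fold_left ( +. ) 0. row) m
(* = [| 12.; 30. |] *)

(* Analogue of [2*m]: the constant is broadcast over the output axes, which are
   then reduced, leaving a tensor with [m]'s input axes and a scalar output:
   scaled column sums. *)
let two_times_m = Array.init 3 (fun j -> 2. *. (m.(0).(j) +. m.(1).(j)))
(* = [| 10.; 14.; 18. |] *)
```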
- v0.3-GPU: a CUDA backend.
- v0.3.1-tiling: the tiling optimization.
- v0.4-usability: examples covering most of Andrej Karpathy's "Neural Networks Zero to Hero" series; data loading; checkpointing.
- v0.5-documentation: `.mli` files and maybe more documentation.
- v0.6-scale: distributed computation; runtime-autotuning optimization settings.
- v1-completeness: whatever not-yet-implemented features still seem needed and impact the framework design. (E.g. at the time of v0.1.X, convolutions, reshaping and concatenation are not easily expressible.)
For details, see CHANGES.
- v0.2: for multicore CPU, improve cache locality and reduce cache contention by treating the C function stack as the "device memory".
- v0.1.2: multicore computations using a thread-local "task id" index.
- v0.1.1: inlining scalar constants, improved inlining for virtual nodes.
- v0.1.0: a `Gccjit` backend, single and double precision floats, code compiled as a monolithic update step function.
Why not just use OWL?
OCANNL follows different design choices than OWL. For example:
- OCANNL is not functorized.
- OCANNL has fewer abstraction layers.
- OCANNL has arguably a more powerful shape inference.
- OCANNL only supports backpropagation, while OWL supports full forward and backward auto-diff.
- Some aspects are more centralized in OCANNL than in OWL and form the "infrastructure", with less of an intention to be extended or even read by end-users.
- Some aspects that are more core to OWL are "delegated to user-land" in OCANNL.
  - `Operation` is just a bunch of functions, which is what users implementing new computational primitives would write (see the hypothetical sketch after this list).
  - Specific network architectures, e.g. MLP, CNN, Transformer, can hopefully be concisely formulated and belong to individual projects in OCANNL -- while it seems to me they are more part of the library in OWL. In this regard, working on new architectures is not impeded by OCANNL.
  - But the enabling mechanisms, such as the "generalized `einsum`", belong to the OCANNL library/infrastructure. In this regard, OCANNL is less extensible.
- OCANNL provides lower-level compilation backends than OWL; it is more self-contained in this sense.
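As a purely hypothetical sketch of the "delegated to user-land" point (the module and function names below are invented for illustration and are not OCANNL's actual `Operation` interface): a new computational primitive or layer is just an ordinary OCaml function over formulas, composed from operations that already exist.

```ocaml
(* Hypothetical sketch; names are invented, not OCANNL's actual API. *)
module type FORMULA_OPS = sig
  type t
  val ( * ) : t -> t -> t  (* matrix multiplication *)
  val ( + ) : t -> t -> t  (* pointwise addition *)
  val relu : t -> t
end

(* "User-land" code: adding a dense layer requires no changes to the
   framework, only a function composed of existing operations. *)
module Dense (F : FORMULA_OPS) = struct
  let apply ~w ~b x = F.relu F.((w * x) + b)
end
```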
Some ideas regarding installation (skip steps or substitute equivalent actions as appropriate):
- Check `gcc --version`, then install the matching `libgccjit-<version>-dev` package.
- `opam switch create 5.0-flambda ocaml-variants.5.0.0+options ocaml-option-flambda`
- `eval $(opam env --switch=5.0-flambda)`
- `opam install lsp ocaml-lsp-server ocamlformat`
- `cd ~; gh repo clone savonet/ocaml-mem_usage; cd ocaml-mem_usage; dune build; dune install`
- `cd ~; gh repo clone lukstafi/ppx_minidebug; cd ppx_minidebug; opam install .`
- `cd ~/ocannl`
- `opam install . --deps-only`
- `eval $(opam env)`
- `dune runtest`