Document data containers
* data package
* --data
* push & pull

Fixes #240
dtrudg committed Sep 4, 2024
1 parent 945f6d7 commit bc23b85
Showing 4 changed files with 186 additions and 0 deletions.
165 changes: 165 additions & 0 deletions data_containers.rst
@@ -0,0 +1,165 @@
.. _sec:data-containers:

###############
Data Containers
###############

*New in {Singularity} 4.2 OCI-Mode.*

********
Overview
********

Workflows in HPC often involve three distinct inputs:

- User data, which needs to be analyzed.
- A software application, which will analyze the user data.
- Reference data, which the software uses to make sense of the user data.

Packaging the software application into an OCI-SIF, with {Singularity} in
OCI-Mode, makes it easy to run and share. User data is also easy to handle with
{Singularity}; simply bind your project directories or files from the HPC system
into the container.

Reference data is a little more complicated, as it tends to be specific to the
software being used and the data being analyzed. Perhaps you are aligning
RNA-Seq data to a reference genome sequence, or passing medical images through a
neural network model. Different reference data might be needed for different
inputs (human vs mouse sequences, CT vs MRI images). Although the software is
containerized and ready to go, you will probably have to download reference data
from a third party, assemble it, and often pre-process it before it can be used
with the specific program that you need to run.

Putting all the reference data that might ever be needed into the same container
as the software application could simplify things, but could make that container
very large. What if we could easily distribute different sets of reference data
alongside, but separately from the software application? The solution is a data
container.

*************************
Creating a Data Container
*************************

{Singularity} 4.2 introduces the ``data package`` command, which creates an
OCI-SIF data container by 'packaging' files and directories on the host:

.. code::

   $ singularity data package <source file/dir> <data container>

For example, to create a data container from the content of the directory
``mydata/`` on the host:

.. code::

   $ singularity data package mydata mydata.oci.sif
   INFO: Converting layers to SquashFS

The resulting OCI-SIF file contains the packaged data as a SquashFS image,
stored as an OCI artifact, with associated manifest. This allows it to be pushed / pulled
to and from standard OCI registries.
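
As a minimal sketch of the packaging step, the script below assembles a small
``mydata/`` directory and packages it only when ``singularity`` is found on the
``PATH``. The file names ``foo`` and ``bar`` and their contents are placeholders
chosen for illustration; only the ``singularity data package`` invocation is
taken from the text above, and it assumes {Singularity} 4.2 or later.

```shell
#!/bin/sh
set -eu

# Assemble a small directory of placeholder reference data.
# (File names and contents are illustrative assumptions.)
mkdir -p mydata
printf 'placeholder reference A\n' > mydata/foo
printf 'placeholder reference B\n' > mydata/bar

# Package the directory into a data container when Singularity is installed.
# Older versions lack the `data` command, so a failure is reported, not fatal.
if command -v singularity >/dev/null 2>&1; then
    singularity data package mydata mydata.oci.sif \
        || echo "packaging failed (is this Singularity 4.2 or later?)" >&2
else
    echo "singularity not found; skipping packaging" >&2
fi
```

Guarding on ``command -v`` keeps the script usable on a workstation without
{Singularity}, so the data directory can be prepared there and packaged later on
the HPC system.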

**********************
Using a Data Container
**********************

.. note::

   OCI-SIF data containers can only be used in OCI-Mode (when running
   containers with ``--oci``).

To use a data container with an application container, the ``--data`` flag is
passed to ``run / shell / exec`` in OCI-Mode. The data flag takes one or more
comma-separated ``<data container>:<dest>`` pairs, where ``<data container>`` is
the path to the data container to use, and ``<dest>`` is the path in the
application container at which its content should be made available.

For example, to make the content of the ``mydata.oci.sif`` data container
available under ``/mydata`` in an application container:

.. code::

   $ singularity run --oci --data mydata.oci.sif:/mydata application.oci.sif
   dtrudg-sylabs@mini:~$ ls /mydata/
   bar foo

You can use more than one data container by specifying the ``--data`` flag
multiple times, or by listing comma-separated ``<data container>:<dest>`` pairs:

.. code::

   $ singularity run --oci \
       --data mydata.oci.sif:/mydata,otherdata.oci.sif:/otherdata \
       application.oci.sif

This is equivalent to:

.. code::

   $ singularity run --oci \
       --data mydata.oci.sif:/mydata \
       --data otherdata.oci.sif:/otherdata \
       application.oci.sif

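
When the list of data containers is assembled by a script, the comma-separated
form can be built programmatically. The sketch below is one way to do that;
``join_data_pairs`` is a hypothetical helper written for this example, not part
of {Singularity}. It prints the resulting command instead of running it, so it
works without {Singularity} installed.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: join "<data container>:<dest>" pairs with commas.
join_data_pairs() {
    joined=""
    for pair in "$@"; do
        if [ -z "$joined" ]; then
            joined="$pair"
        else
            joined="$joined,$pair"
        fi
    done
    printf '%s\n' "$joined"
}

data_arg=$(join_data_pairs \
    mydata.oci.sif:/mydata \
    otherdata.oci.sif:/otherdata)

# Print the command rather than executing it.
echo "singularity run --oci --data $data_arg application.oci.sif"
```

Repeating ``--data`` once per pair is usually easier to read in hand-written
commands; the joined form is mainly convenient when the value is passed around
as a single variable.
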
************************
Sharing a Data Container
************************

As mentioned above, a data container stores a SquashFS filesystem as an OCI
artifact. This means it can be pushed to, and pulled from, standard OCI
registries alongside application container images.

To push to the container library:

.. code::

   $ singularity push -U mydata.oci.sif library://example/datac/mydata:latest
   WARNING: Skipping container verification
   INFO: Pushing an OCI-SIF to the library OCI registry. Use `--oci` to pull this image.
   4.0KiB / 4.0KiB [=================================================================] 100 %0s

To pull from the container library:

.. code::

   $ singularity pull --oci mydata.oci.sif library://example/datac/mydata:latest
   WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
   INFO: Cleaning up.
   WARNING: integrity: signature not found for object group 1
   WARNING: Skipping container verification

To push to Docker Hub, or a similar OCI registry, :ref:`after authenticating <registry>`:

.. code::

   $ singularity push mydata.oci.sif docker://dctrud/mydata:latest
   4.0KiB / 4.0KiB [=================================================================] 100 %0s
   INFO: Upload complete

To pull from Docker Hub, or a similar OCI registry:

.. code::

   $ singularity pull --oci docker://dctrud/mydata:latest
   WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
   INFO: Using cached OCI-SIF image

1 change: 1 addition & 0 deletions index.rst
@@ -89,6 +89,7 @@ networking and security configuration.

Bind Paths and Mounts <bind_paths_and_mounts>
Persistent Overlays <persistent_overlays>
Data Containers <data_containers>
Instances - Running Services <running_services>
Environment and Metadata <environment_and_metadata>
Plugins <plugins>
12 changes: 12 additions & 0 deletions new.rst
@@ -20,6 +20,7 @@ OCI-mode
OCI-SIF image to be pushed to ``library://`` and ``docker://`` registries with
layers in the standard OCI tar format. Images pushed with ``--layer-format``
tar can be pulled and run by other OCI runtimes. See :ref:`sec:layer-format`.

- Persistent overlays embedded in OCI-SIF files. See :ref:`overlay-oci-sif`.

- A writable overlay can be added to an OCI-SIF file with the ``singularity
@@ -34,6 +35,17 @@ OCI-mode
an OCI-SIF image into a read-only squashfs layer. This seals changes made to
the image via the overlay, so that they are permanent.

- OCI-SIF data containers provide a way to package reference data into an
  OCI-SIF file that can be distributed alongside application containers. See
  :ref:`sec:data-containers`.

  - A new ``singularity data package`` command allows files and directories to
    be packaged into an OCI-SIF data container.
  - A new ``--data <data container>:<dest>`` flag for OCI-Mode allows the
    contents of a data container to be made available at ``<dest>`` inside an
    application container.


*******
Runtime
*******
8 changes: 8 additions & 0 deletions oci_runtime.rst
@@ -470,6 +470,14 @@ addition to the image manifest and image config:
Multi-layer OCI-SIF images are supported by {Singularity} 4.1 and later. They
cannot be executed using {Singularity} 4.0.

Data Containers
===============

The OCI-SIF format also supports, from {Singularity} 4.2, the creation of
:ref:`data containers <sec:data-containers>`, which can be used to distribute
reference data alongside application containers, in a convenient single file
that may be shared via standard OCI registries. See the
:ref:`sec:data-containers` section for more information.

.. _sec:cdi:

