security_options: update capabilities for oci-mode
Fixes #186
dtrudg committed Aug 31, 2023
1 parent 9efe71a commit 768d088
Showing 1 changed file with 149 additions and 70 deletions.
219 changes: 149 additions & 70 deletions security_options.rst
@@ -6,38 +6,47 @@ Security Options

.. _sec:security_options:

{Singularity} 3.0 introduces many new security related options to the
container runtime. This document will describe the new methods users
have for specifying the security scope and context when running
{Singularity} containers.
{Singularity} can make use of various Linux kernel features to modify the
security scope and context of running containers. Non-root users may be granted
additional permissions using Linux capabilities. SELinux, AppArmor, and Seccomp
can be used to restrict the operations that can be performed by a container.

******************
Linux Capabilities
******************

.. note::
Native runtime / non-OCI-Mode
=============================

In {Singularity}'s default configuration, without ``--oci``, a container started
by root receives all capabilities, while a container started by a non-root user
receives no capabilities.

It is extremely important to recognize that **granting users Linux
capabilities with the** ``capability`` **command group is usually
identical to granting those users root level access on the host
system**. Most if not all capabilities will allow users to "break
out" of the container and become root on the host. This feature is
targeted toward special use cases (like cloud-native architectures)
where an admin/developer might want to limit the attack surface
within a container that normally runs as root. This is not a good
option in multi-tenant HPC environments where an admin wants to grant
a user special privileges within a container. For that and similar
use cases, the :ref:`fakeroot feature <fakeroot>` is a better option.

{Singularity} provides full support for granting and revoking Linux
capabilities on a user or group basis. For example, let us suppose that
an admin has decided to grant a user (named ``pinger``) capabilities to
open raw sockets so that they can use ``ping`` in a container where the
binary is controlled via capabilities. For information about how to
manage capabilities as an admin please refer to the `capability admin
docs
Additionally, {Singularity} provides support for granting and revoking Linux
capabilities on a user or group basis. For example, let us suppose that an
administrator has decided to grant a user (named ``pinger``) capabilities to
open raw sockets so that they can use ``ping`` in a container where the binary
is controlled via capabilities. For information about how to manage capabilities
as an admin please refer to the `capability admin docs
<https://sylabs.io/guides/{adminversion}/admin-guide/configfiles.html#capability.json>`_.

.. note::

In {Singularity}'s default setuid and non-OCI mode, containers are only
isolated in a mount namespace. A user namespace, which limits the scope of
capabilities, is not used by default.

Therefore, it is extremely important to recognize that **granting users Linux
capabilities with the** ``capability`` **command group is usually identical
to granting those users root level access on the host system**. Most, if not
all, capabilities will allow users to "break out" of the container and become
root on the host. This feature is targeted toward special use cases (like
cloud-native architectures) where an admin/developer might want to limit the
attack surface within a container that normally runs as root. This is not a
good option in multi-tenant HPC environments where an admin wants to grant a
user special privileges within a container. For that and similar use cases,
the :ref:`fakeroot feature <fakeroot>` is a better option.

To take advantage of this granted capability as a user, ``pinger`` must
also request the capability when executing a container with the
``--add-caps`` flag like so:
@@ -102,32 +111,88 @@ The ``--add-caps`` and ``--drop-caps`` options will accept the ``all``
keyword. Of course appropriate caution should be exercised when using
this keyword.

*****************************
Building encrypted containers
*****************************

Beginning in {Singularity} 3.4.0 it is possible to build and run
encrypted containers. The containers are decrypted at runtime entirely
in kernel space, meaning that no intermediate decrypted data is ever
present on disk. See :ref:`encrypted containers <encryption>` for more
details.
OCI-Mode
========

When containers are run in OCI-mode, by a non-root user, initialization is
always performed inside a user namespace. The capabilities granted to a
container are specific to this user namespace. For example, ``CAP_SYS_ADMIN``
granted to an OCI-mode container does not give the user the ability to mount a
filesystem outside of the container's user namespace.

Because of this isolation of capabilities, users can add and drop capabilities,
using ``--add-caps`` and ``--drop-caps``, without the need for the administrator
to have granted permission to do so with the ``singularity capability``
command group.

OCI-mode containers do not inherit the user's own capabilities, but instead run
with a default set of capabilities that matches other OCI runtimes:

- CAP_NET_RAW
- CAP_NET_BIND_SERVICE
- CAP_AUDIT_READ
- CAP_AUDIT_WRITE
- CAP_DAC_OVERRIDE
- CAP_SETFCAP
- CAP_SETPCAP
- CAP_SETGID
- CAP_SETUID
- CAP_MKNOD
- CAP_CHOWN
- CAP_FOWNER
- CAP_FSETID
- CAP_KILL
- CAP_SYS_CHROOT

When the container is entered as the root user (e.g. with ``--fakeroot``), these
default capabilities are added to the effective, permitted, and bounding sets.

When the container is entered as a non-root user, these default capabilities are
added to the bounding set.
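As a rough cross-check of the list above, the default set can be written as the kind of bitmask that the kernel reports on the ``Cap*`` lines of ``/proc/<pid>/status``. The following is a minimal sketch and not part of {Singularity}: the function name ``oci_default_mask`` is hypothetical, and the bit numbers are the capability numbers from ``<linux/capability.h>``.

```shell
#!/bin/sh
# Bit numbers of the default OCI-mode capabilities listed above,
# as defined in <linux/capability.h>:
#   CHOWN=0 DAC_OVERRIDE=1 FOWNER=3 FSETID=4 KILL=5 SETGID=6 SETUID=7
#   SETPCAP=8 NET_BIND_SERVICE=10 NET_RAW=13 SYS_CHROOT=18 MKNOD=27
#   AUDIT_WRITE=29 SETFCAP=31 AUDIT_READ=37
OCI_DEFAULT_CAP_BITS="0 1 3 4 5 6 7 8 10 13 18 27 29 31 37"

# Print the bitmask of the default set as 16 hex digits, the same form
# used by the Cap* lines of /proc/<pid>/status.
# (Hypothetical helper; the subshell body keeps its variables local.)
oci_default_mask() (
    mask=0
    for bit in $OCI_DEFAULT_CAP_BITS; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '%016x\n' "$mask"
)

oci_default_mask
```

Comparing this value with the ``CapBnd`` line read inside an OCI-mode container (for example with ``grep CapBnd /proc/self/status``) is one way to see which set is in effect. The values can be expected to differ when ``--add-caps``, ``--drop-caps``, or ``--keep-privs`` are used.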

*******************************
Security related action options
*******************************

{Singularity} 3.0 introduces many new flags that can be passed to the
action commands; ``shell``, ``exec``, and ``run`` allowing fine grained
control of security.
When starting a container with the action commands ``shell``, ``exec``, and
``run``, various flags allow fine-grained control of security.

``--add-caps``
==============

As explained above, ``--add-caps`` will "activate" Linux capabilities
when a container is initiated, providing those capabilities have been
granted to the user by an administrator using the ``capability add``
command. This option will also accept the case insensitive keyword
``all`` to add every capability granted by the administrator.
In the default non-OCI-mode, ``--add-caps`` will grant specified Linux
capabilities (e.g. ``CAP_NET_RAW``) to a container, provided that those
capabilities have been granted to the user by an administrator using the
``capability add`` command. This option will also accept the case insensitive
keyword ``all`` to add every capability granted by the administrator.

In OCI-mode, ``--add-caps`` will grant specified Linux capabilities (e.g.
``CAP_NET_RAW``) to the container. Because the container runs in a user
namespace, the capabilities are not effective on the host and do not have to be
granted by the administrator. The keyword ``all`` will grant all available
capabilities to the container.
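To confirm that a capability requested with ``--add-caps`` actually reached the container, one option is to test the relevant bit in a capability mask read from ``/proc/self/status``. This is a minimal sketch under stated assumptions: ``mask_has_cap`` is a hypothetical helper, the mask is passed as hex without a ``0x`` prefix, and ``13`` is the number of ``CAP_NET_RAW`` in ``<linux/capability.h>``.

```shell
#!/bin/sh
# mask_has_cap MASK BIT
# Succeeds if capability number BIT is set in the hex MASK.
# MASK is written without a 0x prefix, as on the Cap* lines of
# /proc/<pid>/status. (Hypothetical helper, not a Singularity command.)
mask_has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_RAW=13   # capability number from <linux/capability.h>

# Example mask with only bit 13 set.
if mask_has_cap 0000000000002000 "$CAP_NET_RAW"; then
    echo "CAP_NET_RAW present"
fi
```

Inside a container, the mask argument could be taken from one of the ``Cap*`` lines, for instance ``grep CapBnd /proc/self/status``. Which of the sets is populated depends on the mode and on the user the container is entered as.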

``--drop-caps``
===============

In the default non-OCI-mode, the root user has a full set of capabilities when
they enter the container. You may choose to drop specific capabilities when you
initiate a container as root to enhance security.

For instance, to drop the ability for the root user to open a raw socket
inside the container:

.. code::

   $ sudo singularity exec --drop-caps CAP_NET_RAW library://centos ping -c 1 8.8.8.8
   ping: socket: Operation not permitted

In OCI-mode any user can use ``--drop-caps`` to run a container with fewer
capabilities than the default OCI capability set.

The ``--drop-caps`` option will also accept the case insensitive keyword
``all`` as an option to drop all capabilities when entering the
container.
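The effect of dropping a single capability can be pictured as clearing one bit in a capability mask. The sketch below is illustrative only: ``mask_drop_cap`` is a hypothetical helper, and the starting mask ``00000020a80425fb`` is an assumed value that encodes the fifteen default OCI-mode capabilities by their numbers in ``<linux/capability.h>``.

```shell
#!/bin/sh
# mask_drop_cap MASK BIT
# Print MASK (hex, no 0x prefix) with capability number BIT cleared,
# formatted as 16 hex digits like the Cap* lines of /proc/<pid>/status.
# (Hypothetical helper, not a Singularity command.)
mask_drop_cap() {
    printf '%016x\n' $(( 0x$1 & ~(1 << $2) ))
}

CAP_NET_RAW=13   # capability number from <linux/capability.h>

# Assumed starting point: a mask holding the default OCI-mode set.
mask_drop_cap 00000020a80425fb "$CAP_NET_RAW"
```

A mask computed this way gives a value to compare against the ``Cap*`` lines observed in a container started with ``--drop-caps CAP_NET_RAW``.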

``--allow-setuid``
==================
@@ -138,24 +203,30 @@ a user to execute a command with elevated privileges. But other SetUID
binaries may allow a user to execute a command as a service account.

By default SetUID is disallowed within {Singularity} containers as a
security precaution. But the root user can override this precaution and
allow SetUID binaries to behave as expected within a {Singularity}
container with the ``--allow-setuid`` option like so:
security precaution, by mounting container filesystems as ``nosuid``.

In the default non-OCI-mode, the root user can override this precaution and
allow SetUID binaries to behave as expected within a {Singularity} container
with the ``--allow-setuid`` option like so:

.. code::

   $ sudo singularity shell --allow-setuid some_container.sif

In OCI-mode, any user can permit SetUID binaries with the ``--allow-setuid``
option. Because an OCI-mode container is always run in a user namespace, SetUID
will change to UIDs inside a user's permitted subuid/subgid mapping. This does
not allow access to arbitrary UIDs on the host system.
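To make the subuid restriction concrete, the following sketch checks whether a given host UID falls inside a user's subordinate UID range. It is an illustration under assumptions: ``uid_in_subuid`` is a hypothetical helper, the user ``alice`` and her range are made-up values, and the file is assumed to use the ``name:start:count`` format of ``/etc/subuid``. It does not account for the mapping of the user's own UID.

```shell
#!/bin/sh
# uid_in_subuid USER UID [FILE]
# Succeeds if UID lies inside one of USER's subordinate UID ranges.
# FILE uses the /etc/subuid format "name:start:count" and defaults to
# /etc/subuid. (Hypothetical helper; the subshell body keeps variables local.)
uid_in_subuid() (
    user=$1
    uid=$2
    file=${3:-/etc/subuid}
    while IFS=: read -r name start count; do
        [ "$name" = "$user" ] || continue
        if [ "$uid" -ge "$start" ] && [ "$uid" -lt $(( start + count )) ]; then
            exit 0
        fi
    done < "$file"
    exit 1
)

# Demo with a sample file; "alice" and her range are made-up values.
sample=$(mktemp)
printf 'alice:100000:65536\n' > "$sample"
uid_in_subuid alice 100005 "$sample" && echo "100005 is in alice's range"
uid_in_subuid alice 0 "$sample" || echo "0 is not in alice's range"
rm -f "$sample"
```

In this sample, host UID ``0`` is outside the range, which matches the point above that SetUID binaries in an OCI-mode container cannot reach arbitrary host UIDs.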

``--keep-privs``
================

It is possible for an admin to set a different set of default
capabilities or to reduce the default capabilities to zero for the root
user by setting the ``root default capabilities`` parameter in the
``singularity.conf`` file to ``file`` or ``no`` respectively. If this
change is in effect, the root user can override the ``singularity.conf``
file and enter the container with full capabilities using the
``--keep-privs`` option.
In the default non-OCI-mode, it is possible for an admin to set a different set
of default capabilities or to reduce the default capabilities to zero for the
root user by setting the ``root default capabilities`` parameter in the
``singularity.conf`` file to ``file`` or ``no`` respectively. If this change is
in effect, the root user can override the ``singularity.conf`` file and enter
the container with full capabilities using the ``--keep-privs`` option.

.. code::
@@ -167,32 +238,30 @@ file and enter the container with full capabilities using the
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 18.838/18.838/18.838/0.000 ms
``--drop-caps``
===============

By default, the root user has a full set of capabilities when they enter
the container. You may choose to drop specific capabilities when you
initiate a container as root to enhance security.

For instance, to drop the ability for the root user to open a raw socket
inside the container:
In OCI-mode, the ``--keep-privs`` option can be used by any user. In this
mode, ``--keep-privs`` will cause the container to run inheriting the current
effective capabilities rather than using the OCI default capability set. When
entering the container as a non-root user, the capabilities are inherited into
the bounding set only.

.. code::
``--no-privs``
==============

$ sudo singularity exec --drop-caps CAP_NET_RAW library://centos ping -c 1 8.8.8.8
ping: socket: Operation not permitted
In the default non-OCI-mode, the ``--no-privs`` option allows the root user to run
a container with all capabilities dropped, and sets the ``no_new_privs`` bit
that will prevent the container process gaining any further privilege.

The ``drop-caps`` option will also accept the case insensitive keyword
``all`` as an option to drop all capabilities when entering the
container.
In OCI-mode, the ``--no-privs`` option can be used by any user to run a
container with all capabilities dropped, and to set the ``no_new_privs`` bit
that will prevent the container process gaining any further privilege.
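Whether the ``no_new_privs`` bit is set for a process can be read from the ``NoNewPrivs`` field of ``/proc/<pid>/status`` on kernels that expose it. The sketch below is a minimal example, assuming such a kernel; ``no_new_privs_of`` is a hypothetical helper name.

```shell
#!/bin/sh
# no_new_privs_of STATUS_FILE
# Print the value of the NoNewPrivs field (0 or 1) from a file in the
# format of /proc/<pid>/status. Prints nothing if the field is absent.
# (Hypothetical helper, not a Singularity command.)
no_new_privs_of() {
    awk '$1 == "NoNewPrivs:" { print $2 }' "$1"
}

# Report the bit for the current process tree, if /proc is available.
[ -r /proc/self/status ] && no_new_privs_of /proc/self/status
```

Run inside a container started with ``--no-privs``, the helper would be expected to print ``1`` if the bit was set as described above.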

``--security``
==============

The ``--security`` flag allows the root user to leverage security
modules such as SELinux, AppArmor, and seccomp within your {Singularity}
container. You can also change the UID and GID of the user within the
container at runtime.
The ``--security`` flag, currently supported in non-OCI-mode only, allows the
root user to leverage security modules such as SELinux, AppArmor, and seccomp
within your {Singularity} container. It is also possible to change the UID and
GID of the user within the container at runtime.

For instance:

@@ -267,3 +336,13 @@ follows:
--security="uid:1000"
--security="gid:1000"
--security="gid:1000:1:0" (multiple gids, first is always the primary group)
********************
Encrypted containers
********************

Beginning in {Singularity} 3.4.0 it is possible to build and run
encrypted containers. The containers are decrypted at runtime entirely
in kernel space, meaning that no intermediate decrypted data is ever
present on disk. See :ref:`encrypted containers <encryption>` for more
details.
