Merge pull request #157 from bedroge/private_s1
Update page about setting up a private Stratum 1
casparvl authored Jun 5, 2024
2 parents 899e4e0 + 9e97d4f commit eb50d85
Showing 2 changed files with 122 additions and 117 deletions.
181 changes: 70 additions & 111 deletions docs/filesystem_layer/stratum1.md
@@ -1,44 +1,66 @@
# Setting up a Stratum 1

Setting up a Stratum 1 involves the following steps:

- set up the Stratum 1, preferably by running the Ansible playbook that we provide;
- request a Stratum 0 firewall exception for your Stratum 1 server;
- request a `<your site>.stratum1.cvmfs.eessi-infra.org` DNS entry;
- open a pull request to include the URL to your Stratum 1 in the EESSI configuration.

The last two steps can be skipped if you want to host a "private" Stratum 1 for your site.

The EESSI project provides a number of geographically distributed public Stratum 1 servers that you can use to make EESSI available on your machine(s).
It is always recommended to have a local caching layer consisting of a few Squid proxies.
If you want to be even better protected against network outages and increase the bandwidth between your cluster nodes and the Stratum 1 servers,
you could also consider setting up a local (private) Stratum 1 server that replicates the EESSI CVMFS repository.
This guarantees that you always have a full and up-to-date copy of the entire stack available in your local network.

## Requirements for a Stratum 1

The main requirements for a Stratum 1 server are a good network connection to the clients it is going to serve,
and sufficient disk space. For the EESSI repository, a few hundred gigabytes should suffice, but for production
environments at least 1 TB would be recommended.
and sufficient disk space. As the EESSI repository is constantly growing, make sure that the disk space can easily be extended if necessary.
We currently recommend having at least 1 TB available.

In terms of cores and memory, a machine with just a few (~4) cores and 4-8 GB of memory should suffice.

Various Linux distributions are supported, but we recommend one based on RHEL 7 or 8.
Various Linux distributions are supported, but we recommend one based on RHEL 8 or 9.

Finally, make sure that ports 80 (for the Apache web server) and 8000 are open.
Finally, make sure that ports 80 and 8000 are open to clients.


## Step 1: set up the Stratum 1
## Configure the Stratum 1

The recommended way for setting up an EESSI Stratum 1 is by running the Ansible playbook `stratum1.yml`
from the [filesystem-layer repository on GitHub](https://github.com/EESSI/filesystem-layer).
Stratum 1 servers have to synchronize the contents of their CVMFS repositories regularly, and usually they replicate from a CVMFS Stratum 0 server.
In order to ensure the stability and security of the EESSI Stratum 0 server, it has a strict firewall, and only the EESSI-maintained public Stratum 1 servers are allowed to replicate from it.
However, EESSI provides a synchronisation server that can be used for setting up private Stratum 1 replica servers, and this is available at `http://aws-eu-west-s1-sync.eessi.science`.

!!! warn Potential issues with intrusion prevention systems
In the past we have seen a few occurrences of data transfer issues when files were being pulled in by or from a Stratum 1 server.
In such cases the `cvmfs_server snapshot` command, used for synchronizing the Stratum 1, may break with errors like `failed to download <URL to file>`.
Trying to download the mentioned file manually with `curl` will also fail, with errors like:
```
curl: (56) Recv failure: Connection reset by peer
```
In all cases this was due to an intrusion prevention system scanning the associated network, and hence scanning all files going in or out of the Stratum 1.
Though it was a false positive in all cases, this breaks the synchronization procedure of your Stratum 1.
If this is the case, you can try switching to HTTPS by using `https://aws-eu-west-s1-sync.eessi.science` for synchronizing your Stratum 1.
Even though there is no advantage for CVMFS itself in using HTTPS (it has built-in mechanisms for ensuring the integrity of the data),
this will prevent the described issues, as the intrusion prevention system will not be able to inspect the encrypted data.
However, HTTPS not only introduces some overhead due to encryption and decryption, it also makes caching in forward proxies impossible.
Therefore, using HTTPS by default is strongly discouraged.
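
To check whether HTTPS avoids the problem before changing any configuration, a sketch along these lines might help. The helper names `to_https` and `try_download` are hypothetical (they are not part of CVMFS or EESSI), and the URL to test is whichever file `cvmfs_server snapshot` reported as failing.

```bash
# Hypothetical helper: rewrite an http:// URL to https://, leaving other URLs untouched.
to_https() {
  case "$1" in
    http://*) printf 'https://%s\n' "${1#http://}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Hypothetical helper: download a URL, discard the body, and report curl's exit code.
# Exit code 56 corresponds to the "Recv failure" error shown above.
try_download() {
  rc=0
  curl --silent --show-error --max-time 60 --output /dev/null "$1" || rc=$?
  echo "curl exit code ${rc} for $1"
  return "$rc"
}
```

With `url` set to the failing file, `try_download "$url" || try_download "$(to_https "$url")"` tries plain HTTP first and HTTPS second. If only the HTTPS attempt succeeds, inspection of the unencrypted traffic is a plausible cause.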

Installing a Stratum 1 requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
More information on how to (freely) obtain this key is available in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.
### Manual configuration

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.
In order to set up a Stratum 1 manually, you can make use of the instructions in the [Private Stratum 1 replica server](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/access/stratum1/)
section of the MultiXscale tutorial ["Best Practices for CernVM-FS in HPC"](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/).

Furthermore, the Stratum 1 runs a Squid server. The template configuration file can be found at `templates/eessi_stratum1_squid.conf.j2`.
If you want to customize it, for instance for limiting the access to the Stratum 1, you can make your own version of this template file
and point to it by setting `local_stratum1_cvmfs_squid_conf_src` in `inventory/local_site_specific_vars.yml`.
See the comments in the example file for more details.
### Configuration using Ansible

Start by installing Ansible:
The recommended way for setting up an EESSI Stratum 1 is by running the Ansible playbook `stratum1.yml`
from the [filesystem-layer repository on GitHub](https://github.com/EESSI/filesystem-layer).
For the commands in this section, we assume that you have cloned this repository and that your working directory is `filesystem-layer`.

!!! note GEO API
Installing a Stratum 1 usually requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
However, for a private Stratum 1 this can be skipped, and you can disable the use of the GEO API in the configuration of your clients by setting `CVMFS_USE_GEOAPI=no`.
In this case, they will just connect to your local Stratum 1 by default.

If you do want to set up the GEO API, you can find more information on how to (freely) obtain this key in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.
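
For a private Stratum 1 without the GEO API, the client-side setting mentioned in the note could be applied as sketched below. The helper name `add_cvmfs_setting` is hypothetical, and using `/etc/cvmfs/default.local` as the target file is an assumption; any client configuration file that CernVM-FS reads should work.

```bash
# Hypothetical helper: append a line to a configuration file unless it is already there,
# so the command can safely be run more than once.
add_cvmfs_setting() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root, on each client):
#   add_cvmfs_setting /etc/cvmfs/default.local 'CVMFS_USE_GEOAPI=no'
```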

Start by installing Ansible, e.g.:

```bash
sudo yum install -y ansible
@@ -47,128 +69,65 @@ sudo yum install -y ansible
Then install Ansible roles for EESSI:

```bash
ansible-galaxy role install -r requirements.yml -p ./roles --force
ansible-galaxy role install -r ./requirements.yml --force
```

Make sure you have enough space in `/srv` (on the Stratum 1) since the snapshot of the Stratum 0
will end up there by default. To alter the directory where the snapshot gets copied to you can add
this variable in `inventory/host_vars/<url-or-ip-to-your-stratum1>`:

Make sure you have enough space in `/srv` on the Stratum 1, since the snapshots of the repositories
will end up there by default. To alter the directory where the snapshots get stored you can manually
create a symlink before running the playbook:
```bash
cvmfs_srv_mount: /srv
sudo ln -s /lots/of/space/cvmfs /srv/cvmfs
```

Make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file. Finally, install the Stratum 1 using one of the two following options.
Also make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file, that you are able to log in to the server from the machine that is going to run the playbook
(preferably using an SSH key), and that you can use `sudo`.
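
The login and `sudo` requirements can be checked before running the playbook. This is only a sketch: the function name `check_access` is hypothetical, and the target (for example `user@<url-or-ip-to-your-stratum1>`) is a placeholder for your own server.

```bash
# Hypothetical pre-flight check for the machine that will run the playbook.
check_access() {
  target="$1"
  # BatchMode makes ssh fail instead of prompting, so this only succeeds with key-based login.
  if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
    echo "SSH key login failed"
    return 1
  fi
  # 'sudo -n' never prompts; it fails if a password would be required.
  if ! ssh -o BatchMode=yes "$target" sudo -n true; then
    echo "sudo asks for a password: add -K when running the playbook"
    return 2
  fi
  echo "SSH key login and passwordless sudo both work"
}
```

A failing second check does not mean `sudo` is unusable, only that `ansible-playbook` will need the `-K` option to ask for the password.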

Option 1:
Finally, install the Stratum 1 using:

``` bash
# -b to run as root, optionally use -K if a sudo password is required
ansible-playbook -b [-K] -e @inventory/local_site_specific_vars.yml stratum1.yml
# -b to run as root, optionally use -K if a sudo password is required, and optionally include your site-specific variables
ansible-playbook -b [-K] [-e @inventory/local_site_specific_vars.yml] stratum1.yml
```

Option 2:

Create an SSH key pair and make sure the `ansible-host-keys.pub` is in the
`$HOME/.ssh/authorized_keys` file on your Stratum 1 server.

```bash
ssh-keygen -b 2048 -t rsa -f ~/.ssh/ansible-host-keys -q -N ""
```

Then run the playbook:

```bash
ansible-playbook -b --private-key ~/.ssh/ansible-host-keys -e @inventory/local_site_specific_vars.yml stratum1.yml
```

Running the playbook will automatically make replicas of all the repositories defined in `group_vars/all.yml`.


## Step 2: request a firewall exception

(This step is not implemented yet and can be skipped)

You can request a firewall exception rule to be added for your Stratum 1 server by
[opening an issue on the GitHub page of the filesystem layer repository](https://github.com/EESSI/filesystem-layer/issues/new).
### Verification of the Stratum 1 using `curl`

Make sure to include the IP address of your server.

## Step 3: Verification of the Stratum 1

When the playbook has finished your Stratum 1 should be ready. In order to test your Stratum 1, even
without a client installed, you can use `curl`.
When the playbook has finished, your Stratum 1 should be ready. In order to test your Stratum 1,
even without a client installed, you can use `curl`:

```bash
curl --head http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io/.cvmfspublished
```
This should return:
This should return something like:

```bash
HTTP/1.1 200 OK
...
X-Cache: MISS from <url-or-ip-to-your-stratum1>
```

The second time you run it, you should get a cache hit:

```bash
X-Cache: HIT from <url-or-ip-to-your-stratum1>

Content-Type: application/x-cvmfs
```

Example with the Norwegian Stratum 1:
Example with the EESSI Stratum 1 running in AWS:

```bash
curl --head http://bgo-no.stratum1.cvmfs.eessi-infra.org/cvmfs/software.eessi.io/.cvmfspublished
curl --head http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io/.cvmfspublished
```
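
For monitoring or scripted checks, the same `curl` test can be wrapped so that it yields an exit status. This is a sketch under assumptions: the function names `http_status` and `check_stratum1` are hypothetical, and the Stratum 1 is assumed to be reachable over plain HTTP on port 80.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print the status code.
http_status() {
  awk 'toupper($1) ~ /^HTTP\// { print $2; exit }'
}

# Hypothetical helper: succeed only if the repository manifest is served with status 200.
check_stratum1() {
  host="$1"
  repo="${2:-software.eessi.io}"
  status=$(curl --silent --head "http://${host}/cvmfs/${repo}/.cvmfspublished" | http_status)
  [ "$status" = "200" ]
}
```

For example, `check_stratum1 aws-eu-central-s1.eessi.science && echo reachable` tests the public server used in the example above.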

You can also test access to your Stratum 1 from a client, for which you will have to install the CVMFS
[client](https://github.com/EESSI/filesystem-layer#clients).

Then run the following command to add your newly created Stratum 1 to the existing list of EESSI Stratum 1 servers by creating a local CVMFS configuration file:
### Verification of the Stratum 1 using a CVMFS client

```bash
echo 'CVMFS_SERVER_URL="http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@;$CVMFS_SERVER_URL"' | sudo tee -a /etc/cvmfs/domain.d/eessi-hpc.org.local
```
You can, of course, also test access to your Stratum 1 from a client.
This requires you to install a CernVM-FS client and add the Stratum 1 to the client configuration;
this is explained in more detail on the [native installation page](../getting_access/native_installation.md).

If this is the first time you set up the client you now run:

```bash
sudo cvmfs_config setup
```

If you already had configured the client before, you can simply reload the config:

```bash
sudo cvmfs_config reload -c software.eessi.io
```

Finally, verify that the client connects to your new Stratum 1 by running:
Then verify that the client connects to your new Stratum 1 by running:

```bash
cvmfs_config stat -v software.eessi.io
```

Assuming that your new Stratum 1 is the geographically closest one to your client, this should return:
Assuming that your new Stratum 1 is working properly, this should return something like:

```bash
Connection: http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io through proxy DIRECT (online)
```
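
The client-side check can be scripted in the same spirit. The function name `connected_to` is hypothetical, and the sketch assumes the `Connection:` line has the form shown above, with an `http://` URL.

```bash
# Hypothetical helper: succeed if the `cvmfs_config stat -v` output on stdin
# reports a connection to the given Stratum 1 host.
connected_to() {
  grep -qF "Connection: http://$1/cvmfs/"
}

# Example:
#   cvmfs_config stat -v software.eessi.io | connected_to <url-or-ip-to-your-stratum1>
```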


## Step 4: request an EESSI DNS name

In order to keep the configuration clean and easy, all the EESSI Stratum 1 servers have a DNS name
`<your site>.stratum1.cvmfs.eessi-infra.org`, where `<your site>` is often a short name or
abbreviation followed by the country code (e.g. `rug-nl` or `bgo-no`). You can request this for
your Stratum 1 by mentioning this in the issue that you created in Step 2, or by opening another
issue.

## Step 5: include your Stratum 1 in the EESSI configuration

If you want to include your Stratum 1 in the EESSI configuration, i.e. allow any (nearby) client to be able to use it,
you can open a pull request with updated configuration files. You will only have to add the URL to your Stratum 1 to the
`urls` list of the `eessi_cvmfs_server_urls` variable in the
[`all.yml` file](https://github.com/EESSI/filesystem-layer/blob/main/inventory/group_vars/all.yml).
58 changes: 52 additions & 6 deletions docs/getting_access/native_installation.md
@@ -1,5 +1,7 @@
# Native installation

## Installation for single clients

Setting up native access to EESSI, that is a system-wide deployment that does not require workarounds like
[using a container](eessi_container.md), requires the installation and configuration of [CernVM-FS](https://cernvm.cern.ch/fs).

@@ -62,14 +64,58 @@ The good news is that all of this only requires a handful commands :astonished:
sudo cvmfs_config setup
```

## Installation for larger systems (e.g. clusters)

When using CernVM-FS on a larger number of local clients, e.g. on an HPC cluster or set of workstations,
it is very strongly recommended to at least set up some Squid proxies close to your clients.
These Squid proxies will be used to cache content that was recently accessed by your clients,
which reduces the load on the Stratum 1 servers and reduces the latency for your clients.
As a rule of thumb, you should use about one proxy per 500 clients, and have a minimum of two.
Instructions for setting up a Squid proxy can be found in the [CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-squid.html) and
in the [CernVM-FS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/#32-setting-up-a-proxy).
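
The rule of thumb above (one proxy per 500 clients, with a minimum of two) can be written out as a small calculation. The function name `proxies_needed` is hypothetical, and the result is only a starting point for sizing.

```bash
# Hypothetical helper: number of Squid proxies suggested by the rule of thumb.
proxies_needed() {
  clients="$1"
  # Round up to one proxy per 500 clients.
  n=$(( (clients + 499) / 500 ))
  # Never go below two proxies.
  if [ "$n" -lt 2 ]; then
    n=2
  fi
  echo "$n"
}
```

For instance, `proxies_needed 1200` suggests three proxies, while a 100-node cluster still gets the minimum of two.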

Additionally, setting up a private Stratum 1, which will make a full copy of the repository,
can be beneficial to improve the latency and bandwidth even further, and to be better protected against network outages.
Instructions for setting up your own EESSI Stratum 1 can be found in [setting up your own CernVM-FS Stratum 1 mirror server](../filesystem_layer/stratum1.md).

### Configuring your client to use a Squid proxy

If you have set up one or more Squid proxies, you will have to add them to your CernVM-FS client configuration.
This can be done by removing `CVMFS_CLIENT_PROFILE="single"` from `/etc/cvmfs/default.local` and adding the following line:

```
CVMFS_HTTP_PROXY="http://ip-of-your-1st-proxy:port|http://ip-of-your-2nd-proxy:port"
```

In this case, both proxies are equally preferable.
More advanced use cases can be found in [the CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#proxy-list-examples).

### Configuring your client to use a private Stratum 1 mirror server

If you have set up your own Stratum 1 mirror server that replicates the EESSI CernVM-FS repositories,
you can instruct your CernVM-FS client(s) to use it by creating a local CVMFS configuration file for the EESSI domain that prepends your Stratum 1 to the existing list of EESSI Stratum 1 servers:

```bash
echo 'CVMFS_SERVER_URL="http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@;$CVMFS_SERVER_URL"' | sudo tee -a /etc/cvmfs/domain.d/eessi.io.local
```

!!! note
By prepending your new Stratum 1 to the list of existing Stratum 1 servers, your clients should by default use the private Stratum 1.
In case of downtime of your private Stratum 1, they will also still be able to make use of the public EESSI Stratum 1 servers.
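
The effect of prepending can be illustrated with plain shell variable expansion, since the configuration line relies on `$CVMFS_SERVER_URL` expanding to the previously defined list. All host names below are hypothetical placeholders, not real EESSI servers.

```bash
# Hypothetical stand-in for the Stratum 1 list that the EESSI configuration already defines.
CVMFS_SERVER_URL="http://public-a.example.org/cvmfs/@fqrn@;http://public-b.example.org/cvmfs/@fqrn@"

# The line from the .local file: the private server goes first, the existing list is kept after it.
CVMFS_SERVER_URL="http://private.example.org/cvmfs/@fqrn@;$CVMFS_SERVER_URL"

# Print one server per line, in list order.
printf '%s\n' "$CVMFS_SERVER_URL" | tr ';' '\n'
```

The private server ends up at the top of the list and the public servers remain available as fallbacks. Note that if the GEO API is enabled, the client may reorder the servers by distance.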


### Applying changes in the CernVM-FS client configuration files

After you have made any changes to the CernVM-FS client configuration, you will have to apply them.
If this is the first time you set up the client, you can simply run:

:point_up: The commands above only cover the basic installation of EESSI.
```bash
sudo cvmfs_config setup
```

This is good enough for an individual client, or for testing purposes,
but for a production-quality setup you should also set up a Squid proxy cache.
If you had already configured the client before, you can reload the configuration for the EESSI repository (or, similarly, for any other repository) using:

For large-scale systems, like an HPC cluster, you should also consider setting up your own CernVM-FS Stratum-1 mirror server.
```bash
sudo cvmfs_config reload -c software.eessi.io
```

For more details on this, please refer to the
[*Stratum 1 and proxies section* of the CernVM-FS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/).
