Updating CLI guide, Omnia pre-reqs
Signed-off-by: cgoveas <[email protected]>
cgoveas committed Jul 8, 2022
1 parent f9db19a commit c2d75b2
Showing 5 changed files with 87 additions and 43 deletions.
3 changes: 2 additions & 1 deletion docs/BEST_PRACTICES.md
@@ -6,4 +6,5 @@
* Avoid rebooting the Control Plane as much as possible so that the network configuration is not disturbed.
* Review the [PreRequisites](PreRequisites) before running Omnia Scripts.
* If telemetry is to be enabled using Omnia, use AWX to deploy Slurm/Kubernetes.
* Ensure that the Firefox version being used on the control plane is the latest available. This can be achieved using `dnf update firefox -y`
* It is recommended to configure devices using Omnia playbooks for better interoperability and ease of access.
52 changes: 45 additions & 7 deletions docs/CLI_GUIDE.md
@@ -16,12 +16,45 @@ ansible-playbook control_plane.yml
## Updating inventory
On executing the Omnia control plane, every device that can be managed by Omnia is assigned an IP, and device inventories are created by device type in `/opt/omnia`. The available inventories are:
1. ethernet_inventory <br> ![img.png](images/Ethernet_Inventory.png)
2. infiniband_inventory <br> ![img.png](images/Infiniband_Inventory.png)
3. idrac_inventory <br> ![img.png](images/idrac_inventory.png)
4. provisioned_idrac_inventory <br> ![img.png](images/Provisioned_idrac_inventory.png)
5. powervault_inventory <br> ![img.png](images/Powervault_Inventory.png)
6. node_inventory <br> ![img.png](images/node_inventory.png)
1. ethernet_inventory
```
cat /opt/omnia/ethernet_inventory
172.17.0.108
```
2. infiniband_inventory
```
cat /opt/omnia/infiniband_inventory
172.17.0.110
```
3. idrac_inventory
```
cat /opt/omnia/idrac_inventory
172.19.0.100 service_tag=XXXXXXX model="PowerEdge R640"
172.19.0.101 service_tag=XXXXXXX model="PowerEdge R740"
172.19.0.103 service_tag=XXXXXXX model="PowerEdge C6420"
172.19.0.104 service_tag=XXXXXXX model="PowerEdge R7525"
```
4. provisioned_idrac_inventory
```
cat /opt/omnia/provisioned_idrac_inventory
172.19.0.100 service_tag=XXXXXXX model="PowerEdge R640"
172.19.0.101 service_tag=XXXXXXX model="PowerEdge R740"
172.19.0.103 service_tag=XXXXXXX model="PowerEdge C6420"
172.19.0.104 service_tag=XXXXXXX model="PowerEdge R7525"
```
5. powervault_inventory
```
cat /opt/omnia/powervault_inventory
172.17.0.109
```
6. node_inventory
```
cat /opt/omnia/node_inventory
172.17.0.100 service_tag=XXXXXXX operating_system=RedHat
172.17.0.101 service_tag=XXXXXXX operating_system=RedHat
172.17.0.102 service_tag=XXXXXXX operating_system=openSUSE Leap
172.17.0.103 service_tag=XXXXXXX operating_system=Rocky
```

* To update all device inventories (nodes are excluded), run `ansible-playbook configure_devices_cli.yml --tags=device_inventory`. Alternatively, run `ansible-playbook collect_device_info.yml`.
* To update device inventories manually in `/opt/omnia`, run `ansible-playbook collect_device_info.yml` from the `control_plane` folder. To update the node inventory, run `ansible-playbook collect_node_info.yml` from the same folder, as sketched below.
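A minimal sketch of refreshing the inventories from the CLI, assuming the repository was cloned as `omnia` on the control plane:

```
cd omnia/control_plane
ansible-playbook collect_device_info.yml   # refreshes the device inventories in /opt/omnia
ansible-playbook collect_node_info.yml     # refreshes /opt/omnia/node_inventory
ls /opt/omnia                              # verify the inventory files listed above are present
```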
@@ -55,6 +88,7 @@ On executing Omnia control plane, all devices that can be managed by Omnia will
## Enable Red Hat subscription
Before running `omnia.yml`, a Red Hat subscription must be set up on manager/compute nodes running Red Hat.
* To set up the Red Hat subscription, fill in the [rhsm_vars.yml file](Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/rhsm_vars.md). Once it is filled in, run the template using AWX or Ansible. <br>
* Ensure that `/opt/omnia/node_inventory` is populated with all required information. For information on how to re-run inventories manually, click [here](#updating-inventory).
* The flow of the playbook will be determined by the value of `redhat_subscription_method` in `rhsm_vars.yml`.
- If `redhat_subscription_method` is set to `portal`, run the command: <br> `ansible-playbook rhsm_subscription.yml -i inventory -e redhat_subscription_username="<username>" -e redhat_subscription_password="<password>"`
    - If `redhat_subscription_method` is set to `satellite`, run the command: <br> `ansible-playbook rhsm_subscription.yml -i inventory -e redhat_subscription_activation_key="<activation-key>" -e redhat_subscription_org_id="<org-id>"`
@@ -65,7 +99,9 @@ Before running `omnia.yml`, it is mandatory that red hat subscription be set up
`ansible-playbook omnia/control_plane/rhsm_unregister.yml -i inventory`

## Installing clusters
If all inventories and groups are assigned per the [Omnia Pre Requisites](PreRequisites/OMNIA_PreReqs.md):
* Ensure all inventories and groups are assigned per [Omnia Pre Requisites](PreRequisites/OMNIA_PreReqs.md).
* Fill out the required parameters in [omnia_config.yml](Input_Parameter_Guide/omnia_config.md).
* Verify that all nodes are assigned a group. Use the [linked example file](../examples/host_inventory_file.ini) as a reference.
* The `node_inventory` file in `/opt/omnia` should list all node IPs provisioned by Omnia. Assign groups to all nodes based on the below criteria:
* The __manager__ group should have exactly 1 manager node.
* The __compute__ group should have at least 1 node.
@@ -83,6 +119,8 @@ Run `omnia.yml` to create the cluster
`ansible-playbook platforms/jupyterhub.yml -i inventory`
### Installing Kubeflow <br>
`ansible-playbook platforms/kubeflow.yml -i inventory`

For more information on job management tools, [click here](Installation_Guides/INSTALL_OMNIA_CLI.md#installing-jupyterhub-and-kubeflow-playbooks).



67 changes: 40 additions & 27 deletions docs/PreRequisites/OMNIA_PreReqs.md
@@ -1,16 +1,25 @@
# Prerequisites before installing `omnia.yml`

* Verify that all inventory files are updated.
* Verify that all nodes are assigned a group.
* Verify that all nodes are assigned a group. Use the [linked example file](../../examples/host_inventory_file.ini) as a reference; a minimal sketch is also shown after the note below.
* The manager group should have exactly 1 manager node.
* The compute group should have at least 1 node.
* The login_node group is optional. If present, it should have exactly 1 node.
* The nfs_node group is optional. If powervault is configured by the Omnia control plane, the host connected to the powervault (that is, the NFS server) should be part of the nfs_node group. There should be only 1 NFS server in the group.
>> **Note**: The inventory file accepts both IPs and FQDNs as long as they can be resolved by DNS.
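A minimal sketch of a host inventory that satisfies these group rules (the IPs are hypothetical; FQDNs work equally well):

```
[manager]
172.17.0.100

[compute]
172.17.0.101
172.17.0.102

[login_node]
172.17.0.103

[nfs_node]
172.17.0.104
```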
## Features enabled by `omnia.yml`
* Slurm: Once all the required parameters in [omnia_config.yml](../Input_Parameter_Guide/omnia_config.md) are filled in, `omnia.yml` can be used to set up Slurm.
* [Login Node (Additionally secure login node)](#enabling-security-login-node)
* Kubernetes: Once all the required parameters in [omnia_config.yml](../Input_Parameter_Guide/omnia_config.md) are filled in, `omnia.yml` can be used to set up Kubernetes.
* [BeeGFS bolt on installation](#installing-beegfs-client)
* [NFS bolt on support](#nfs-bolt-on)
* [NFS server-client configuration (With powervault)](#nfs-server-configuration)

Below are the pre-requisites for all optional features that can be enabled using `omnia.yml`.

## Installing BeeGFS Client
## Optional features installed by `omnia.yml`

### Installing BeeGFS Client
* If the user intends to use BeeGFS, ensure that a BeeGFS cluster has been set up with beegfs-mgmtd, beegfs-meta, beegfs-storage services running.
Ensure that the following ports are open for TCP and UDP connectivity:
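A minimal sketch of opening one such port for both TCP and UDP using firewalld; `<port>` is a placeholder for each required BeeGFS port:

```
firewall-cmd --permanent --add-port=<port>/tcp
firewall-cmd --permanent --add-port=<port>/udp
firewall-cmd --reload
```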

@@ -30,7 +39,29 @@ To open the ports required, use the following steps:

* Ensure that the nodes in the inventory have been assigned roles: manager, compute, login_node (optional), nfs_node

## Pre-requisites Before Enabling Security: Login Node
### NFS bolt-on
* Ensure that an existing NFS server is running. NFS clients are mounted using the existing NFS server's IP.
* Fill out the `nfs_client_params` variable in the `omnia_config.yml` file in JSON format using the samples provided [here](../Input_Parameter_Guide/omnia_config.md)
* This role runs on manager, compute and login nodes.
* Make sure that `/etc/exports` on the NFS server is populated with the same paths listed as `server_share_path` in the `nfs_client_params` in `omnia_config.yml`.
* Post configuration, enable the following services using `firewall-cmd --permanent --add-service=<service name>` and then reload the firewall using `firewall-cmd --reload` (a sketch of these commands appears at the end of this section).
- nfs
- rpc-bind
- mountd
* Omnia supports all NFS mount options. Without user input, the default mount options are nosuid,rw,sync,hard,intr. For a list of mount options, [click here](https://linux.die.net/man/5/nfs).
* The fields listed in `nfs_client_params` are:
- server_ip: IP of the NFS server
- server_share_path: The folder exported by the NFS server
- client_share_path: Target directory for the NFS mount on the client. If left empty, the respective `server_share_path` value is used as the `client_share_path`.
- client_mount_options: The mount options when mounting the NFS export on the client. Default value: nosuid,rw,sync,hard,intr.

* There are 3 ways to configure the feature:
1. **Single NFS node**: A single NFS filesystem is mounted from a single NFS server. The value of `nfs_client_params` would be <br> `- { server_ip: xx.xx.xx.xx, server_share_path: "/mnt/share", client_share_path: "/mnt/client", client_mount_options: "nosuid,rw,sync,hard,intr" }`
2. **Multiple Mount NFS Filesystem**: Multiple filesystems are mounted from a single NFS server. The value of `nfs_client_params` would be <br>` - { server_ip: xx.xx.xx.xx, server_share_path: "/mnt/server1", client_share_path: "/mnt/client1", client_mount_options: "nosuid,rw,sync,hard,intr" }` <br> `- { server_ip: xx.xx.xx.xx, server_share_path: "/mnt/server2", client_share_path: "/mnt/client2", client_mount_options: "nosuid,rw,sync,hard,intr" }`
3. **Multiple NFS Filesystems**: Multiple filesystems are mounted from multiple NFS servers. The value of `nfs_client_params` would be <br> ` - { server_ip: xx.xx.xx.xx, server_share_path: "/mnt/server1", client_share_path: "/mnt/client1", client_mount_options: "nosuid,rw,sync,hard,intr" }` <br> `- { server_ip: yy.yy.yy.yy, server_share_path: "/mnt/server2", client_share_path: "/mnt/client2", client_mount_options: "nosuid,rw,sync,hard,intr" }` <br> `- { server_ip: zz.zz.zz.zz, server_share_path: "/mnt/server3", client_share_path: "/mnt/client3", client_mount_options: "nosuid,rw,sync,hard,intr" } `
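A minimal sketch of the firewall commands referenced above, assuming firewalld is in use on the node being configured:

```
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload
```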


### Enabling Security: Login Node

* Verify that the login node host name has been set. If not, use the following steps to set it.
* Set hostname of the login node to hostname.domainname format using the below command:
@@ -50,17 +81,19 @@ To open the ports required, use the following steps:
>> * No upper case characters are allowed in the hostname.
>> * The hostname cannot start with a number.
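A minimal sketch of setting the login node hostname in hostname.domainname format, using the `login1.omnia.local` name from the example inventory as an illustration:

```
hostnamectl set-hostname login1.omnia.local
hostname -f    # verify the fully qualified name was applied
```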
## NFS server configuration


### NFS server configuration
* Ensure that powervault support is enabled by setting `powervault_support` to true in `base_vars.yml`. By default, a volume called 'omnia_home' will be created on the powervault to mount on the nfs_node.
>> **Caution**: Powervault will only be available over SAS if the powervault has been configured using [`powervault.yml`](../Device_Configuration/PowerVault.md).
* For multiple NFS volumes, enter the following details in JSON list format in `powervault_vars.yml` under `powervault_volumes`:
- name [Mandatory]: The name of the NFS export.
- server_share_path [Mandatory]: The path at which the volume is mounted on the nfs_node.
- server_export_options: (Default) rw,sync,no_root_squash
- client_share_path: The path at which the volume is mounted on the manager, compute and login nodes. Unless specified otherwise, it defaults to the `server_share_path` value.
- client_mount_options: The mount options used on the clients. Default value: nosuid,rw,sync,hard,intr 0 0 (unless specified otherwise).
* Only one NFS server is configured per run of `omnia.yml`. To configure multiple NFS servers, update the following per execution:
* `powervault_ip` in `omnia_config.yml`
* `powervault_volumes` in `omnia_config.yml`
* nfs_node group IP in the node inventory
* The default entry for `powervault_volumes` will look like this (a populated sketch follows this list): <br> ` - { name: omnia_home, server_share_path: /home/omnia_home, server_export_options: ,client_share_path: , client_mount_options: }` <br>
* Ensure that `powervault_ip` is populated. The right powervault IP can be found in `/opt/omnia/powervault_inventory`. If it's not present, run `ansible-playbook collect_device_info.yml` (dedicated NIC) or `ansible-playbook collect_node_info.yml` (LOM NIC) from the control_plane directory.
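A sketch of these variables once populated, assuming they are plain YAML keys as described above; the IP, paths, and options are illustrative only:

```
# hypothetical values
powervault_ip: 172.17.0.109
powervault_volumes:
  - { name: omnia_home, server_share_path: /home/omnia_home, server_export_options: "rw,sync,no_root_squash", client_share_path: /home/omnia_home, client_mount_options: "nosuid,rw,sync,hard,intr" }
```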
@@ -74,24 +107,4 @@ To open the ports required, use the following steps:




Binary file modified docs/images/Omnia_Flow.png
8 changes: 0 additions & 8 deletions examples/host_inventory_file.ini
@@ -9,11 +9,3 @@ login1.omnia.local

[compute]
compute[000:064]

[workers:children]
compute

[cluster:children]
manager
login_node
workers
