Deployer fails to deploy during ROKS on IBM Cloud with NFS Storage #546

Open
patcurtin opened this issue Oct 3, 2023 · 1 comment

@patcurtin

Describe the bug
When using the Deployer to create a ROKS OpenShift cluster on IBM Cloud with NFS storage, two servers are created: one bastion server and one NFS server. During the install, both servers need Python installed and SELinux disabled; otherwise the deployer fails and exits.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the standard IBM Cloud instructions at https://ibm.github.io/cloud-pak-deployer/10-use-deployer/3-run/ibm-cloud/
  2. Use this config file (essentially a copy of the sample in the repo; any CP4D file will do):
---
global_config:
  environment_name: sample
  cloud_platform: ibm-cloud
  ibm_cloud_region: eu-de
  env_id: nfs-test
  confirm_destroy: False

provider:
- name: ibm
  region: "{{ ibm_cloud_region }}"

resource_group:
- name: "default" # should exist already

ssh_keys:
- name: "{{ env_id }}-provision"
  managed: True

security_rule:
- name: https
  tcp: {port_min: 443, port_max: 443}
- name: ssh
  tcp: {port_min: 22, port_max: 22}

vpc:
- name: "{{ env_id }}"
  allow_inbound: ['ssh']

address_prefix:
- name: "{{ env_id }}-zone"
  zone: "{{ ibm_cloud_region }}-1"
  cidr: 10.231.0.0/24

subnet:
- name: "{{ env_id }}-subnet"
  address_prefix: "{{ env_id }}-zone"
  ipv4_cidr_block: 10.231.0.0/24

vsi:
- name: "{{ env_id }}-bastion"
  infrastructure:
    type: vpc
    subnet: "{{ env_id }}-subnet"
    primary_ipv4_address: 10.231.0.196
    image: ibm-redhat-8-3-minimal-amd64-3
    profile: cx2-2x4
    public_ip: True
    keys:
    - "{{ env_id }}-provision"

nfs_server:
- name: "{{ env_id }}-nfs"
  infrastructure:
    type: vpc
    subnet: "{{ env_id }}-subnet"
    zone: "{{ ibm_cloud_region }}-1"
    primary_ipv4_address: 10.231.0.197
    image: ibm-redhat-8-3-minimal-amd64-3
    profile: cx2-2x4
    bastion_host: "{{ env_id }}-bastion"
    storage_profile: 10iops-tier
    volume_size_gb: 1000
    storage_folder: /data/nfs
    keys:
    - "{{ env_id }}-provision"

cos:
- name: "{{ env_id }}-cos"
  plan: standard
  location: global

openshift:
- name: "{{ env_id }}"
  ocp_version: 4.12.26
  compute_flavour: bx2.16x64
  compute_nodes: 6
  infrastructure:
    type: vpc
    vpc_name: "{{ env_id }}"
    subnets:
    - "{{ env_id }}-subnet"
    cos_name: "{{ env_id }}-cos"
  openshift_storage:
  - storage_name: nfs-storage
    storage_type: nfs
    nfs_server_name: "{{ env_id }}-nfs"
  3. Run the deployer

Expected behavior
The OpenShift cluster should be created with NFS storage.

Screenshots

First error in the install:

TASK [Configure bastion servers] ***********************************************
Tuesday 03 October 2023  10:55:36 +0000 (0:00:00.055)       0:51:28.655 *******

TASK [nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node] ***
Tuesday 03 October 2023  10:55:36 +0000 (0:00:00.041)       0:51:28.696 *******
fatal: [149.81.12.75]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 149.81.12.75 closed.\r\n", "module_stdout": "/bin/sh: /usr/local/bin/python: No such file or directory\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}

PLAY RECAP *********************************************************************
149.81.12.75               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
localhost                  : ok=831  changed=69   unreachable=0    failed=0    skipped=326  rescued=0    ignored=0

Tuesday 03 October 2023  10:55:38 +0000 (0:00:01.533)       0:51:30.230 *******
===============================================================================
provision-terraform : Run terraform apply in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/apply.log  2822.09s
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 45.85s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.29s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 25.22s
cloudctl-download : Download cloudctl tool ----------------------------- 11.92s
ibm-pak-download : Download ibm-pak plugin ------------------------------ 8.89s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 8.81s
cpd-cli-download : Download latest cpd-cli release ---------------------- 5.74s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.57s
terraform-download : Get Terraform version ------------------------------ 3.35s
openshift-download-client : Download OpenShift client "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.12/openshift-client-linux.tar.gz" --- 3.17s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.30s
cpd-cli-download : Get current version number of cpd-cli ---------------- 2.02s
terraform-download : Download Terraform --------------------------------- 2.01s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.00s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.53s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.44s
generators : Create SSH key if not already in the vault and managed ----- 1.23s
cloudctl-download : Get current version number of clouctl --------------- 1.12s
vault-set-secret : Create directory /home/pat/cpd-status/vault if not existent --- 1.09s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the bastion, install Python 3.6, and link it to the path Ansible looks for:
[root@nfs-test-bastion ~]# yum install python36
[root@nfs-test-bastion ~]# ln -s /usr/bin/python3.6 /usr/local/bin/python

Restart the deployer
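
The symlink step above can also be scripted so it is repeatable on both servers. This is a minimal sketch, not part of the deployer: `link_python` is a hypothetical helper name, and the paths in the usage note are the ones from this issue. It only creates the link; installing the interpreter (for example with `yum install -y python36`) still has to happen first.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-pak-deployer): point the path that
# Ansible tried to run at an interpreter that is actually installed.
link_python() {
    interpreter="$1"   # e.g. /usr/bin/python3.6
    link_path="$2"     # e.g. /usr/local/bin/python

    # Refuse to create a dangling link if the interpreter is missing.
    if [ ! -x "$interpreter" ]; then
        echo "interpreter not found or not executable: $interpreter" >&2
        return 1
    fi

    # Create the parent directory if needed, then (re)create the symlink.
    mkdir -p "$(dirname "$link_path")"
    ln -sfn "$interpreter" "$link_path"
}
```

On the bastion this would be called as `link_python /usr/bin/python3.6 /usr/local/bin/python` after the package install. The function returns non-zero when the interpreter is absent, so a wrapper script can stop before restarting the deployer.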

Second error:

TASK [Configure bastion servers] ***********************************************
Tuesday 03 October 2023  14:34:03 +0000 (0:00:00.071)       0:04:05.803 *******

TASK [nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node] ***
Tuesday 03 October 2023  14:34:03 +0000 (0:00:00.039)       0:04:05.842 *******
fatal: [149.81.12.75]: FAILED! => {"changed": false, "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"}

PLAY RECAP *********************************************************************
149.81.12.75               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
localhost                  : ok=774  changed=51   unreachable=0    failed=0    skipped=323  rescued=0    ignored=0

Tuesday 03 October 2023  14:34:04 +0000 (0:00:01.597)       0:04:07.440 *******
===============================================================================
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 45.53s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 43.54s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.92s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 9.05s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.21s
terraform-download : Get Terraform version ------------------------------ 3.35s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.30s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.00s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.60s
ibm-pak-download : Make sure ibm-pak can be run within path ------------- 1.50s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.48s
record-deployer-state : Make sure old deployer-state.out does not exist --- 1.34s
cpd-cli-download : Check if cpdcli was already downloaded --------------- 1.03s
vault-get-secret : Check that vault file sample exists ------------------ 1.02s
record-deployer-state : Starting background task to record deployer state in /home/pat/cpd-status/log --- 0.89s
lint-config : filter the vault variables from ansible variables --------- 0.82s
merge-config : Generate config through template ------------------------- 0.82s
lint-config : Run the linter and pre-processor script for object provider --- 0.73s
generators : Generate instance of "vpc" in /home/pat/cpd-status/terraform --- 0.72s
merge-config : Get stats of /home/pat/cpd-config/config ----------------- 0.71s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the bastion, disable SELinux in its config file, and reboot:
[root@nfs-test-bastion ~]# vi /etc/selinux/config
Set SELINUX=disabled
[root@nfs-test-bastion ~]# reboot

Restart the deployer
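
The manual `vi` edit above can be done non-interactively. The sketch below assumes GNU `sed` (as shipped on RHEL); `set_selinux_mode` is a hypothetical helper name, not something the deployer provides. A reboot is still needed afterwards for the change to take effect. Installing the `python3-libselinux` package might address the same error without disabling SELinux, but that was not tried in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-pak-deployer): set the SELINUX= line
# in an SELinux config file without opening an editor.
set_selinux_mode() {
    mode="$1"                                # enforcing, permissive or disabled
    config_file="${2:-/etc/selinux/config}"  # default is the standard location

    # Only accept the three values the config file understands.
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid SELinux mode: $mode" >&2; return 1 ;;
    esac

    if [ ! -f "$config_file" ]; then
        echo "config file not found: $config_file" >&2
        return 1
    fi

    # Rewrite only the SELINUX= line; SELINUXTYPE= does not match the pattern.
    sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$config_file"
}
```

On each server this would be run as `set_selinux_mode disabled` followed by `reboot`. Passing a second argument lets the edit be tried on a copy of the file first.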

Third error:

TASK [Configure NFS servers] ***************************************************
Tuesday 03 October 2023  14:58:17 +0000 (0:00:02.801)       0:04:16.094 *******

TASK [nfs-server-ibmcloud-vpc-install : Format the NFS volume] *****************
Tuesday 03 October 2023  14:58:17 +0000 (0:00:00.061)       0:04:16.156 *******
included: /cloud-pak-deployer/automation-roles/40-configure-infra/nfs-server-ibmcloud-vpc-install/tasks/prepare_xfs_volume.yaml for 10.231.0.197

TASK [nfs-server-ibmcloud-vpc-install : Get volume for specified selector of 1000G] ***
Tuesday 03 October 2023  14:58:18 +0000 (0:00:00.071)       0:04:16.227 *******
fatal: [10.231.0.197]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 10.231.0.197 closed.\r\n", "module_stdout": "/bin/sh: /usr/local/bin/python: No such file or directory\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}

PLAY RECAP *********************************************************************
10.231.0.197               : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
149.81.12.75               : ok=2    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
localhost                  : ok=774  changed=51   unreachable=0    failed=0    skipped=323  rescued=0    ignored=0

Tuesday 03 October 2023  14:58:19 +0000 (0:00:01.900)       0:04:18.128 *******
===============================================================================
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 46.35s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 42.98s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.56s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 9.65s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.34s
terraform-download : Get Terraform version ------------------------------ 3.39s
nfs-server-ibmcloud-vpc-bastion : Restart sshd service ------------------ 2.80s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.46s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.08s
nfs-server-ibmcloud-vpc-install : Get volume for specified selector of 1000G --- 1.90s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.50s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.44s
generators : Generate instance of "provider" in /home/pat/cpd-status/terraform/provider_ibm.tf --- 1.31s
record-deployer-state : Make sure old deployer-state.out does not exist --- 1.11s
ibm-pak-download : Make sure ibm-pak can be run within path ------------- 1.11s
generators : Generate instance of "resource_group" in /home/pat/cpd-status/terraform/resource_group_default.tf --- 1.03s
generators : Generate instance of "vpc" in /home/pat/cpd-status/terraform --- 0.92s
generators : Generate instance of "vsi" in /home/pat/cpd-status/terraform/vsi_nfs-test-bastion.tf --- 0.87s
generators : Generate instance of "address_prefix" in /home/pat/cpd-status/terraform/address_prefix_nfs-test-zone.tf --- 0.81s
merge-config : Generate config through template ------------------------- 0.79s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the NFS node and apply the same two fixes:
[root@nfs-test-nfs ~]# yum install python36
[root@nfs-test-nfs ~]# ln -s /usr/bin/python3.6 /usr/local/bin/python
[root@nfs-test-nfs ~]# vi /etc/selinux/config
Set SELINUX=disabled
[root@nfs-test-nfs ~]# reboot

After restarting the deployer, the install completes successfully.

Note that Python 3.8 fails due to a python-dnf issue, so Python 3.6 was used.

@fketelaars
Collaborator

@patcurtin This is more or less a catch-22. When we initially designed the deployer framework, the virtual server images on IBM Cloud came with Python pre-installed, which Ansible requires. We could install Python and then SELinux support on the bastion and NFS server over SSH and continue from there, but I would rather spend the effort on using the VPC file server capability that is now available on IBM Cloud.

Effectively:

  • Translate the nfs_server resource into an ibm_is_share Terraform resource
  • Allow mounting of the ibm_is_share resource from all virtual servers in the VPC, including OpenShift

https://registry.terraform.io/providers/IBM-Cloud/ibm/latest/docs/resources/is_share

Consequently, the deployer would no longer need a bastion server, except when a private cluster is deployed. Also, the NFS storage becomes an extendable file server with fewer restrictions.

fketelaars added a commit that referenced this issue Oct 7, 2023
fketelaars added a commit that referenced this issue Nov 5, 2023
fketelaars added a commit that referenced this issue Nov 8, 2023
fketelaars added a commit that referenced this issue Nov 9, 2023
fketelaars added a commit that referenced this issue Nov 9, 2023