rhel 8.10 - compute-redhat.yml breaks image build #419

xdkreij · 2024-06-28T07:17:38Z

Problem description
During iPXE boot, the following problem pops up. Has anyone encountered this before?

Command used
ansible-playbook compute-redhat.yml -v

Expected results
A working image that boots successfully :-)


xdkreij · 2024-06-28T09:07:46Z

A dump of found 'issues'

000000 04:59:47 [root@cpu site]# luna osimage kernel compute
Traceback (most recent call last):
  File "/bin/luna", line 7, in <module>
    CLI = Cli().main()
  File "/trinity/local/python/lib/python3.10/site-packages/luna/cli.py", line 109, in main
    self.call_class()
  File "/trinity/local/python/lib/python3.10/site-packages/luna/cli.py", line 137, in call_class
    call(self.args, self.parser, self.subparsers)
  File "/trinity/local/python/lib/python3.10/site-packages/luna/osimage.py", line 67, in __init__
    call(self)
  File "/trinity/local/python/lib/python3.10/site-packages/luna/osimage.py", line 299, in kernel_osimage
    http_response = result.json()
AttributeError: 'types.SimpleNamespace' object has no attribute 'json'

Adding a print statement to the Python code, print(result.content), produces:

{'message': 'osimage pack for compute already queued', 'request_id': '1719565192.3494275247818686'}
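Taken together, the traceback and the printed payload suggest that `result` is a plain `types.SimpleNamespace` with a `content` attribute rather than an HTTP response object, so the `.json()` call fails. Below is a minimal Python sketch of that failure mode and one defensive way to read the payload. `extract_payload` is a hypothetical helper, not part of the luna CLI, and the actual fix in the CLI may look different.

```python
from types import SimpleNamespace


def extract_payload(result):
    """Return the response body, whether or not `result` offers .json()."""
    json_method = getattr(result, "json", None)
    if callable(json_method):
        return json_method()
    # Fall back to the `content` attribute seen in the printed output.
    return getattr(result, "content", None)


# Stand-in for the object the CLI received (values copied from the output above).
result = SimpleNamespace(
    content={
        "message": "osimage pack for compute already queued",
        "request_id": "1719565192.3494275247818686",
    }
)

error_text = ""
try:
    result.json()  # same call as osimage.py line 299 in the traceback
except AttributeError as exc:
    error_text = str(exc)

payload = extract_payload(result)
print(error_text)
print(payload["message"])
```

The helper only illustrates the idea of checking for `.json()` before calling it; whether the CLI should instead always receive a real response object is a design decision for the maintainers.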

aphmschonewille · 2024-07-01T13:32:38Z

"osimage pack for compute already queued" normally means that another packing for that image was already in progress. It prevents it from being packed twice at the same time. However if changes were made while the other packing was already in progress, things will go wrong. Was there only one packing active at that time, or were there concurrent operations going or something else?
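To illustrate the kind of guard described here, the following Python sketch refuses a second pack request for an image while one is still pending. It is a toy model under stated assumptions, not Luna's implementation: the `PackQueue` class, its method names, and the "queued" success message are all hypothetical; only the "already queued" message text comes from this thread.

```python
import threading


class PackQueue:
    """Toy queue that refuses a second pack request for the same image."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def request_pack(self, image):
        # The lock makes the check-and-add step atomic across threads.
        with self._lock:
            if image in self._pending:
                return {"message": f"osimage pack for {image} already queued"}
            self._pending.add(image)
            return {"message": f"osimage pack for {image} queued"}

    def finish_pack(self, image):
        # Called when packing is done so the image can be queued again.
        with self._lock:
            self._pending.discard(image)


queue = PackQueue()
first = queue.request_pack("compute")
second = queue.request_pack("compute")   # rejected: still pending
queue.finish_pack("compute")
third = queue.request_pack("compute")    # accepted again

print(first["message"])
print(second["message"])
print(third["message"])
```

Note that a guard like this only prevents duplicate requests. It does not protect against the image contents changing while the first pack is still running, which is the scenario the comment above warns about.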

xdkreij · 2024-07-02T08:18:06Z

> "osimage pack for compute already queued" normally means that another packing for that image was already in progress. It prevents it from being packed twice at the same time. However if changes were made while the other packing was already in progress, things will go wrong. Was there only one packing active at that time, or were there concurrent operations going or something else?

Only one - via the compute-redhat.yml :-)

I wonder if this would eventually result in the 'kernel panic'. The playbook seems to complete successfully, but apparently something goes terribly wrong with the image (build?) itself.

(Side note: I do have to fix rhsm.conf inside the image halfway through the play, otherwise redhat.repo gets overwritten and redirects to cdn.redhat.com instead. I doubt that this causes the image issues, though, since everything kicks off fine afterwards.)

xdkreij · 2024-07-02T09:13:52Z

w00000t... I think I may have solved it.

What I did is described here: https://www.linuxquestions.org/questions/linux-server-73/centos-7-does-not-boot-4175619015/

Like so...

cp /sbin/init /trinity/images/compute/sbin/init
cp /lib/systemd/systemd /trinity/images/compute/lib/systemd/systemd

luna osimage pack compute
luna node change -o compute node001
--- reboot node ---

I have no idea why it doesn't work without this, but I'll test compute-redhat.yml again with a new image soon to verify whether this actually solved it.
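Before packing, it may help to check whether the two files copied in the workaround are present in the image tree at all. The Python sketch below does that for an arbitrary image root. `missing_init_files` is a hypothetical helper written for this thread, not a luna command, and the demonstration runs against a throwaway directory rather than a real image.

```python
import os
import tempfile

# Paths copied in the workaround above, relative to the image root.
REQUIRED = ("sbin/init", "lib/systemd/systemd")


def missing_init_files(image_root):
    """Return the required paths that do not exist inside image_root."""
    missing = []
    for rel in REQUIRED:
        # lexists() does not follow a symlink in the final path component,
        # so a link such as sbin/init counts as present even if its target
        # cannot be resolved from outside the image.
        if not os.path.lexists(os.path.join(image_root, rel)):
            missing.append("/" + rel)
    return missing


# Demonstration on a throwaway directory instead of a real image.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sbin"))
    os.makedirs(os.path.join(root, "lib", "systemd"))
    before = missing_init_files(root)
    open(os.path.join(root, "lib", "systemd", "systemd"), "w").close()
    os.symlink("../lib/systemd/systemd", os.path.join(root, "sbin", "init"))
    after = missing_init_files(root)

print(before)
print(after)
```

On the controller this could be called as `missing_init_files("/trinity/images/compute")`, using the image path from the commands above. An empty list means both files are present; it says nothing about whether they are the right versions.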

aphmschonewille · 2024-07-03T07:16:58Z

You hit two problems. There was indeed a bug in the CLI where a returned call caused the Python traceback. That has been fixed and will be released soon. The other problem, the missing /sbin/init, is something I cannot really explain yet. May I ask when you cloned the TrinityX repo? This helps us determine whether this is an ongoing problem or something that has already been solved by other fixes.

xdkreij · 2024-07-03T09:52:35Z

> You hit two problems. There was indeed a bug in the CLI where a returned call caused the Python traceback. That has been fixed and will be released soon. The other problem, the missing /sbin/init, is something I cannot really explain yet. May I ask when you cloned the TrinityX repo? This helps us determine whether this is an ongoing problem or something that has already been solved by other fixes.

The repo was cloned on the 24th (luckily I keep track of things using ARA), during the re-deployment of the entire controller on RHEL 8.10.

trick-1 · 2024-08-07T06:50:52Z

I have this issue too. While executing "ansible-playbook default.yml" I noticed the following:

[WARNING]: Target is a chroot or systemd is offline. This can lead to false positives or prevent the init system tools from working.

I suspect it might be related to systemd not behaving properly inside a chroot.
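For context, one common way to tell whether systemd is the running init system is the check documented for sd_booted(3): whether /run/systemd/system exists as a directory. Inside a chroot that directory is normally absent, which is one reason tools report systemd as offline there. The Python sketch below mirrors that check; it is not necessarily how Ansible decides to print the warning, and the `root` parameter is a hypothetical addition so the function can be tried against a test directory.

```python
import os
import tempfile


def systemd_is_running(root="/"):
    """Mirror the sd_booted(3) check: is <root>/run/systemd/system a directory?"""
    return os.path.isdir(os.path.join(root, "run", "systemd", "system"))


# Demonstration against a throwaway directory tree (a stand-in, not a real host).
with tempfile.TemporaryDirectory() as fake_root:
    offline = systemd_is_running(fake_root)
    os.makedirs(os.path.join(fake_root, "run", "systemd", "system"))
    online = systemd_is_running(fake_root)

print(offline, online)
```

Called without arguments, the function checks the environment it runs in, so running it inside the image chroot and on the controller itself should show the difference the warning refers to.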

trick-1 · 2024-08-07T22:18:07Z

A bit more information. I pulled the repo yesterday and built a new controller per the instructions here (https://supercomputing.tue.nl/documentation/administration/trinityx/installation/#fix-uchiwa-logrotate-script-owner); it completed without issue.

I then went to build the node images

#ansible-playbook compute-default.yml -v

It also completed without error, except for the warning above.

I have since changed to an alternate distribution build
alternative_distribution: Rocky-9

I get the same error regardless.

I also note that during node bring-up it fails to copy /etc/passwd and /etc/groups.

trick-1 · 2024-08-08T05:42:42Z

So I commented out the following section:

#- import_playbook: imports/trinity-redhat-image-setup.yml
#  vars:
#    hostlist: "{{ hostvars['localhost']['image_name'] }}.osimages.luna"

With that, the image builds and deploys without error. Time to step through the setup and see what breaks.

aphmschonewille · 2024-10-23T07:29:20Z

hi gents,

Quite a bit has changed (read: fixes, improvements and, of course, the addition of new bugs :)).
We've tested the installation a million times, using defaults as much as possible, but do not see any problems or any of the above issues. Does anyone have time to pull the latest release, 14.4u1, and try it? Or can we close the issue?

--Antoine
