Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change definition of "stuck VM" in manage_vms script #996

Merged
merged 2 commits into from
Nov 16, 2023

Conversation

kysrpex
Copy link
Contributor

@kysrpex kysrpex commented Nov 16, 2023

In the current version of the script, stuck VMs are those whose name starts with vgcnbwc-worker- and do not show up in condor_status.

Change the definition of stuck so that stuck VMs are those that do not reply to ICMP echo requests.

This prevents the script from removing machines belonging to the secondary HTCondor cluster.

In the current version of the script, stuck VMs are those whose name starts with `vgcnbwc-worker-` and do not show up in `condor_status`.

Change the definition of stuck so that stuck VMs are those that do not reply to ICMP echo requests.

This prevents the script from removing machines belonging to the secondary HTCondor cluster.
@kysrpex kysrpex self-assigned this Nov 16, 2023
Copy link
Member

@sanjaysrikakulam sanjaysrikakulam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine. Was this function tested?

@sanjaysrikakulam sanjaysrikakulam dismissed their stale review November 16, 2023 13:43

Need to take another look as ping on the names of the VMs might not work

@kysrpex
Copy link
Contributor Author

kysrpex commented Nov 16, 2023

It looks fine. Was this function tested?

root@maintenance:/home/centos$ date
Thu Nov 16 14:47:01 CET 2023
root@maintenance:/home/centos$ /usr/local/bin/manage_vms --get-stuck-vms
10.5.68.104 vgcnbwc-worker-c120m425-0000
root@maintenance:/home/centos$ /tmp/manage_vms --get-stuck-vms
10.5.68.104 vgcnbwc-worker-c120m425-0000
root@maintenance:/home/centos$ curl https://raw.githubusercontent.com/usegalaxy-eu/infrastructure-playbook/b44e2fa62053e0182321d7bf6f0e366068c363d6/files/manage_vms | sha256sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4422  100  4422    0     0   110k      0 --:--:-- --:--:-- --:--:--  110k
0de583ce4e4e537e55c341af35b5ea1fa9bf7b9a8d900f6d055600ce71609b7e  -
root@maintenance:/home/centos$ sha256sum /tmp/manage_vms 
0de583ce4e4e537e55c341af35b5ea1fa9bf7b9a8d900f6d055600ce71609b7e  /tmp/manage_vms
root@maintenance:/home/centos$ /tmp/manage_vms --remove-stuck-vms
===>Deleting stuck host: vgcnbwc-worker-c120m425-0000<===
root@maintenance:/home/centos$

@kysrpex
Copy link
Contributor Author

kysrpex commented Nov 16, 2023

Need to take another look as ping on the names of the VMs might not work

It pings the IPs of the VMs.

@sanjaysrikakulam
Copy link
Member

It looks fine. Was this function tested?

root@maintenance:/home/centos$ date
Thu Nov 16 14:47:01 CET 2023
root@maintenance:/home/centos$ /usr/local/bin/manage_vms --get-stuck-vms
10.5.68.104 vgcnbwc-worker-c120m425-0000
root@maintenance:/home/centos$ /tmp/manage_vms --get-stuck-vms
10.5.68.104 vgcnbwc-worker-c120m425-0000
root@maintenance:/home/centos$ curl https://raw.githubusercontent.com/usegalaxy-eu/infrastructure-playbook/b44e2fa62053e0182321d7bf6f0e366068c363d6/files/manage_vms | sha256sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4422  100  4422    0     0   110k      0 --:--:-- --:--:-- --:--:--  110k
0de583ce4e4e537e55c341af35b5ea1fa9bf7b9a8d900f6d055600ce71609b7e  -
root@maintenance:/home/centos$ sha256sum /tmp/manage_vms 
0de583ce4e4e537e55c341af35b5ea1fa9bf7b9a8d900f6d055600ce71609b7e  /tmp/manage_vms
root@maintenance:/home/centos$ /tmp/manage_vms --remove-stuck-vms
===>Deleting stuck host: vgcnbwc-worker-c120m425-0000<===
root@maintenance:/home/centos$

Cool, thank you for testing it!

Machines answer within a few hundred milliseconds, but better safe than sorry.
@kysrpex
Copy link
Contributor Author

kysrpex commented Nov 16, 2023

I assume you are ok with the ping timeout increase as well. I am merging, feel free to revert f199067 any time (or to choose a different value).

@kysrpex kysrpex merged commit da68bec into usegalaxy-eu:master Nov 16, 2023
2 checks passed
@kysrpex kysrpex deleted the manage_vms_ping branch November 16, 2023 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants