Remove Kruize Project from nerc-ocp-test-2.nerc.mghpcc.org to Free Resources for Other Projects #838

Closed
schwesig opened this issue Dec 2, 2024 · 4 comments
Labels: gpu, openshift (This issue pertains to NERC OpenShift)

Comments

@schwesig
Member

schwesig commented Dec 2, 2024

Title: Remove Kruize Project from nerc-ocp-test-2.nerc.mghpcc.org to Free Resources for Other Projects

Motivation

The Kruize project finished all its tasks and projects last week and can now be removed from the cluster nerc-ocp-test-2.nerc.mghpcc.org. This will make the cluster available for other projects. The GPU node (wrk-5, NVIDIA-A100-SXM4-40GB) can also be moved to another cluster or project where it is needed more.

Completion Criteria

Proposed Steps:

  1. Remove the Kruize project from the cluster (kruize finished all tasks)
  2. Free the cluster for new projects (clean up)
  3. Move the GPU node (wrk-5, 192.168.50.93) to another cluster or project if needed outside test-2

Impact:

  • The cluster can be used for other projects.
  • The GPU node (NVIDIA-A100-SXM4-40GB) can be used for other projects.

Completion dates

Desired - 2024-12-01
Required - 2024-12-06

Cluster Details:

  • Cluster Name: nerc-ocp-test-2.nerc.mghpcc.org
  • Control Plane Nodes:
    • ctl-0 (192.168.50.22)
    • ctl-1 (192.168.50.114)
    • ctl-2 (192.168.50.142)
  • Worker Nodes:
    • wrk-0 (192.168.50.192)
    • wrk-1 (192.168.50.199)
    • wrk-2 (192.168.50.82)
    • wrk-3 (192.168.50.68)
    • wrk-4 (192.168.50.149)
    • wrk-5 (192.168.50.93) - GPU Node (NVIDIA-A100-SXM4-40GB)

Related Issues:

Assumed Assignees:

@tssala23 @dystewart @jtriley

To Be Informed:

@schwesig @hpdempsey @larsks @computate


Node Details:

NAME        CPU  MEMORY (GB)  STORAGE (GB)  OS                                                     IP Address
ctl-0        40       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.22
ctl-1        40       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.114
ctl-2        32       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.142
wrk-0        40       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.192
wrk-1        40       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.199
wrk-2        32       110.01        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.82
wrk-3        40       125.79        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.68
wrk-4        40       110.00        185.75  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.149
wrk-5 (GPU) 128      1007.87        446.25  Red Hat Enterprise Linux CoreOS 416.94.202406172220-0  192.168.50.93
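
For a rough sense of how much capacity this cleanup frees, the node table can be tallied with a short script. This is only a sketch: the `NODES` dictionary and the `total_capacity` function are hypothetical names made up for illustration, and the numbers are copied by hand from the table rather than read from the cluster.

```python
# Hypothetical tally of the Node Details table above.
# Each entry: node name -> (CPU count, memory in GB, storage in GB).
# Values are copied by hand from the table, not queried from the cluster.
NODES = {
    "ctl-0": (40, 125.79, 185.75),
    "ctl-1": (40, 125.79, 185.75),
    "ctl-2": (32, 125.79, 185.75),
    "wrk-0": (40, 125.79, 185.75),
    "wrk-1": (40, 125.79, 185.75),
    "wrk-2": (32, 110.01, 185.75),
    "wrk-3": (40, 125.79, 185.75),
    "wrk-4": (40, 110.00, 185.75),
    "wrk-5": (128, 1007.87, 446.25),  # GPU node (NVIDIA-A100-SXM4-40GB)
}


def total_capacity(nodes):
    """Sum CPU, memory, and storage across all nodes in the mapping."""
    return {
        "nodes": len(nodes),
        "cpu": sum(cpu for cpu, _, _ in nodes.values()),
        "memory_gb": round(sum(mem for _, mem, _ in nodes.values()), 2),
        "storage_gb": round(sum(disk for _, _, disk in nodes.values()), 2),
    }


if __name__ == "__main__":
    for key, value in total_capacity(NODES).items():
        print(f"{key}: {value}")
```

With the values in the table, this comes to 9 nodes, 432 CPUs, 1982.62 GB of memory, and 1932.25 GB of storage.
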
@tssala23

tssala23 commented Dec 3, 2024

@schwesig The nodes have been wiped and returned to the ESI hardware pool, and the DNS records have been removed.
If a cluster is needed in the future, we can re-create it under a different name that matches the new naming convention.

@joachimweyl
Contributor

@tzumainn is there anything we need to do to set these back to be available in ESI?

@tssala23

tssala23 commented Dec 4, 2024

@joachimweyl They have all already been unassigned from the ESI project, so they are back in the available pool.

@schwesig
Member Author

schwesig commented Dec 4, 2024

I will close it then.
Thanks @tssala23 <3

@schwesig closed this as completed Dec 4, 2024
@schwesig reopened this Dec 4, 2024
@schwesig closed this as completed Dec 4, 2024