slurm_resume creating files in /tmp of head_node and not cleaning up.... #6572

Open
gwolski opened this issue Nov 18, 2024 · 2 comments

Comments


gwolski commented Nov 18, 2024

ParallelCluster 3.9.1 and 3.11.1.
Since the head node started, many files of the following form have accumulated in its /tmp:

-rw-r----- 1 slurm pcluster-slurm-share 1088 Nov 18 09:14 tmp.VBWqTAz4SS
-rw-r----- 1 slurm pcluster-slurm-share 262 Nov 18 09:05 tmp.zMJAymfADb
-rw-r----- 1 slurm pcluster-slurm-share 276 Nov 18 09:01 tmp.4FNDMLk4rC
-rw-r----- 1 slurm pcluster-slurm-share 282 Nov 18 08:59 tmp.xVx3sST9n3
-rw-r----- 1 slurm pcluster-slurm-share 282 Nov 18 08:55 tmp.a23WPhksqt
-rw-r----- 1 slurm pcluster-slurm-share 473 Nov 18 08:41 tmp.D3MvXoHL1g
-rw-r----- 1 slurm pcluster-slurm-share 1691 Nov 18 08:40 tmp.zYS9m1CU3j
-rw-r----- 1 slurm pcluster-slurm-share 1488 Nov 18 08:40 tmp.mCOXXmqOHt
-rw-r----- 1 slurm pcluster-slurm-share 1488 Nov 18 08:39 tmp.di3dHuc923
-rw-r----- 1 slurm pcluster-slurm-share 265 Nov 18 08:37 tmp.AXTQoAX1hL
-rw-r----- 1 slurm pcluster-slurm-share 265 Nov 18 08:37 tmp.45sPSvBlrB

I would expect slurm (slurmctld, since we're on the head node?) to clean up after itself and not leave crumbs.
The contents of the files seem to relate to starting jobs and are of the form:

{"jobs":[{"extra":null,"job_id":17836,"features":null,"nodes_alloc":"sp-r7a-m-dy-sp-8-gb-1-cores-40","nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-40","oversubscribe":"NO","partition":"sp-8-gb","reservation":null},{"extra":null,"job_id":17837,"features":null,"nodes_alloc":"sp-r7a-m-dy-sp-8-gb-1-cores-41","nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-41","oversubscribe":"NO","partition":"sp-8-gb","reservation":null}],"all_nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-[40-41]"}

Is this a bug, feature, or known issue? Should I be cleaning up head_node /tmp/ files older than N days?
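
To check which jobs and nodes one of these files refers to, the JSON quoted above can be summarized with a few lines of Python. This is only a sketch: the function name `summarize_resume_payload` is hypothetical, the sample is trimmed from the file contents quoted in this issue, and the key names (`jobs`, `job_id`, `partition`, `nodes_resume`, `all_nodes_resume`) are assumed from that one example rather than from any documented schema.

```python
import json


def summarize_resume_payload(text):
    """Return ([(job_id, partition, nodes_resume), ...], all_nodes_resume).

    Key names are assumed from the single example in this issue;
    missing keys are tolerated and come back as None.
    """
    payload = json.loads(text)
    jobs = [
        (job.get("job_id"), job.get("partition"), job.get("nodes_resume"))
        for job in payload.get("jobs", [])
    ]
    return jobs, payload.get("all_nodes_resume")


# Sample trimmed from the file contents quoted in this issue (some fields omitted).
sample = (
    '{"jobs":['
    '{"job_id":17836,"nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-40","partition":"sp-8-gb"},'
    '{"job_id":17837,"nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-41","partition":"sp-8-gb"}'
    '],"all_nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-[40-41]"}'
)

jobs, all_nodes = summarize_resume_payload(sample)
for job_id, partition, nodes in jobs:
    print(job_id, partition, nodes)
print("all nodes:", all_nodes)
```

Running the same function over the text of each `tmp.*` file would show whether they all follow this shape.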

gwolski added the 3.x label Nov 18, 2024
@hanwen-cluster
Contributor

Apologies for the late reply. We are able to reproduce the issue and are discussing it.

@hanwen-cluster
Contributor

We've merged the PR and the change will be included in the next release.

For now, you can delete any of the files older than 1 day.
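
As a stopgap until the fix is released, the cleanup could be scripted roughly as follows. This is a minimal sketch, not an official ParallelCluster tool: the function names `find_stale_tmp_files` and `delete_files` are hypothetical, the `tmp.*` pattern and the `slurm` owner are taken from the listing in this issue, and the one-day threshold follows the advice above. It defaults to a dry run so the candidate list can be reviewed before anything is removed.

```python
import stat
import time
from pathlib import Path


def find_stale_tmp_files(directory="/tmp", pattern="tmp.*", max_age_days=1.0, owner=None):
    """List regular files matching `pattern` that were last modified more than
    `max_age_days` ago. If `owner` is given, keep only files owned by that user."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for path in Path(directory).glob(pattern):
        try:
            info = path.lstat()  # lstat: do not follow symlinks
            if not stat.S_ISREG(info.st_mode) or info.st_mtime >= cutoff:
                continue
            if owner is not None and path.owner() != owner:
                continue
        except (OSError, KeyError):
            # File vanished, is unreadable, or its uid has no passwd entry: skip it.
            continue
        stale.append(path)
    return sorted(stale)


def delete_files(paths, dry_run=True):
    """Delete the given files unless dry_run is True; return the paths handled."""
    handled = []
    for path in paths:
        if not dry_run:
            path.unlink(missing_ok=True)  # requires Python 3.8+
        handled.append(path)
    return handled


# Example usage (assumed values; review the dry-run output first):
#   candidates = find_stale_tmp_files("/tmp", owner="slurm")
#   for p in delete_files(candidates, dry_run=True):
#       print("would remove", p)
```

Matching on both the `tmp.*` name and the file owner keeps the sweep away from unrelated temporary files that other processes may have left in /tmp.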

hanwen-cluster changed the title from "slurm (slurmctld?) creating files in /tmp of head_node and not cleaning up...." to "slurm_resume creating files in /tmp of head_node and not cleaning up...." on Jan 23, 2025