Merge pull request #103 from cedadev/remove-lsf-refs
Remove LSF references, fix typos
mjpritchard authored Jul 30, 2024
2 parents c5d8a55 + a7393b3 commit 65c4482
Showing 11 changed files with 96 additions and 98 deletions.
8 changes: 4 additions & 4 deletions content/docs/batch-computing/example-job-2-calc-md5s.md
@@ -16,7 +16,7 @@ This is a simple case because:
1. the archive only needs to be read by the code and
2. the code that we need to run involves only the basic linux commands so there are no issues with picking up dependencies from elsewhere.

### Case Description**
### Case Description

- we want to calculate the MD5 checksums of about 220,000 files. It will take a day or two to run them all in series.
- we have a text file that contains 220,000 lines - one file per line.
@@ -91,7 +91,7 @@ All jobs ran within about an hour.
A variation on Case 2 has been used for checksumming datasets in the CMIP5
archive. The Python code below will find all NetCDF files in a DRS dataset and
generate a checksums file and error log. Each dataset is submitted as a
separate bsub job.
separate Slurm job.

```python
"""
@@ -116,7 +116,7 @@ def submit_job(dataset):
if not op.exists(path):
raise Exception('%s does not exist' % path)
job_name = dataset
cmd = ('bsub -q lotus -J {job_name} '
cmd = ('sbatch -q short-serial -J {job_name} '
'-o {job_name}.checksums -e {job_name}.err '
"/usr/bin/md5sum '{path}/*/*.nc'").format(job_name=job_name,
path=path)
@@ -141,6 +141,6 @@ separate job by invoking the above script as follows:

{{<command user="user" host="sci1">}}
./checksum_dataset.py $(cat datasets_to_checksum.dat)
sbatch-q short-serial -J cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128 -o cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128.checksums -e cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128.err /usr/bin/md5sum '/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/day/seaIce/day/r1i1p1/v20111128/*/*.nc'
sbatch -q short-serial -J cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128 -o cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128.checksums -e cmip5.output1.MOHC.HadGEM2-ES.rcp85.day.seaIce.day.r1i1p1.v20111128.err /usr/bin/md5sum '/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/day/seaIce/day/r1i1p1/v20111128/*/*.nc'
(out)Job <745307> is submitted to queue <lotus>. ...
{{</command>}}
@@ -59,7 +59,7 @@ sleep 5m
```

For job specification of resources please refer to Table 2 of the help article
[LSF to Slurm quick reference]({{< ref "lsf-to-slurm-quick-reference" >}})
[Slurm quick reference]({{< ref "slurm-quick-reference" >}})

## Method 2: Submit via command-line options
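
As a rough sketch, the same resources can instead be requested directly on the `sbatch` command line; the partition, time limit, and script name below are illustrative assumptions rather than values taken from this page:

```bash
# Submit the job with options given on the command line instead of
# #SBATCH directives in the script (all values here are placeholders)
sbatch --partition=short-serial --time=00:10:00 --job-name=sleep-test my_job.sh
```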

74 changes: 0 additions & 74 deletions content/docs/batch-computing/lsf-to-slurm-quick-reference.md

This file was deleted.

73 changes: 73 additions & 0 deletions content/docs/batch-computing/slurm-quick-reference.md
@@ -0,0 +1,73 @@
---
aliases:
- /article/4891-lsf-to-slurm-quick-reference
- /docs/batch-computing/lsf-to-slurm-quick-reference/
date: 2022-10-11 15:15:57
description: An overview of Slurm commands and environment variables
slug: slurm-quick-reference
tags:
- lotus
- orchid
- slurm
title: Slurm quick reference
---

## The Slurm Scheduler

[Slurm](https://slurm.schedmd.com/) is the job scheduler deployed on JASMIN. It
allows users to submit, monitor, and control jobs on the [LOTUS]({{< ref "lotus-overview" >}}) (CPU) and [ORCHID]({{< ref "orchid-gpu-cluster" >}}) (GPU) clusters.

## Essential Slurm commands

| **Slurm command** | **Description** |
| ---------------------------------- | --------------------------------------- |
| sbatch _script_file_ | Submit a job script to the scheduler |
| sinfo | Show available scheduling queues |
| squeue -u _\<username\>_ | List user's pending and running jobs |
| srun -n 1 -p test \--pty /bin/bash | Request an interactive session on LOTUS |
{.table .table-striped}
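
A typical sequence using these commands might look like the following (the job script name is a placeholder):

```bash
# List the available partitions (queues)
sinfo

# Submit a job script, then list your own pending and running jobs
sbatch my_job.sh
squeue -u $USER

# Request an interactive session on the LOTUS test partition
srun -n 1 -p test --pty /bin/bash
```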

## Job specification

<!-- Turn word wrap off to edit this table, or use a site such as https://tableconvert.com/markdown-to-markdown -->
| **Slurm parameter** | **Description** |
| ----------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| #SBATCH | Scheduler directive |
| \--partition=_queue_name_ <br> -p _queue_name_ | Specify the scheduling queue |
| \--time=_hh:mm:ss_ or -t _hh:mm:ss_ | Set the maximum runtime limit |
| \--time-min=_hh:mm:ss_ | Set an estimated runtime |
| \--job-name=_jobname_ | Specify a name for the job |
| \--output=_filename_ or -o _filename_ <br> \--error=_filename_ or -e _filename_ | Standard job output and error output. Default append. The default file name is `slurm-%j.out`, where `%j` is replaced by the job ID |
| \--open-mode=append\|truncate | Write mode for error/output files |
| %j | Job ID |
| %a | Job array index |
| \--mem=_XXX_ | Memory required for the job (_XXX_). Default units are megabytes |
| \--array= _index_ (e.g. \--array=1-10) | Specify a job array. The default file name is `slurm-%A_%a.out`, `%A` is replaced by the job ID and `%a` with the array index. |
| \--array=index% _ArrayTaskThrottle_ <br> (e.g. \--array=1-15%4 will limit the number of simultaneously running tasks from this job array to 4) | A maximum number of simultaneously running tasks from the job array may be specified using a `%` separator. |
| -D <br> \--chdir=_\<directory\>_ | Set the working directory of the batch script to _\<directory\>_ before it is executed. |
| \--exclusive | Exclusive execution mode |
| \--dependency= _\<dependency_list\>_ | Defer the start of this job until the specified dependencies have been satisfied |
| \--ntasks=_number-of-cores_ <br> -n _number-of-cores_ | Number of CPU cores |
| \--constraint="_\<host-group-name\>_" | Select a node with a specific processor model |
{.table .table-striped}
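
Put together, a minimal job script using several of these directives might look like the sketch below; the partition, runtime, memory, and the `md5sum` command are illustrative, not a prescribed configuration:

```bash
#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --job-name=md5-example
#SBATCH --time=01:00:00
#SBATCH --mem=1000
#SBATCH -o %j.out
#SBATCH -e %j.err

# Replace with the real workload; this path is a placeholder
md5sum /path/to/data/*.nc
```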

## Job control commands

| **Slurm command** | **Description** |
| ------------------------------- | ----------------------------- |
| scancel _\<jobid\>_ | Kill a job |
| scontrol show job _\<jobid\>_ | Show detailed job information |
| scontrol update job _\<jobid\>_ | Modify a pending job |
| scancel \--user=_\<username\>_ | Kill all jobs owned by a user |
{.table .table-striped}
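
For example, with an illustrative job ID of 123456, a pending job could be inspected, modified, and then cancelled:

```bash
# Inspect the job, change its time limit while it is still pending,
# then cancel it if it is no longer needed (job ID is a placeholder)
scontrol show job 123456
scontrol update JobId=123456 TimeLimit=02:00:00
scancel 123456
```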

## Job environment variables

| **Slurm variable** | **Description** |
| --------------------- | ------------------------------------ |
| $SLURM_JOBID | Job identifier number |
| $SLURM_ARRAY_JOB_ID | Job array's master job ID |
| $SLURM_ARRAY_TASK_ID | Job array index |
| $SLURM_ARRAY_TASK_MAX | Last index number within a job array |
| $SLURM_NTASKS | Number of processors allocated |
{.table .table-striped}
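
As a sketch of how these variables can be used together, an array job script along the following lines (the `filelist.txt` input and output names are assumptions) would process one file per array task:

```bash
#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --time=00:30:00
#SBATCH --array=1-10

# Take the Nth line of a list of input files, where N is this task's array index
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)

echo "Job ${SLURM_JOBID}: task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_MAX}"
md5sum "${INPUT}" > "checksum_${SLURM_ARRAY_TASK_ID}.md5"
```
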
@@ -203,7 +203,7 @@ button.
## Managing groups

When you deploy a cluster through CaaS, it may create one or more access
control groups in FreeIPA as part of it's configuration. Some clusters can
control groups in FreeIPA as part of its configuration. Some clusters can
also consume additional groups created in FreeIPA. This is discussed in more
detail in the documentation for each cluster type, but the way you manage
group membership is the same in all cases.
@@ -14,7 +14,7 @@ Cluster-as-a-Service (CaaS).
[Kubernetes](https://kubernetes.io/) is an open-source system for automating
the deployment, scaling and management of containerised applications.

Kubernetes is an extremely powerful system, and a full discussion of it's
Kubernetes is an extremely powerful system, and a full discussion of its
capabilities is beyond the scope of this article - please refer to the
Kubernetes documentation. This article assumes some knowledge of Kubernetes
terminology and focuses on things that are specific to the way Kubernetes is
@@ -8,7 +8,7 @@ title: Understanding new JASMIN storage
weight: 160
---

{{<alert type="info">}}This article was originally written in 2018/19 to introdice new forms of storage which were brought into produciton at that stage. Some of the information and terminology is now out of date, pending further review of JASMIN documentation.{{</alert>}}
{{<alert type="info">}}This article was originally written in 2018/19 to introduce new forms of storage which were brought into production at that stage. Some of the information and terminology is now out of date, pending further review of JASMIN documentation.{{</alert>}}

## Introduction

21 changes: 10 additions & 11 deletions content/docs/short-term-project-storage/faqs-storage.md
@@ -8,7 +8,7 @@ tags:
title: New storage FAQs and issues
---

{{<alert type="info">}}This article was originally written in 2018/19 to introdice new forms of storage which were brought into produciton at that stage. Some of the information and terminology is now out of date, pending further review of JASMIN documentation.{{</alert>}}
{{<alert type="info">}}This article was originally written in 2018/19 to introduce new forms of storage which were brought into production at that stage. Some of the information and terminology is now out of date, pending further review of JASMIN documentation.{{</alert>}}

Workflows with some of the issues highlighted below will have a knock-on
effect for other users, so please take the time to check and change your code
@@ -64,10 +64,10 @@ starting another.

#### Opening the same file for editing in more than one editor on the same or different servers

_Here’s an example of how this shows up using “lsof” and by listing user
Here’s an example of how this shows up using “lsof” and by listing user
processes with “ps”. The same file “ISIMIPnc_to_SDGVMtxt.py” is being edited
in 2 separate “vim” editors. In this case, the system team was unable to kill
the processes on behalf of the user, so the only solution was to reboot sci1._
the processes on behalf of the user, so the only solution was to reboot sci1.

{{<command user="user" host="sci1">}}
lsof /gws/nopw/j04/gwsnnn/
@@ -91,20 +91,20 @@ be rebooted.

## 2\. Issues with small files

_The larger file systems in operation within JASMIN are suitable for storing
The larger file systems in operation within JASMIN are suitable for storing
and manipulating large datasets and not currently optimised for handling small
( <64kBytes) files. These systems are not the same as those you would find on
a desktop computer or even large server, and often involve many disks to store
the data itself and metadata servers to store the file system metadata (such
as file size, modification dates, ownership etc). If you are compiling code
from source files, or running code from python virtual environments, these are
examples of activities which can involve accessing large numbers of small
files._
files.

_Later versions of our PFS systems handled this by using SSD storage for small
Later versions of our PFS systems handled this by using SSD storage for small
files, transparent to the user. SOF however, can’t do this (until later in
2019), so in Phase 4, we introduced larger home directories based on SSD, as
well as an additional and larger scratch area._
well as an additional and larger scratch area.

**Suggested solution:** Please consider using your home directory for small-
file storage, or `/work/scratch-nopw2` for situations involving LOTUS
@@ -125,22 +125,21 @@ similar issues from writing large numbers of small files to SOF storage (known
as QB ).

**Suggested solution:** It is more efficient to write netCDF3 classic files to
another filesystem type (e.g. /work/scratch/pw* or /work/scratch-nopw2) and then move them to a SOF
GWS, rather than writing directly to SOF.
another filesystem type (e.g. `/work/scratch/pw*` or `/work/scratch-nopw2`) and then move them to a SOF GWS, rather than writing directly to SOF.
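
A minimal sketch of that pattern, with placeholder paths and a placeholder program name, is:

```bash
# Write the netCDF3 output to scratch first (paths and program are placeholders)
OUTDIR=/work/scratch-nopw2/$USER/run01
mkdir -p "$OUTDIR"
./my_model --output "$OUTDIR/result.nc"

# ...then move the finished file to the SOF-backed GWS in a single step
mv "$OUTDIR/result.nc" /gws/nopw/j04/gwsnnn/
```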

---

## 3\. "Everything's running slowly today"

_This can be due to overloading of the scientific analysis servers
This can be due to overloading of the scientific analysis servers
(`sci*.jasmin.ac.uk`) which we provide for interactive use. They’re great
for testing a code and developing a workflow, but are not designed for
actually doing the big processing. Please take this heavy-lifting or
long-running work to the LOTUS batch processing cluster, leaving the
interactive compute nodes responsive enough for everyone to use.

**Suggested solution:** When you log in via one of the `login*.jasmin.ac.uk`
nodes, you are shown a 'message of the day" a list of all the `sci*` machines,
nodes, you are shown a 'message of the day': a list of all the `sci*` machines,
along with memory usage and the number of users on each node at that time.
This can help you select a less-used machine (but don’t necessarily expect the
same machine to be the right choice next time!).
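
Once logged in, you can also gauge the load on the current `sci*` machine yourself with standard Linux commands, for example:

```bash
# Load averages and number of logged-in sessions on this machine
uptime
who | wc -l

# Free and used memory in gigabytes
free -g
```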