The SLURM scheduler¶

The cluster uses the SLURM workload manager for resource allocation and job management.

SLURM uses partitions which are analogous to queues in other scheduling systems. There is a single partition called pearl to which all jobs are submitted. Resources are assigned to an account and your unique username is then associated with one (or more) of these accounts depending on your organisational association. It is this association that provides you with access to a set of resources on the cluster.

Getting cluster information and health¶

Two of the most useful SLURM commands that are available to you are sinfo and squeue.

The sinfo command will show you the general health of the cluster, including the current state of the nodes and general information about them such as number of CPUs and how much memory they have.

This example invokes the command using the optional –N (node-centric view) and –l (long view):

$ sinfo -N -l
Tue Nov 19 11:55:23 2019
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
mn1            1    pearl*       mixed   96   2:24:2 154680        0      1   (null) none
mn2            1    pearl*        idle   96   2:24:2 154680        0      1   (null) none

The squeue command will show information on currently running jobs and jobs that are queued while they are waiting for resources to become available:

$ squeue –l
Tue Nov 19 13:20:50 2019
JOBID   PARTITION     NAME     USER    STATE         TIME      TIME_LIMI     NODES NODELIST(REASON)
1257        pearl     bash pearl014  RUNNING     13:30:37     2-00:00:00     1     mn1
1251        pearl     bash pearl014  RUNNING   1-15:18:05     2-00:00:00     1     mn1
1264        pearl     bash pearl014  RUNNING      1:00:13     2-00:00:00     1     mn1
1218        pearl run_v3.s pearl024  RUNNING   3-22:29:29     5-23:00:00     1     mn1

Requesting resources¶

SLURM has three commands for requesting resources; sbatch, salloc and srun. The first two allocate resources to you while the latter launches tasks across those resources.

The srun command¶

A very simple example of a job is to request an interactive session on a worker node. The following command will ask the scheduler to provide a command prompt with one CPU and one GPU:

$ srun -n 1 --gres=gpu:1 --pty /bin/bash

Once on the node, if you issue the command nvidia-smi you should see that you have access to a single V100 GPU.

You are able to submit more complex requests using command line parameters but a better approach is to combine them into a job script and then submit your request to the scheduler using the sbatch command.

The sbatch command¶

This command submits a batch script to the SLURM scheduler by passing the name of a file which contains all your job commands. The file should include a list of #SBATCH directives (also known as job submission flags) which tell the job scheduler what you want to do. There are many job submission flags but at a minimum you should include the following:

#SBATCH -p pearl #(the queue to run on)
#SBATCH -n 1 #(number of processors to run on)
#SBATCH --job-name="<job_name>"
#SBATCH --gres=gpu:1 #(number of GPUs to use)
#SBATCH -t 2:30 #(expected job run time)

The estimated run time of your job is an important parameter as it will allow the scheduler to make a better decision as to when your job. You don’t have to be exact but a rough estimate is better than nothing. If you leave this parameter out altogether the scheduler will assume a default runtime.

In this next example, we have requested 4 CPUs and 4 GPUs and have estimated the job execution time to be no more than 12 hours:

#!/bin/bash

#SBATCH -p pearl # partition (queue)
#SBATCH --job-name=mytestjob
#SBATCH -n 4 # number of CPU cores
#SBATCH --gres=gpu:4
#SBATCH -t 0:12:00 # time (D-HH:MM)

Note

In summary, srun is used to submit a job for execution in real time while sbatch is used to submit a job script for later execution

The salloc command¶

The salloc command reserves resources in real-time under a job allocation and then releases them when you are finished.

Any command that is run within the allocation is run in parallel across all nodes that nodes that have been allocted. In the next example, the salloc command requests two nodes and once allocated, we can check that this is the case with the srun hostname command:

$ salloc -N2 -t 0-00:05
salloc: Granted job allocation 1498529
salloc: Waiting for resource configuration
salloc: Nodes cn2g[17-18] are ready for job
cpu-bind=MASK - cn2g17, task  0  0 [67022]: mask 0x101 set

$ srun hostname
cpu-bind=MASK - cn2g17, task  0  0 [67069]: mask 0x101 set
cn2g17.scarf.rl.ac.uk
cpu-bind=MASK - cn2g18, task  1  0 [43302]: mask 0x101 set
cn2g18.scarf.rl.ac.uk

Controlling your jobs¶

Monitoring jobs¶

Once you have submitted your job you can monitor its status with this command:

$ squeue -u <username>

The following example tells us that the job has been accepted by the scheduler. The PD field shows that the job is in a pending state and the NODELIST field tells us that this is because the scheduler is waiting for the requested resources to become available.

pearl004@ui:~/jobs $ squeue -u pearl004
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              2112     pearl   syrinx pearl004 PD       0:00      1 (Resources)

The output for a running job would be:

pearl004@ui:~/jobs $ squeue -u pearl004
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              2112     pearl   syrinx pearl004  R      20:33      1 mn1

If you need a more detailed summary of a running job you can use the following command:

$ scontrol show jobid -dd <job_id>

Holding, releasing and requeuing jobs¶

If you need to hold a pending job from being run you can do so using:

$ scontrol hold <job_id>

To release a job that is currently being held:

$ scontrol release <job_id>

To requeue a running, suspended or completed job:

$ scontrol requeue <job_id>

Cancelling jobs¶

If you need to cancel a job (either queued or already running) you use the scancel command:

To cancel a job using the job’s unique ID number:

$ scancel <job_id>

To cancel a job by name:

$ scancel <job_name>

To cancel all your pending jobs:

$ scancel -t PENDING -u <username>

To cancel all of your jobs, regardless of their state:

$ scancel -u <username>

When is my job going to run?¶

If your job is queued and you want to know roughly when it will start, use the following command:

$ squeue --start -j <jobid>