SLURM: Scheduling and Managing Jobs
SLURM (Simple Linux Utility for Resource Management) is a software package for submitting, scheduling, and monitoring jobs on large compute clusters. This page details how to use SLURM for submitting and monitoring jobs on ACCRE's Vampire cluster. New cluster users should consult our Getting Started pages, which are designed to walk you through the process of creating a job script, submitting a job to the cluster, monitoring jobs, checking job usage statistics, and understanding our cluster policies. SLURM has been in use for job scheduling since early 2015; previously, Torque and Moab were used for that purpose.
This page describes the basic commands of SLURM. For more advanced topics, see the page on GPUs, Parallel Processing and Job Arrays. ACCRE staff have also created a number of utilities to assist you in scheduling and managing your jobs.
All the examples on this page can be downloaded from ACCRE’s Github page by issuing the following commands from a cluster gateway:
```bash
module load GCC git
git clone https://github.com/accre/SLURM.git
```
For a printable list of SLURM commands, download the ACCRE Cheat Sheet. SchedMD, the creators of SLURM, have a printable reference as well.
Batch Scripts
The first step for submitting a job to SLURM is to write a batch script, as shown below. The script includes a number of `#SBATCH` directive lines that tell SLURM about your job, including its resource requirements. For example, the script below describes a simple Python job requesting 1 node, 1 CPU core, 500 MB of RAM, and 10 minutes of wall time. Note that the node count (`#SBATCH --nodes=1`) and CPU core count (`#SBATCH --ntasks=1`) must be specified on two separate lines in SLURM.
```bash
#!/bin/bash
#SBATCH --mail-user=myemail@vanderbilt.edu
#SBATCH --mail-type=ALL
#SBATCH --nodes=1   # comments allowed
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --output=python_job_slurm.out

# These are comment lines
# Load the Anaconda distribution of Python, which comes
# pre-bundled with many of the popular scientific computing tools like
# numpy, scipy, pandas, scikit-learn, etc.
module load Anaconda2

# Pass your Python script to the Anaconda2 python interpreter for execution
python vectorization.py
```
Note that a SLURM batch script must begin with the `#!/bin/bash` directive on the first line. The subsequent lines begin with the SLURM directive `#SBATCH`, followed by a resource request or other pertinent job information. Email alerts will be sent to the specified address when the job begins, aborts, and ends. Below the `#SBATCH` directives are the Linux commands needed to run your program or analysis. Once your job has been submitted via the `sbatch` command (details shown below), SLURM will match your resource requests with idle resources on the cluster, run your specified commands on one or more compute nodes, and then email you (if requested in your batch script) when your job begins, ends, and/or fails.
Here is a list of basic #SBATCH directives:
#SBATCH Directive | Description |
---|---|
--nodes=[count] | Node count |
--ntasks-per-node=[count] | Processes per node |
--ntasks=[count] | Total processes (across all nodes) |
--cpus-per-task=[count] | CPU cores per process |
--nodelist=[nodes] | Job host preference |
--exclude=[nodes] | Job host to avoid |
--time=[min] or --time=[dd-hh:mm:ss] | Wall clock limit |
--mem=[count] | RAM per node |
--mem-per-cpu=[count][M or G] | RAM per CPU core |
--output=[file_name] | Standard output file |
--error=[file_name] | Standard error file |
--array=[array_spec] | Launch job array |
--mail-user=[email_address] | Email for job alerts |
--mail-type=[BEGIN or END or FAIL or REQUEUE or ALL] | Email alert type |
--account=[account] | Account to charge |
--dependency=[state:job_id] | Job dependency |
--job-name=[name] | Job name |
--constraint=[attribute] | Request node attribute (westmere, sandy_bridge, haswell, eight, twelve, sixteen) |
--partition=[name] | Submit job to specified partition (production (default), debug, maxwell, fermi) |
Note that the `--constraint` option allows a user to target certain processor families.
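For instance, a job could be restricted to Haswell-generation nodes by adding a constraint directive to the batch script; this is a sketch (the script name `my_job.slurm` is hypothetical):

```shell
# Request only Haswell-generation processors for this job
#SBATCH --constraint=haswell
```

The same request can be made at submission time with `sbatch --constraint=haswell my_job.slurm`.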
Partitions (Queues)
All non-GPU groups on the cluster have access to the `production` and `debug` partitions. The purpose of the `debug` partition is to allow users to quickly test a representative job before submitting a larger number of jobs to the `production` partition (the default partition on our cluster). Wall time limits and other policies for each of our partitions are shown below.
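To route a test job to the `debug` partition, one would add a partition directive to the batch script; a minimal sketch (the requested time must fit under the partition's 30-minute limit):

```shell
# Submit this job to the debug partition for a quick test run
#SBATCH --partition=debug
#SBATCH --time=00:20:00   # must be within the debug partition's 30-minute limit
```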
Partition | Max Wall Time | Max Running Jobs | Max Submitted Jobs | Resources |
---|---|---|---|---|
production | 14 days | n/a | n/a | 6000-6500 CPU cores |
debug | 30 minutes | 2 | 5 | 8 CPU cores |
pascal | 5 days | n/a | n/a | 80 CPU cores, 40 Pascal GPUs |
maxwell | 5 days | n/a | n/a | 144 CPU cores, 48 Maxwell GPUs |
Commands
SLURM offers a number of helpful commands for tasks ranging from job submission and monitoring to modifying resource requests for jobs that have already been submitted to the queue. Below is a list of SLURM commands:
SLURM | Function |
---|---|
sbatch [job_script] | Job submission |
squeue | Job/Queue status |
scancel [JOB_ID] | Job deletion |
scontrol hold [JOB_ID] | Job hold |
scontrol release [JOB_ID] | Job release |
sinfo | Cluster status |
salloc | Launch interactive job |
srun [command] | Launch (parallel) job step |
sacct | Displays job accounting information |
sbatch
The `sbatch` command is used for submitting jobs to the cluster. `sbatch` accepts a number of options either from the command line or (more typically) from a batch script. An example of a SLURM batch script (called `simple.slurm`) is shown below:
```bash
#!/bin/bash
#SBATCH --mail-user=myemail@vanderbilt.edu
#SBATCH --mail-type=ALL
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-00:15:00   # 15 minutes
#SBATCH --output=my.stdout
#SBATCH --job-name=just_a_test

# Put commands for executing job below this line
# This example is loading the Anaconda distribution of Python and
# writing out the version of Python
module load Anaconda2
python --version
```
To submit this batch script, a user would type:
```bash
sbatch simple.slurm
```
This job (called `just_a_test`) requests 1 compute node, 1 task (by default, SLURM assigns 1 CPU core per task), 1 GB of RAM per CPU core, and 15 minutes of wall time (the time required for the job to complete). Note that these are the defaults for any job, but it is good practice to include these lines in a SLURM script in case you need to request additional resources.
Optionally, any `#SBATCH` line may be replaced with an equivalent command-line option. For instance, the `#SBATCH --ntasks=1` line could be removed and a user could specify this option from the command line using:
```bash
sbatch --ntasks=1 simple.slurm
```
The commands needed to execute a program must be included beneath all `#SBATCH` directives. Lines beginning with the `#` symbol (other than the `#!/bin/bash` and `#SBATCH` lines) are comments and are not executed by the shell. The example above simply prints the version of Python loaded in a user's path. It is good practice to include any `module load` commands in your SLURM script. A real job would likely do something more complex than the example above, such as passing a Python file to the Python interpreter for processing.
For more information about `sbatch`, see: http://slurm.schedmd.com/sbatch.html
squeue
`squeue` is used for viewing the status of jobs. By default, `squeue` will output the following information about currently running jobs and jobs waiting in the queue: Job ID, Partition, Job Name, User Name, Job Status, Run Time, Node Count, and Node List. There are a large number of command-line options available for customizing the information provided by `squeue`. Below is a list of examples:
Command | Meaning |
---|---|
squeue --long | Provide more job information |
squeue --user=USER_ID | Provide information for USER_ID’s jobs |
squeue --account=ACCOUNT_ID | Provide information for jobs running under ACCOUNT_ID |
squeue --states=running | Show running jobs only |
squeue --Format=account,username,numcpus,state,timeleft | Customize output of squeue |
squeue --start | List estimated start time for queued jobs |
squeue --help | Show all options |
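These options can also be combined. For example, here is a sketch of checking the estimated start times of one user's pending jobs (`USER_ID` is a placeholder):

```shell
# Show estimated start times for USER_ID's queued jobs only
squeue --user=USER_ID --states=pending --start
```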
For more information about `squeue`, see: http://slurm.schedmd.com/squeue.html
sacct
This command is used for viewing information for completed jobs. This can be useful for monitoring job progress or diagnosing problems that occurred during job execution. By default, `sacct` will report Job ID, Job Name, Partition, Account, Allocated CPU Cores, Job State, and Exit Code for all of the current user's jobs that completed since midnight of the current day. Many options are available for modifying the information output by `sacct`:
Command | Meaning |
---|---|
sacct --starttime 12.04.14 | Show information since midnight of Dec 4, 2014 |
sacct --allusers | Show information for all users |
sacct --accounts=ACCOUNT_ID | Show information for all users under ACCOUNT_ID |
sacct --format="JobID,user,account,elapsed,Timelimit,MaxRSS,ReqMem,MaxVMSize,ncpus,ExitCode" | Show listed job information |
sacct --help | Show all options |
The `--format` option is particularly useful, as it allows a user to customize the output of job usage statistics. We suggest creating an alias for running a customized version of `sacct`. For instance, the `elapsed` and `Timelimit` fields allow a comparison of allocated vs. actual wall time. `MaxRSS` and `MaxVMSize` show the maximum RAM and virtual memory usage for a job, respectively, while `ReqMem` reports the amount of RAM requested.
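As one way to set up such an alias, the line below could go in your `~/.bashrc`; the alias name `myjobs` is arbitrary:

```shell
# Arbitrary alias showing allocated vs. used time and memory for today's jobs
alias myjobs='sacct --format="JobID,Elapsed,Timelimit,MaxRSS,ReqMem,ExitCode"'
```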
For more information about `sacct`, see: http://slurm.schedmd.com/sacct.html
scontrol
`scontrol` is used for monitoring and modifying queued jobs, as well as holding and releasing them. One of its most powerful options is `scontrol show job`. Below is a list of useful `scontrol` commands:
Command | Meaning |
---|---|
scontrol show job JOB_ID | Show information for queued or running job |
scontrol hold JOB_ID | Place hold on job |
scontrol release JOB_ID | Release hold on job |
scontrol show nodes | Show hardware details for nodes on cluster |
scontrol update JobID=JOB_ID Timelimit=1-12:00:00 | Change wall time to 1 day 12 hours |
scontrol update JobID=JOB_ID Dependency=afterany:OTHER_JOB_ID | Add a dependency so the job only starts after OTHER_JOB_ID completes |
scontrol --help | Show all options |
Please note that the time limit or memory of a job can only be adjusted for pending jobs, not for running jobs.
For more information about `scontrol`, see: http://slurm.schedmd.com/scontrol.html
salloc
The function of `salloc` is to launch an interactive job on compute nodes. This can be useful for troubleshooting/debugging a program or when a program requires user input. To launch an interactive job requesting 1 node, 2 CPU cores, and 1 hour of wall time, a user would type:
```bash
salloc --nodes=1 --ntasks=2 --time=1:00:00
```
This command will execute and then wait for the allocation to be obtained. Once the allocation is granted, an interactive shell is initiated on the allocated node (or on one of them, if multiple nodes were allocated). At this point, users can execute normal commands and launch their applications as usual.
Note that all of the `sbatch` options are also applicable for `salloc`, so a user can add other typical resource requests, such as memory. Another useful feature of `salloc` is that it enforces resource requests, preventing users or applications from using more resources than were requested. For example:
```
[bob@vmps12 ~]$ salloc --nodes=1 --ntasks=2 --time=1:00:00
salloc: Pending job allocation 1772833
salloc: job 1772833 queued and waiting for resources
salloc: job 1772833 has been allocated resources
salloc: Granted job allocation 1772833
[bob@vmp586 ~]$ hostname
vmp586
[bob@vmp586 ~]$ srun -n 2 hostname
vmp586
vmp586
[bob@vmp586 ~]$ srun -n 4 hostname
srun: error: Unable to create job step: More processors requested than permitted
[bob@vmp586 ~]$ exit
exit
srun: error: vmp586: task 0: Exited with exit code 1
salloc: Relinquishing job allocation 1772833
salloc: Job allocation 1772833 has been revoked.
[bob@vmps12 ~]$
```
In this example, `srun -n 4` failed because only 2 tasks were allocated for this interactive job (for details on `srun`, see the srun section below). Also note that typing `exit` during the interactive session will kill the interactive job, even if the allotted wall time has not been reached.
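For example, here is a sketch of an interactive job that also specifies a memory request:

```shell
# Interactive job: 1 node, 2 CPU cores, 4 GB RAM per core, 1 hour of wall time
salloc --nodes=1 --ntasks=2 --mem-per-cpu=4G --time=1:00:00
```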
For more information about `salloc`, see: http://slurm.schedmd.com/salloc.html
xalloc
Similar to `salloc`, this command provides an interactive shell on a compute node, but with the possibility of running programs with a graphical user interface (GUI) directly on the compute node. To correctly display the GUI on your monitor, you first need to connect to the cluster's gateway with X11 forwarding enabled, as follows:
```
[bob@bobslaptop ~]$ ssh -X bob@login.accre.vanderbilt.edu
```
Then from the gateway request the interactive job with X11 forwarding as in the following example:
```
[bob@vmps12 ~]$ xalloc --nodes=1 --ntasks=2 --time=1:00:00
srun: job 12555243 queued and waiting for resources
srun: job 12555243 has been allocated resources
[bob@vmp586 ~]$
```
At this point, when launching GUI-based software, the interface should appear on your monitor.
sinfo
`sinfo` allows users to view information about SLURM nodes and partitions. A partition is a set of nodes defined by the cluster administrator. Below are a few example uses of `sinfo`:
Command | Meaning |
---|---|
sinfo -Nel | Displays info in a node-oriented format |
sinfo --partition=gpu | Get information about GPU nodes |
sinfo --states=IDLE | Displays info about idle nodes |
sinfo --help | Show all options |
For more information about `sinfo`, see: http://slurm.schedmd.com/sinfo.html
sreport
`sreport` is used for generating reports of job usage and cluster utilization. It queries the SLURM database to obtain this information. By default, information is shown for jobs run since midnight of the current day. Some examples:
Command | Meaning |
---|---|
sreport cluster utilization | Show cluster utilization report |
sreport user top | Show top 10 cluster users based on total CPU time |
sreport cluster AccountUtilizationByUser start=2014-12-01 | Show account usage per user dating back to December 1, 2014 |
sreport job sizesbyaccount PrintJobCount | Show number of jobs run on a per-group basis |
sreport --help | Show all options |
For more information about `sreport`, see: http://slurm.schedmd.com/sreport.html
srun
Finally, `srun` is used to launch parallel job steps, typically from within an existing allocation. More information about `srun` is available in GPUs, Parallel Processing and Job Arrays.
Environment Variables
Variable | Meaning |
---|---|
SLURM_JOBID | Job ID |
SLURM_SUBMIT_DIR | Job submission directory |
SLURM_SUBMIT_HOST | Name of host from which job was submitted |
SLURM_JOB_NODELIST | Names of nodes allocated to job |
SLURM_ARRAY_TASK_ID | Task id within job array |
SLURM_JOB_CPUS_PER_NODE | CPU cores per node allocated to job |
SLURM_NNODES | Number of nodes allocated to job |
Each of these environment variables can be referenced from a SLURM batch script by prefixing the variable name with the `$` symbol (e.g. `echo $SLURM_JOBID`). A full list of SLURM environment variables can be found here: http://slurm.schedmd.com/sbatch.html#lbAF
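For instance, a batch script might log its job ID and node count at startup. The sketch below uses fallback values so it also runs outside a SLURM job:

```shell
#!/bin/bash
# Read SLURM environment variables, falling back to defaults when the
# variables are unset (e.g. when testing the script outside a SLURM job)
job_id="${SLURM_JOBID:-none}"
nnodes="${SLURM_NNODES:-1}"
echo "Job $job_id is running on $nnodes node(s)"
```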