ACCRE Commands for Job Monitoring
ACCRE staff have written a number of useful commands that are available for use on the cluster.
For monitoring jobs
rtracejob
rtracejob compares resource requests to resource usage for an individual job. It can currently display information for a single job or a summary of an array job. Typing rtracejob -h displays help text explaining the arguments and available functions for rtracejob:
usage: rtracejob [-h] [-l] [--dump_failed_joblist] jobID

positional arguments:
  jobID                 the slurm job ID for displaying job information. For
                        array jobs if root job ID is given (for example, in
                        job ID 1234567_12 the root job ID is 1234567) the
                        summary of array jobs will be displayed.

optional arguments:
  -h, --help            show this help message and exit
  -l, --list_subjobs    with this option rtracejob will print out all of sub
                        jobs information automatically, in default it's off
  --dump_failed_joblist
                        with this option rtracejob will dump the failed sub
                        jobs ID to a file in name of "failed_joblist_(your
                        jobid).txt", in default it's off
rtracejob is useful for troubleshooting when something goes wrong with your job. For example, rtracejob jobID displays information for a single job:
[bob@vmps12 ~]$ rtracejob 1234567
+------------------+--------------------------+
| User: bob        | JobID: 1234567           |
+------------------+--------------------------+
| Account          | chemistry                |
| Job Name         | python.slurm             |
| State            | Completed                |
| Exit Code        | 0:0                      |
| Wall Time        | 00:10:00                 |
| Requested Memory | 1000Mc                   |
| Memory Used      | 13712K                   |
| CPUs Requested   | 1                        |
| CPUs Used        | 1                        |
| Nodes            | 1                        |
| Node List        | vmp505                   |
| Wait Time        | 0.4 minutes              |
| Run Time         | 0.4 minutes              |
| Submit Time      | Thu Jun 18 09:23:32 2015 |
| Start Time       | Thu Jun 18 09:23:57 2015 |
| End Time         | Thu Jun 18 09:24:23 2015 |
+------------------+--------------------------+
| Today's Date     | Thu Jun 18 09:25:08 2015 |
+------------------+--------------------------+
A user might want to check how much memory a job used compared to how much was requested, or how long a job took to execute relative to the wall time requested. In this example, note that the Requested Memory reported is 1000Mc, meaning 1000 megabytes per core (the “c” stands for “core”). This is the default for jobs that specify no memory requirement. If you instead see a lowercase “n” on the Requested Memory line, it stands for “node” and appears when a --mem= line is included in the SLURM script, which allocates the listed amount of memory per node in the allocation.
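For reference, here is a minimal sketch of the two memory directives as they would appear in a SLURM script. The values shown are illustrative, not recommendations; --mem-per-cpu is SLURM's standard per-core directive, which this page does not otherwise cover:

# Request 1000 MB per core (reported by rtracejob with a "c" suffix):
#SBATCH --mem-per-cpu=1000M

# Or request 4 GB per node (reported by rtracejob with an "n" suffix):
#SBATCH --mem=4G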
rtracejob can also display a summary of an array job. For example:
[liuf8@gw342 ~]$ rtracejob 5270444
+---------------------------+-------------------------------------------+
+                        SUMMARY of ARRAY JOBS                          +
+ User name: liuf8          | job ID: 5270444
+---------------------------+-------------------------------------------+
+ Account                   | accre
+ Job Name                  | array.slurm
+ No. of Submitted SubJobs  | 5
+ No. of Finished SubJobs   | 5
+ No. of Successful SubJobs | 5
+ No. of Failed SubJobs     | 0
+ Requested Memory          | 500mn
+ Max Memory Used by SubJobs| 154472k
+ Original Requested Time   | 00:10:00
+ Max Running Time          | 00:00:42
+ Min Running Time          | 00:00:26
+ Max Waiting Time          | 00:00:00
+ Min Waiting Time          | 00:00:00
+---------------------------+-------------------------------------------+
The job ID 5270444 is the root job ID, and 5270444_1, 5270444_2, and so on are the sub jobs belonging to this array job. You can pass a sub job ID directly to the rtracejob command, or get information about all sub jobs by passing the --list_subjobs flag (e.g. rtracejob 5270444 --list_subjobs). Given the root job ID, rtracejob scans all of the sub jobs’ information and prints a summary of the array job. Please open a helpdesk ticket if you would like us to consider adding features to the rtracejob command.
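For example, here is a sketch of a troubleshooting workflow built around the --dump_failed_joblist flag. The job ID reuses the example above, and the loop assumes the dump file lists one sub job ID per line, which the help text does not specify:

# Dump the IDs of any failed sub jobs to failed_joblist_5270444.txt:
rtracejob --dump_failed_joblist 5270444

# Inspect each failed sub job individually (assumes one ID per line):
while read -r subjob; do
    rtracejob "$subjob"
done < failed_joblist_5270444.txt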
qSummary
qSummary provides an alternate summary of jobs and cores running across all groups on the cluster. You can filter the results to a specific account with the -g option.
[jill@vmps12 ~]$ qSummary
GROUP        USER      ACTIVE_JOBS  ACTIVE_CORES  PENDING_JOBS  PENDING_CORES
-----------------------------------------------------------------------------
science                18           34            5             7
             jack      5            5             4             4
             jill      13           29            1             3
-----------------------------------------------------------------------------
economics              88           200           100           100
             emily     88           200           100           100
-----------------------------------------------------------------------------
Totals:                106          234           105           107
As shown, the output from qSummary provides a basic view of the active and pending jobs and cores across groups and users within each group. qSummary also supports a -g argument followed by the name of a group, a -p argument followed by a partition name, and a -gpu switch if you would like to see GPU rather than CPU information. For example:
[jill@vmps12 ~]$ qSummary -p pascal -gpu
GROUP        USER      ACTIVE_JOBS  ACTIVE_GPUS  PENDING_JOBS  PENDING_GPUS
-----------------------------------------------------------------------------
science                4            8            1             2
             jack      0            0            1             2
             jill      4            8            0             0
-----------------------------------------------------------------------------
economics              4            16           0             0
             emily     4            16           0             0
-----------------------------------------------------------------------------
Totals:                8            24           1             2
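Similarly, to restrict the summary to a single group, pass its name to the -g argument (here using the science group from the example above; the output follows the same layout):

[jill@vmps12 ~]$ qSummary -g science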
For monitoring groups
slurm_groups
To list your current SLURM group membership, type slurm_groups. This command lists all your groups, the partition(s) they belong to, and whether or not GPU access is enabled for your account. If you do have GPU access, it also prints example lines for your SLURM script.
$ slurm_groups

Accounts        Partitions
--------------- ------------
accre           debug
accre           nogpfs
accre           production
accre_gpu_acc   maxwell
accre_gpu_acc   pascal
sc3260          debug
sc3260          production
sc3260_acc      maxwell
sc3260_acc      pascal

You have access to accelerated GPU resources. As a usage example, if you
wanted to request 2 GPUs for a job with account "sc3260_acc" on the
partition "pascal", then you would add the following lines to your SLURM
script:

#SBATCH --account=sc3260_acc
#SBATCH --partition=pascal
#SBATCH --gres=gpu:2
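Putting those lines into context, a minimal GPU job script might look like the following sketch. The account and partition names come from the example output above; the time limit and the nvidia-smi command are illustrative:

#!/bin/bash
#SBATCH --account=sc3260_acc
#SBATCH --partition=pascal
#SBATCH --gres=gpu:2
#SBATCH --time=00:10:00

# Print information about the GPUs allocated to this job:
nvidia-smi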
showLimits
As the name suggests, showLimits displays the resource limits imposed on accounts and groups on the cluster. Running the command without any arguments lists all accounts and groups on the cluster. Optionally, showLimits also accepts a -g argument followed by the name of a group or account. For example, to see the resource limits imposed on an account named science_account (this account does not actually exist on the cluster):
[jill@vmps12 ~]$ showLimits -g science_account
ACCOUNT          GROUP        FAIRSHARE  MAXCPUS  MAXMEM(GB)  MAXCPUTIME(HRS)
-----------------------------------------------------------------------------
science_account               12         3600     2400        23040
                 biology      1          2400     1800        -
                 chemistry    1          800      600         -
                 physics      1          600      600         8640
                 science      1          -        2200        20000
-----------------------------------------------------------------------------
Limits are always imposed at the account level, and occasionally at the group level when multiple groups fall under a single account. If a particular limit is not defined at the group level, the group may use the entire limit of its parent account. For example, the science group does not have a MAXCPUS limit defined, and can therefore run across a maximum of 3600 cores, so long as no other groups under science_account are running and no other limits (MAXMEM or MAXCPUTIME) are exceeded.
We leave FAIRSHARE defined at the account level only, so groups within the same account do not receive elevated priority relative to one another. The value of 1 for FAIRSHARE at the group level means that all groups under the account receive equal relative priority.
For monitoring SLURM status
SlurmActive
SlurmActive displays a concise summary of the percentage of CPU cores and nodes currently allocated to jobs, along with the number of memory-starved CPU cores on the cluster. For GPU-accelerated nodes it shows the number of allocated GPUs.
[bob@vmps12 ~]$ SlurmActive
Standard Nodes Info:  564 of 567 nodes active ( 99.47%)
                      5744 of 7188 processors in use by local jobs ( 79.91%)
                      253 of 7188 processors are memory-starved ( 3.52%)
                      1191 of 7188 available processors ( 16.57%)
GPU Nodes Info:       Pascal: 18 of 47 GPUs in use ( 38.30%)
                      Maxwell: 18 of 48 GPUs in use ( 37.50%)
Phi Nodes Info:       0 of 0 nodes active ( 0.00%)
                      0 of 0 processors in use by local jobs ( 0.00%)
                      0 of 0 processors are memory-starved ( 0.00%)
ACCRE Cluster Totals: 576 of 591 nodes active ( 97.46%)
                      5813 of 7428 processors in use by local jobs ( 78.26%)
                      253 of 7428 processors are memory-starved ( 3.41%)
                      1362 of 7428 available processors ( 18.34%)

2387 running jobs, 2519 pending jobs, 7 jobs in unrecognized state
Multiple sections are reported. In general, the Standard Nodes Info section is the one users are most interested in, as it corresponds to the default production partition on the ACCRE cluster. GPU Nodes Info reports the availability of GPU nodes on the cluster, while Phi Nodes Info gives details about the availability of the Intel Xeon Phi nodes.
SlurmActive also reports the number of memory-starved cores in each section. A core is considered memory-starved if it is available for jobs but does not have access to at least 1 GB of RAM (by default, jobs are allocated 1 GB of RAM per core). Requesting less than 1 GB of RAM per core may provide access to these cores. Note that SlurmActive accepts a -m option followed by an amount of RAM (in GB) if you would like to compute memory-starved cores against a different threshold. For example, SlurmActive -m 2 will report cores as memory-starved if they do not have access to at least 2 GB of RAM.