Job Array Support
Overview
Support for job arrays was added in Slurm version 2.6.
Job arrays offer a mechanism for submitting and managing collections of similar
jobs quickly and easily; tens of thousands of jobs can be submitted in under
one second.
All jobs must have the same initial options (e.g. size, time limit, etc.);
however, it is possible to change some of these options after the job has begun
execution using the scontrol command, specifying either the JobID of the array
or an individual ArrayJobID.
scontrol update job=101 ...
Job arrays are only supported for batch jobs and the array index values are specified using the --array or -a option of the sbatch command. The option argument can be specific array index values, a range of index values, and an optional step size as shown in the examples below. Note that the minimum index value is zero and the maximum value is a Slurm configuration parameter (MaxArraySize minus one). Each job which is part of a job array will have the environment variable SLURM_ARRAY_TASK_ID set to its array index value.
# Submit a job array with index values between 0 and 31
$ sbatch --array=0-31 -N1 tmp

# Submit a job array with index values of 1, 3, 5 and 7
$ sbatch --array=1,3,5,7 -N1 tmp

# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
$ sbatch --array=1-7:2 -N1 tmp
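As a minimal sketch (assuming the batch script is named "tmp" as in the examples above, and that a program my_program and input files input_N.dat exist), the script below uses SLURM_ARRAY_TASK_ID to let each array element process its own input file:

#!/bin/bash
# Hypothetical batch script "tmp": each array element selects its own
# input file using its array index (my_program and input_N.dat are
# illustrative names only).
echo "Processing array task ${SLURM_ARRAY_TASK_ID}"
./my_program input_${SLURM_ARRAY_TASK_ID}.dat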
Job ID and Environment Variables
Job arrays will have two additional environment variables set.
SLURM_ARRAY_JOB_ID will be set to the first job ID of the array.
SLURM_ARRAY_TASK_ID will be set to the job array index value.
For example, a job submission of this sort
sbatch --array=1-3 -N1 tmp
will generate a job array containing three jobs. If the sbatch command responds
Submitted batch job 36
then the environment variables will be set as
follows:
SLURM_JOBID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_JOBID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_JOBID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
All Slurm commands and APIs recognize the SLURM_JOBID value. Some commands also recognize the SLURM_ARRAY_JOB_ID plus SLURM_ARRAY_TASK_ID values separated by an underscore as identifying an element of a job array. Using the example above, "37" or "36_2" would be equivalent ways to identify the second array element of job 36.
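A brief sketch, reusing the job IDs from the example above, of the two equivalent ways to reference that element:

# Both commands refer to the same job array element: the second element
# of job array 36, whose SLURM_JOBID is 37.
$ scontrol show job 37
$ scontrol show job 36_2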
File Names
Two additional options are available to specify a job's stdin, stdout, and
stderr file names:
%A will be replaced by the value of SLURM_ARRAY_JOB_ID (as defined above)
and
%a will be replaced by the value of SLURM_ARRAY_TASK_ID (as defined above).
The default output file format for a job array is "slurm-%A_%a.out".
An example of explicit use of the formatting is:
sbatch -o slurm-%A_%a.out --array=1-3 -N1 tmp
which would generate output file names of this sort: "slurm-36_1.out",
"slurm-36_2.out" and "slurm-36_3.out".
If these file name options are used without being part of a job array then
"%A" will be replaced by the current job ID and "%a" will be replaced by
4,294,967,294 (equivalent to 0xfffffffe or NO_VAL).
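As an illustrative sketch (the file name pattern is arbitrary), %A and %a can be combined in both the --output and --error options of sbatch so that each array element writes its stdout and stderr to separate files:

# With job array ID 36, this produces files such as job_36_1.out and
# job_36_1.err for the first element, job_36_2.out and job_36_2.err for
# the second, and so on.
$ sbatch --output=job_%A_%a.out --error=job_%A_%a.err --array=1-3 -N1 tmp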
Scancel Command Use
If the job ID of a job array is specified as input to the scancel command, then all elements of that job array will be cancelled. Alternately, an array ID, optionally using regular expressions, may be specified for job cancellation.
# Cancel array ID 1 to 3 from job array 20
$ scancel 20_[1-3]

# Cancel array ID 4 and 5 from job array 20
$ scancel 20_4 20_5

# Cancel all elements from job array 20
$ scancel 20

# Cancel the current job or job array element (if job array)
if [[ -z $SLURM_ARRAY_JOB_ID ]]; then
  scancel $SLURM_JOB_ID
else
  scancel ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
Squeue Command Use
By default, the squeue command will combine all pending elements of a job array on one line and use a regular expression to indicate the "array_task_id" values as shown below.
$ squeue
          JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
  1080_[5-1024]     debug   tmp   mac PD  0:00     1 (Resources)
         1080_1     debug   tmp   mac  R  0:17     1 tux0
         1080_2     debug   tmp   mac  R  0:16     1 tux1
         1080_3     debug   tmp   mac  R  0:03     1 tux2
         1080_4     debug   tmp   mac  R  0:03     1 tux3
An option of "--array" or "-r" has also been added to the squeue command to print one job array element per line as shown below. The environment variable "SQUEUE_ARRAY" is equivalent to including the "--array" option on the squeue command line.
$ squeue -r
  JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 1082_3     debug   tmp   mac PD  0:00     1 (Resources)
 1082_4     debug   tmp   mac PD  0:00     1 (Priority)
   1080     debug   tmp   mac  R  0:17     1 tux0
   1081     debug   tmp   mac  R  0:16     1 tux1
 1082_1     debug   tmp   mac  R  0:03     1 tux2
 1082_2     debug   tmp   mac  R  0:03     1 tux3
The squeue --step/-s and --job/-j options can accept job or step specifications of the same format.
$ squeue -j 1234_2,1234_3
...

$ squeue -s 1234_2.0,1234_3.0
...
Two additional job output format field options have been added to squeue:
%F prints the array_job_id value
%K prints the array_task_id value
(all of the obvious letters to use were already assigned to other job fields).
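A short sketch of these fields in use (the column widths are arbitrary):

# Print the array job ID, array task ID, name and state of each job,
# one array element per line.
$ squeue -r -o "%.10F %.6K %.12j %.4t"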
Scontrol Command Use
When a job array is submitted using the sbatch command, an independent job is submitted for each element of the array; however, substantial performance improvement is realized through the use of a single job submit request and only needing to validate the request options one time. Use of the scontrol show job option shows two new fields related to job array support. The JobID is a unique identifier for the job. The ArrayJobID is the JobID of the first element of the job array. The ArrayTaskID is the array index of this particular entry. Neither field is displayed if the job is not part of a job array. The optional job ID specified with the scontrol show job or scontrol show step commands can identify job array elements by specifying two numbers with an underscore between them: "<job_id>_<array_id>".
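A minimal sketch of that identifier form with both commands, reusing job 36 from the earlier example (the step ID 0 is an assumption for illustration):

# Show the second element of job array 36.
$ scontrol show job 36_2

# Show step 0 of that same array element.
$ scontrol show step 36_2.0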
Job arrays can only be modified one element at a time; to modify a specific element, identify it using both the ArrayJobID and the ArrayTaskID ("<ArrayJobID>_<ArrayTaskID>"). The scontrol command will accept a job array element specification for the update job command, but will only operate on one job (or job array element), as shown below.
->sbatch --array=1-4 -J Array ./sleepme 86400
Submitted batch job 21845

->squeue
   JOBID PARTITION   NAME   USER ST  TIME NODES NODELIST
 21845_1    canopo  Array  david  R  0:13     1 dario
 21845_2    canopo  Array  david  R  0:13     1 dario
 21845_3    canopo  Array  david  R  0:13     1 dario
 21845_4    canopo  Array  david  R  0:13     1 dario

->scontrol update job=21845_2 name=arturo

->squeue
   JOBID PARTITION   NAME   USER ST   TIME NODES NODELIST
 21845_1    canopo  Array  david  R  17:03     1 dario
 21845_2    canopo arturo  david  R  17:03     1 dario
 21845_3    canopo  Array  david  R  17:03     1 dario
 21845_4    canopo  Array  david  R  17:03     1 dario
The scontrol hold, holdu, release, requeue, requeuehold, suspend and resume commands will operate on all elements of a job array or individual elements as shown below.
->scontrol suspend 21845
->squeue
   JOBID PARTITION   NAME   USER ST   TIME NODES NODELIST
 21845_1    canopo  Array  david  S  25:12     1 dario
 21845_2    canopo arturo  david  S  25:12     1 dario
 21845_3    canopo  Array  david  S  25:12     1 dario
 21845_4    canopo  Array  david  S  25:12     1 dario

->scontrol resume 21845
->squeue
   JOBID PARTITION   NAME   USER ST   TIME NODES NODELIST
 21845_1    canopo  Array  david  R  25:14     1 dario
 21845_2    canopo arturo  david  R  25:14     1 dario
 21845_3    canopo  Array  david  R  25:14     1 dario
 21845_4    canopo  Array  david  R  25:14     1 dario

->scontrol suspend 21845_3
->squeue
   JOBID PARTITION   NAME   USER ST   TIME NODES NODELIST
 21845_1    canopo  Array  david  R  25:14     1 dario
 21845_2    canopo arturo  david  R  25:14     1 dario
 21845_3    canopo  Array  david  S  25:14     1 dario
 21845_4    canopo  Array  david  R  25:14     1 dario

->scontrol resume 21845_3
->squeue
   JOBID PARTITION   NAME   USER ST   TIME NODES NODELIST
 21845_1    canopo  Array  david  R  25:14     1 dario
 21845_2    canopo arturo  david  R  25:14     1 dario
 21845_3    canopo  Array  david  R  25:14     1 dario
 21845_4    canopo  Array  david  R  25:14     1 dario
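The remaining commands listed above follow the same pattern; a brief sketch reusing job array 21845:

# Hold every element of job array 21845, then release only element 3.
->scontrol hold 21845
->scontrol release 21845_3

# Requeue a single array element.
->scontrol requeue 21845_2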
Other Command Use
Job dependencies for individual job array elements are supported in Slurm version 2.6.4 and later. A job which is to be dependent upon an entire job array should specify itself dependent upon the ArrayJobID. Since each array element can have a different exit code, the interpretation of the afterok and afternotok clauses will be based upon the last element of the job array to exit. Examples of use follow:
# Wait for specific job array elements
sbatch --depend=after:123_4 my.job
sbatch --depend=afterok:123_4:123_8 my.job2

# Wait for entire job array to complete
sbatch --depend=after:123 my.job
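A common pattern is to capture the ArrayJobID at submission time and make a follow-up job dependent upon the entire array completing successfully. The sketch below assumes submission scripts named process.job and summarize.job, and parses the job ID from sbatch's "Submitted batch job <id>" response:

# Submit the array and capture its ArrayJobID (the fourth word of the
# "Submitted batch job NNN" message).
array_id=$(sbatch --array=1-100 process.job | awk '{print $4}')

# This job starts only after every array element completes successfully.
sbatch --depend=afterok:${array_id} summarize.job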
The following Slurm commands do not currently recognize job arrays and their use requires Slurm job IDs, which are unique for each array element: sacct, sbcast, smap, sreport, sshare, sstat, strigger, and sview. The sattach, sprio and sstat commands have been modified to permit specification of either job IDs or job array elements. The sview command has been modified to permit display of a job's ArrayJobId and ArrayTaskId fields. Both fields are displayed with a value of "N/A" if the job is not part of a job array.
System Administration
A new configuration parameter has been added to control the maximum job array size: MaxArraySize. The smallest index that can be specified by a user is zero and the maximum index is MaxArraySize minus one. The default value of MaxArraySize is 1001. Be mindful about the value of MaxArraySize as job arrays offer an easy way for users to submit large numbers of jobs very quickly.
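For example, allowing array index values from 0 through 10000 would require the following setting in slurm.conf (the value shown is only an illustration):

# slurm.conf: permit job array index values 0 through 10000.
MaxArraySize=10001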
The sched/backfill plugin has been modified to improve performance with job arrays. Once one element of a job array is discovered not to be runnable or to impact the scheduling of pending jobs, the remaining elements of that job array will be quickly skipped.
Slurm support for job arrays at this time does not use a meta-job data structure, but creates a separate job record for each element of the array. Two additional fields were added to Slurm's job record for managing job arrays. The first new field is internally called "array_job_id" and is the job ID of the first job in the array. Subsequent elements of the job array will have a unique Slurm "job_id", but all will have the same "array_job_id" value. Some Slurm commands interpret the array_job_id as representing all elements of the job array, while other commands use the unique job_id assigned to each. The second new field is called "array_task_id", which is the job array index value of the job array element. Support for Slurm job arrays can be expected to improve in later releases.
Future Work
There are scalability and performance improvements possible if a job array data structure is added rather than the current logic that only adds a new field to the existing job data structure. It is not certain when that work will occur.
Last modified 28 April 2014