Dynamic Resource Allocation (dynalloc)
Overview
This document describes SLURM dynamic resource allocation (dynalloc) and explains how to enable it.
SLURM dynamic resource allocation (dynalloc) runs as an optional thread that is started when SLURM's control daemon (slurmctld) starts up. Once spawned, the dynalloc thread acts as a socket server that accepts requests such as resource query, allocation, and deallocation. Upon receiving a request, dynalloc parses the message, performs the requested actions, and then responds to the client.
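As a rough illustration, a client can exchange these text messages with the dynalloc server over an ordinary TCP socket. The Python sketch below is a minimal example only: it assumes the default DynAllocPort of 6820 (see Configuration below) and that requests and replies are plain text with no additional framing, and the helper name dynalloc_request is not part of SLURM.

    import socket

    def dynalloc_request(message, host="localhost", port=6820, timeout=10.0):
        """Send one plain-text request to the dynalloc server and return
        its reply. Framing assumptions: one request and one reply per
        connection, each small enough for a single recv()."""
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(message.encode())
            return sock.recv(4096).decode()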
Configuration
To enable dynalloc, some configuration is required on the SLURM side.
SLURM Configuration
configure
When building from the SLURM source code, add --enable-dynamic-allocation to the ./configure command line.
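For example, an illustrative build sequence (not the only valid one) would be:

    ./configure --enable-dynamic-allocation
    make
    make install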
slurm.conf
After installation, set the config parameters in slurm.conf as follows:
SlurmctldPlugstack=dynalloc
DynAllocPort=6820
The default value of DynAllocPort is 6820. You can change it if needed.
Functionalities
Resource Query
A client can send messages to query the number of nodes and slots in SLURM, either in total or currently available.
Get Total Nodes and Slots
The request message from the client is: "get total nodes and slots"
A response message might look like: "total_nodes=4 total_slots=16"
Get Available Nodes and Slots
The request message from the client is: "get available nodes and slots"
A response message might look like: "avail_nodes=4 avail_slots=16"
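Using the hypothetical dynalloc_request() helper sketched in the Overview, both queries can be issued and their space-separated key=value replies parsed like this:

    # Query totals; a reply looks like "total_nodes=4 total_slots=16".
    reply = dynalloc_request("get total nodes and slots")
    totals = dict(field.split("=") for field in reply.split())
    print(totals["total_nodes"], totals["total_slots"])

    # Query availability; a reply looks like "avail_nodes=4 avail_slots=16".
    reply = dynalloc_request("get available nodes and slots")
    avail = dict(field.split("=") for field in reply.split())
    print(avail["avail_nodes"], avail["avail_slots"])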
Resource Allocation
A client can send a request to SLURM to allocate resources.
An allocation request message consists of two kinds of parts:
- Job part, like "jobid=100 return=all timeout=10"
- App part, like "app=0 np=5 N=2 node_list=vm[2-3] flag=mandatory cpu_bind=cores mem_per_cpu=100 resv_port_cnt=2"
An allocation message consists of one job part followed by at least one app (application) part; the parts are separated by colons, as in the example below.
For example:
"allocate jobid=100 return=all timeout=10:app=0 np=5 N=2 node_list=vm[2-3]
flag=mandatory cpu_bind=cores mem_per_cpu=100 resv_port_cnt=2:app=1 N=2"
In the job part of the above message:
- jobid is optional; it is sent back to the client to identify the allocation results.
- return is also optional. If the return flag ("return=all") is specified, the allocation results of all apps are sent back in ONE message, like "jobid=100:app=0 slurm_jobid=679 allocated_node_list=vm2,vm3 tasks_per_node=3,2:app=1 slurm_jobid=680 allocated_node_list=vm4,vm5 tasks_per_node=4(x2)". Otherwise, the allocation result of each app is sent back in a separate message, like msg-1) "jobid=100:app=0 slurm_jobid=681 allocated_node_list=vm2,vm3 tasks_per_node=3,2" and msg-2) "jobid=100:app=1 slurm_jobid=682 allocated_node_list=vm4,vm5 tasks_per_node=4(x2)".
- timeout (in seconds) is the time interval during which the client will wait for the allocation response.
In the app part of the above message:
- app is the application/task id, which is sent back to the client to identify the allocation result.
- np is the number of processes to run, i.e., the number of slots to allocate for this app.
- N is the number of nodes to allocate.
- node_list is the node pool from which to select nodes.
- flag is the allocation requirement, either "mandatory" or "optional". If "flag=mandatory", all requested nodes must be allocated from node_list. If "flag=optional", the allocation includes all nodes in the given list that are currently available; if that is not enough to meet the requested node number N, any other available nodes are taken to fill out the requested number.
- cpu_bind binds tasks to CPUs and is used only when the task/affinity or task/cgroup plugin is enabled (please refer to 'man salloc').
- mem_per_cpu is the minimum memory required per allocated CPU in megabytes, and is used when the task/cgroup plugin is enabled.
- resv_port_cnt is the number of ports to reserve; if not specified, resv_port_cnt=1 by default.
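To make the message layout concrete, the hypothetical builder below joins one job part and any number of app parts with colons, matching the format of the example above:

    def build_allocate(job_part, app_parts):
        """job_part: e.g. "jobid=100 return=all timeout=10";
        app_parts: e.g. ["app=0 np=5 N=2 ...", "app=1 N=2"]."""
        return "allocate " + ":".join([job_part] + app_parts)

    msg = build_allocate(
        "jobid=100 return=all timeout=10",
        ["app=0 np=5 N=2 node_list=vm[2-3] flag=mandatory "
         "cpu_bind=cores mem_per_cpu=100 resv_port_cnt=2",
         "app=1 N=2"])
    # msg == "allocate jobid=100 return=all timeout=10:app=0 ...:app=1 N=2"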
A response message might look like "jobid=100:app=0 slurm_jobid=679 allocated_node_list=vm2,vm3 tasks_per_node=3,2 resv_ports=12001-12002:app=1 allocate_failure". In this example, 'app=0' gets a successful allocation while the allocation for 'app=1' fails. Note that in a response reporting a successful allocation for an app, a slurm_jobid is returned for later operations, e.g., process launch and resource deallocation.
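A client might recover the per-app results by splitting such a response on colons, as in this sketch (it assumes exactly the layout shown above):

    def parse_allocate_response(reply):
        """Return (jobid field, per-app results); a failed app maps to None."""
        parts = reply.split(":")
        jobid = parts[0]                      # e.g. "jobid=100"
        apps = {}
        for part in parts[1:]:
            fields = part.split()             # ["app=0", "slurm_jobid=679", ...]
            if "allocate_failure" in fields[1:]:
                apps[fields[0]] = None        # this app's allocation failed
            else:
                apps[fields[0]] = dict(f.split("=", 1) for f in fields[1:])
        return jobid, apps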
Resource Deallocation
After job execution, the client can release the allocated resources back to SLURM.
A resource deallocation request message from the client might look like: "deallocate slurm_jobid=744 job_return_code=0:slurm_jobid=745 job_return_code=-1". Note that it is possible to release a number of allocations in ONE message, with each allocation identified by its slurm_jobid. All resources associated with the given SLURM job will be released, e.g., cores/nodes, memory, and ports.
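Continuing the hypothetical client sketch, several allocations can be released in ONE message by joining the per-allocation fields with colons:

    # Release two allocations at once, reusing the dynalloc_request()
    # sketch from the Overview (helper name and framing are assumptions).
    results = [(744, 0), (745, -1)]           # (slurm_jobid, job_return_code)
    msg = "deallocate " + ":".join(
        "slurm_jobid=%d job_return_code=%d" % (jid, rc) for jid, rc in results)
    dynalloc_request(msg)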
Last modified 19 February 2013