Slurm Job Container Plugin API

Overview

This document describes Slurm job container plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm job container plugins. Note that the job container plugin is designed for use with Slurm jobs; it also applies to the sbcast server process on compute nodes. A separate proctrack plugin is designed for use with Slurm job steps. This is version 101 of the API.

Slurm job container plugins are Slurm plugins that implement the Slurm job container API described herein. They must conform to the Slurm Plugin API with the following specifications:

const char plugin_type[]
The major type must be "job_container". The minor type can be any recognizable abbreviation for the type of job container. We recommend, for example:

  • cncu: Designed for use on Cray systems only; interfaces with Compute Node Clean Up (CNCU), the Cray infrastructure.
  • none: Designed for all other systems.

The plugin_name and plugin_version symbols required by the Slurm Plugin API require no specialization for job containers. Note carefully, however, the versioning discussion below.
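As an illustration, the identification symbols of a hypothetical job container plugin might look like the sketch below. Slurm plugin types take the form major/minor; the minor type "example", the plugin_name text, and the is_job_container_type() helper are assumptions made for this sketch, and setting plugin_version to 101 assumes the plugin advertises the API version described in this document.

```c
#include <stdint.h>
#include <string.h>

/* Identification symbols for a hypothetical "example" job container
 * plugin; the name and minor type are illustrative only. */
const char     plugin_name[]  = "Job container example plugin";
const char     plugin_type[]  = "job_container/example";
/* Assumption: the plugin reports the API version described here. */
const uint32_t plugin_version = 101;

/* Hypothetical helper: nonzero if a plugin_type string carries the
 * required major type "job_container". */
static int is_job_container_type(const char *type)
{
    static const char major[] = "job_container/";
    /* Compare only the "job_container/" prefix (without the NUL). */
    return strncmp(type, major, sizeof(major) - 1) == 0;
}
```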

The programmer is urged to study src/plugins/job_container/cncu/job_container_cncu.c for an example implementation of a Slurm job container plugin.

Data Objects

The implementation must support a container ID of type uint64_t. This container ID is generated by the proctrack plugin.

The implementation must maintain (though not necessarily directly export) an enumerated errno to allow Slurm to discover, as far as practical, the reason for any failed API call. These values must not be used as return values in integer-valued functions in the API. The proper error return value from integer-valued functions is SLURM_ERROR. The implementation should endeavor to provide useful and pertinent information by whatever means is practical. Successful API calls are not required to reset errno to a known value.
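A minimal sketch of this error convention, assuming the plugin uses the C library's errno as its enumerated error value: the reason for a failure goes into errno, and the function itself returns only SLURM_ERROR. The SLURM_SUCCESS and SLURM_ERROR definitions are stand-ins for the values supplied by Slurm's headers, and example_check_job_id() (including its treatment of job ID 0 as invalid) is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical validation helper illustrating the error convention. */
static int example_check_job_id(uint32_t job_id)
{
    if (job_id == 0) {          /* assumption: 0 is never a valid job ID */
        errno = EINVAL;         /* record the reason for the failure... */
        return SLURM_ERROR;     /* ...but never return EINVAL itself */
    }
    return SLURM_SUCCESS;       /* errno is left untouched on success */
}
```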

API Functions

The following functions must appear. Functions which are not implemented should be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen(3) manual page. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
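The sketch below pairs init() and fini() around a hypothetical global table: storage allocated in init() is released in fini(). The table, its size, and the stand-in SLURM_SUCCESS/SLURM_ERROR definitions are assumptions for illustration, not taken from any real plugin.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical global storage owned by the plugin. */
static uint32_t *job_table = NULL;
static size_t job_table_size = 0;

/* Called by Slurm once the plugin is loaded (after the C run-time's own
 * _init()), before any container_p_* function. */
int init(void)
{
    job_table_size = 64;   /* arbitrary illustrative capacity */
    job_table = calloc(job_table_size, sizeof(*job_table));
    if (job_table == NULL) {
        job_table_size = 0;
        return SLURM_ERROR;
    }
    return SLURM_SUCCESS;
}

/* Called when the plugin is removed, before the C run-time's _fini(). */
void fini(void)
{
    free(job_table);
    job_table = NULL;
    job_table_size = 0;
}
```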

int container_p_create (uint32_t job_id);

Description: Create a container for the job. The caller should ensure that container_p_delete() is later called for the same job.

Argument: job_id    (input) Job ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
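One way to implement container_p_create() is sketched below, assuming a plugin that only needs to remember which jobs own a container. The fixed-size active_jobs table, the rejection of job ID 0, and the choice of EINVAL, EEXIST and ENOSPC are all hypothetical; a real plugin would also create whatever system-level container it manages at this point.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
#define MAX_JOBS      16    /* hypothetical fixed capacity */

/* Hypothetical bookkeeping: job IDs that own a container (0 = free). */
static uint32_t active_jobs[MAX_JOBS];

int container_p_create(uint32_t job_id)
{
    int free_slot = -1;

    if (job_id == 0) {              /* assumption: 0 is never a valid job ID */
        errno = EINVAL;
        return SLURM_ERROR;
    }
    for (int i = 0; i < MAX_JOBS; i++) {
        if (active_jobs[i] == job_id) {
            errno = EEXIST;         /* container already exists for job */
            return SLURM_ERROR;
        }
        if (active_jobs[i] == 0 && free_slot < 0)
            free_slot = i;          /* remember the first free slot */
    }
    if (free_slot < 0) {
        errno = ENOSPC;             /* table is full */
        return SLURM_ERROR;
    }
    active_jobs[free_slot] = job_id;
    return SLURM_SUCCESS;
}
```

Returning an error for a duplicate job ID is a design choice; a plugin could instead treat repeated creation as a no-op.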

int container_p_add_cont (uint32_t job_id, uint64_t cont_id);

Description: Add a specific process tracking container (PAGG) to a given job's container.

Arguments:
job_id    (input) Job ID.
cont_id    (input) Process tracking container value as set by the proctrack plugin.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
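A minimal sketch of container_p_add_cont(), assuming the plugin simply records which proctrack container IDs belong to which job so that it can act on them later. The cont_rec table, its capacity, and treating a cont_id of 0 as invalid are assumptions of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
#define MAX_CONTS     32    /* hypothetical capacity */

/* Hypothetical record linking a proctrack container ID to a job. */
struct cont_rec {
    uint32_t job_id;
    uint64_t cont_id;
};
static struct cont_rec conts[MAX_CONTS];
static int n_conts = 0;

int container_p_add_cont(uint32_t job_id, uint64_t cont_id)
{
    if (cont_id == 0) {             /* assumption: 0 means "no container" */
        errno = EINVAL;
        return SLURM_ERROR;
    }
    if (n_conts >= MAX_CONTS) {
        errno = ENOSPC;             /* no room for another record */
        return SLURM_ERROR;
    }
    conts[n_conts].job_id = job_id;
    conts[n_conts].cont_id = cont_id;
    n_conts++;
    return SLURM_SUCCESS;
}
```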

int container_p_add_pid (uint32_t job_id, pid_t pid, uid_t uid);

Description: Add a specific process ID to a given job's container. The process is first placed into a process tracking container (PAGG).

Arguments:
job_id    (input) Job ID.
pid    (input) Process ID.
uid    (input) Owning user ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
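The two steps in the description of container_p_add_pid() (place the process in a process tracking container, then attach that container to the job) can be sketched as follows. Both example_proctrack_create(), which stands in for the proctrack plugin, and example_add_cont(), which stands in for this plugin's own container_p_add_cont(), are hypothetical helpers; deriving the container ID from the PID is purely illustrative.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical stand-in for the proctrack plugin: returns a container
 * ID for the process, or 0 on failure. */
static uint64_t example_proctrack_create(pid_t pid, uid_t uid)
{
    (void) uid;                     /* unused in this sketch */
    return (pid > 0) ? (uint64_t) pid : 0;
}

/* Hypothetical stand-in for container_p_add_cont(): remembers only the
 * most recent job/container pair. */
static uint32_t last_job_id;
static uint64_t last_cont_id;

static int example_add_cont(uint32_t job_id, uint64_t cont_id)
{
    last_job_id = job_id;
    last_cont_id = cont_id;
    return SLURM_SUCCESS;
}

int container_p_add_pid(uint32_t job_id, pid_t pid, uid_t uid)
{
    /* Step 1: place the process in a process tracking container. */
    uint64_t cont_id = example_proctrack_create(pid, uid);
    if (cont_id == 0) {
        errno = ESRCH;              /* no usable process */
        return SLURM_ERROR;
    }
    /* Step 2: attach that container to the job's container. */
    return example_add_cont(job_id, cont_id);
}
```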

int container_p_delete (uint32_t job_id);

Description: Destroy or otherwise invalidate a job container. This does not imply the container is empty, just that it is no longer needed.

Argument: job_id    (input) Job ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
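A sketch of container_p_delete() using hypothetical bookkeeping in the form of a simple job table, pre-populated here with two job IDs for illustration. It only invalidates the record, consistent with the description of this function: it does not check that the container is empty or signal any remaining processes. Reporting ENOENT for an unknown job is an assumption; a plugin could equally treat that case as success.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for values normally supplied by Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
#define MAX_JOBS      16    /* hypothetical fixed capacity */

/* Hypothetical bookkeeping: job IDs with a live container (0 = free). */
static uint32_t active_jobs[MAX_JOBS] = { 7, 9 };

int container_p_delete(uint32_t job_id)
{
    if (job_id != 0) {
        for (int i = 0; i < MAX_JOBS; i++) {
            if (active_jobs[i] == job_id) {
                /* Invalidate the record only; processes still in the
                 * container are neither killed nor waited for here. */
                active_jobs[i] = 0;
                return SLURM_SUCCESS;
            }
        }
    }
    errno = ENOENT;                 /* no container known for this job */
    return SLURM_ERROR;
}
```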

void container_p_reconfig (void);

Description: Notify the plugin of a change in configuration so that it can update any cached settings, especially the value of DebugFlags with respect to JobContainer.

Returns: None.
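A sketch of container_p_reconfig(), assuming the plugin caches whether JobContainer debugging is enabled so that its other functions can test a local flag instead of consulting the configuration on every call. The simulated example_conf_debug_flags value and the EXAMPLE_DEBUG_FLAG_JOBCONT bit are hypothetical stand-ins for however the plugin obtains DebugFlags from Slurm's configuration.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a real plugin would read the current DebugFlags
 * from Slurm's configuration; this bit value is illustrative only. */
#define EXAMPLE_DEBUG_FLAG_JOBCONT ((uint64_t) 1 << 0)
static uint64_t example_conf_debug_flags;   /* simulated configured value */

/* Cached setting consulted by the other container_p_* functions. */
static int debug_jobcont = 0;

void container_p_reconfig(void)
{
    /* Refresh the cached flag from the (simulated) configuration. */
    debug_jobcont =
        (example_conf_debug_flags & EXAMPLE_DEBUG_FLAG_JOBCONT) != 0;
}
```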

Versioning

This document describes version 101 of the Slurm job container API. Future releases of Slurm may revise this API. A job container plugin conveys its ability to implement a particular API version using the mechanism outlined for Slurm plugins.

Last modified 8 May 2014