

SLURM Introduction

Slurm is an orchestrator used to launch jobs on a cluster.

This page serves as an overview of user commands provided by SLURM and how users should use SLURM in order to run jobs. A concise cheat sheet for SLURM can be downloaded here. A comparison of commands for different job scheduling systems can be found here.

SLURM Commands

SLURM offers a variety of user commands for all necessary actions concerning job management. With these commands, users have a rich interface to allocate resources, query job status, control jobs, manage accounting information and simplify their work with some utility commands. For examples of how to use these commands, see the section SLURM Command Examples; a few illustrative invocations are also shown after the list below.

  • sinfo shows information about all partitions and nodes managed by SLURM, as well as about the general system state. It has a wide variety of filtering, sorting, and formatting options.

  • squeue queries the list of pending and running jobs. By default it reports the list of pending jobs sorted by priority and the list of running jobs sorted separately, also according to job priority. The most relevant job states are running (R), pending (PD), completing (CG), completed (CD) and cancelled (CA). The TIME field shows how long the job has been running. The NODELIST (REASON) field indicates on which nodes the job is running, or the reason why the job is pending. Typical reasons for pending jobs are waiting for resources to become available (Resources) and queuing behind a job with higher priority (Priority).

  • sbatch submits a batch script. The script will be executed on the first node of the allocation. The working directory of the job is the directory from which sbatch was invoked. Within the script, one or multiple srun commands can be used to create job steps and execute parallel applications.

  • scancel cancels a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.

  • salloc requests an interactive job/allocation. When the job starts, a shell (or another program specified on the command line) is started on the allocated node. The allocation is released when the user exits the shell or cancels the job.

  • srun initiates parallel job steps within a job or starts an interactive job.

  • scontrol (primarily used by administrators) provides some functionality for users to manage jobs or to query information about the system configuration, such as nodes, partitions and jobs.

  • sstat queries near-real-time status information related to CPU, task, node, RSS and virtual memory for a running job.

  • sacct retrieves accounting information about jobs and job steps. For completed jobs, sacct queries the accounting database.

  • sacctmgr (primarily used by administrators) queries information about accounts and other accounting data.
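
A few illustrative invocations of these commands are shown below; the job ID, script name and options are placeholders to adapt to your own jobs and to the partitions of the cluster.

# show partitions, their time limits and node states
sinfo

# list your own pending and running jobs
squeue -u $USER

# submit a batch script and note the job ID that is returned
sbatch my_batch_script.sh

# cancel a job by its ID (placeholder ID)
scancel 303678

# request an interactive allocation of one node for 30 minutes
salloc --nodes=1 --time=00:30:00

# launch a command on the allocated resources as a job step
srun hostname

# accounting information for a finished job (placeholder ID)
sacct -j 303678 --format=JobID,JobName,Elapsed,State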

Local commands using SLURM commands

  • check-cluster shows the state of the cluster nodes

  • slqueue runs squeue with some interesting options preset

  • seff is a Perl script that uses sacct to show the efficiency of a job (see the example invocations below)
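
As a minimal sketch, assuming check-cluster and slqueue are run without arguments and that seff takes a job ID (as the standard seff script does), a quick check after a run might look like:

# overview of the cluster node states (local wrapper)
check-cluster

# queue overview with the locally preset squeue options
slqueue

# CPU and memory efficiency of a completed job (placeholder job ID)
seff 303678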

Allocating Resources with SLURM

A job allocation, which is a set of computing resources (nodes or cores) assigned to a user’s request for a specified maximum execution time, can be created using the SLURM salloc, sbatch or srun commands. The salloc and sbatch commands make resource allocations only. The srun command launches parallel tasks and implicitly creates a resource allocation if it is not started within one.
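
As a minimal illustration of the difference, the commands below first create an interactive allocation with salloc and then launch a task in it with srun; the partition name and program are placeholders.

# create an interactive allocation: one node, one task, 30 minutes
salloc --partition=<partition name> --nodes=1 --ntasks-per-node=1 --time=00:30:00

# inside the allocation shell, launch the application as a job step
srun ./my_program

# exiting the shell releases the allocation
exit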

The usual way to allocate resources and execute a job is to write a batch script and submit it to SLURM with the sbatch command. The batch script is a shell script consisting of two parts: resource requests and job steps. Resource requests are specifications such as the number of nodes needed to execute the job, the maximum execution time of the job, etc. Job steps are the user’s tasks that must be executed. The resource requests and other SLURM submission options are given as #SBATCH directives and must precede any executable commands in the batch script. For example:

#!/bin/bash
#SBATCH --partition=<partition name>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00

# job information header is automatically added by slurm 
# SAMPLE 
#=======================TASK PROLOG================================
#Job 303678 submitted on cluster spirit from spirit1.ipsl.fr
#Job 303678 submitted by xxxxx with account yyyyyyy
#Job 303678 is running with 1 nodes and 10 cores on spirit64-01
#Job 303678 memory requirements are 40000 MB per node
#Job 303678 starting at 2023/03/23 15:52:50
#==================================================================

# Beginning of the section with executable commands
set -e
ls -l
./my_program
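
Assuming the script above is saved as job.sh (the name is arbitrary), it can be submitted and monitored as follows:

# submit the script; SLURM prints "Submitted batch job <jobid>"
sbatch job.sh

# follow the job while it is pending or running
squeue -u $USER

# accounting information once the job has finished (<jobid> is the ID printed by sbatch)
sacct -j <jobid>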

Partitions

When submitting a job, users specify the maximum execution time that their job cannot exceed (srun -t or sbatch -t option, or the #SBATCH --time directive). This value can be set up to the limit allowed by the chosen partition: the partition's execution time limit. If users do not specify a maximum execution time, the default value associated with the chosen partition is used. Similarly, a default partition is used if users do not specify one. The requested maximum execution time and the partition must therefore be consistent: the requested time must not exceed the execution time limit of the partition.
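
The execution time limit and default time of each partition can be queried with sinfo; in the sketch below, the format specifiers %P, %l and %L print the partition name, its time limit and its default time, and the partition name and script are placeholders.

# list partitions with their execution time limit and default time
sinfo -o "%P %l %L"

# request a specific partition and a 2-hour maximum execution time
sbatch --partition=<partition name> --time=02:00:00 job.sh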

Warning

Slurm kills jobs that exceed their maximum execution time.