Interactive Job
The interactive job allows you to stay connected to the job and interact with it, typically through a Jupyter Lab/Notebook or a Python interpreter. This solution is preferred during the development phase, or when the processing produces graphical output that must be inspected in real time so that it can be stopped and restarted with another configuration (for example: experimental training). Once the job is submitted, you are connected to a machine that meets your resource request: one of the hal[1-6] machines.
Run from hal0:
srun \
--gpus=<ampere or turing>:<1 or 2> \ # Request GPU cards. If the architecture is not specified, Slurm chooses it. Default GPU count is 0.
--mail-user=<email> \ # Request email notifications. Default is none.
--nodelist=<nodename> \ # Select a specific node: hal[1-6]. Not recommended; it is better to let Slurm choose.
--partition=interactive \ # Must be interactive for interactive job.
--time '1:00:00' \ # Specify the maximum elapsed time, in HH:MM:SS. Default is 1h.
--pty bash # Request an interactive job with the Bash shell. Must be the last option.
Info
The limits and default values for the job specification are described on this page.
An example of submitting an interactive job on the interactive partition, limited to 1 hour of computation, with a single GPU card (Turing or Ampere, choice left to Slurm), executed on a node chosen by Slurm, without graphical feedback and without email notifications. Run from hal0:
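A possible form of that command, assembled from the options described above (a sketch; adapt the values to your needs):
srun --gpus=1 --partition=interactive --time '1:00:00' --pty bash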
Info
The RAM and CPU resources of a node are shared between jobs executed on the node.
Once connected to one of the hal[1-6] machines, you can run your scripts and binaries in the shell that is presented to you. Exit the shell to end the interactive job (run the exit command or hit CTRL + D).
Warning
The default maximum elapsed time of computation is 1h (when the --time option is not specified). It can be set up to 10h for an interactive job. Slurm kills jobs that exceed this limit. Read this page for more information.
Info
If your processing requires loading AI modules, read the following page.
Example of loading an AI module, from hal[1-6], after submitting an interactive job:
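A sketch, assuming the AI software is provided through Environment Modules (the module name below is a placeholder; run module avail to list the real ones):
module avail # list the modules available on the node
module load <module_name> # load the chosen AI module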
Info
For the extension of a module, e.g. installation of missing packages or in particular versions, it is possible to create Python virtual environments on top of modules. Read this page for more details.
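As an illustrative sketch only, assuming the loaded module provides the Python interpreter (the environment name, path and package name below are placeholders):
module load <module_name> # load the base AI module
python -m venv --system-site-packages ~/venvs/<env_name> # virtual environment that still sees the module's Python packages
source ~/venvs/<env_name>/bin/activate # activate the virtual environment
pip install <missing_package> # add the packages missing from the module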
Tips
Slurm, the HAL cluster's job manager, offers an option to choose the GPU architecture on which your code will be executed: the --gpus=<ampere or turing>:<1 or 2> option of the srun command, e.g. --gpus=turing:1 to allocate one Nvidia® GeForce® RTX 2080 Ti GPU card. Run squeue and sinfo to check the availability of the cluster nodes. Note that if the GPU architecture is not specified, Slurm chooses randomly between Turing and Ampere.
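For example, two commands that can be run from hal0 to inspect the interactive partition (a sketch; both options are standard Slurm flags):
sinfo --partition=interactive # list the nodes of the partition and their current state
squeue --partition=interactive # list the jobs running or pending on the partition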