Job partitions

General information about job partitions is available in this section.

HAL partitions

On the HAL cluster, batch jobs are automatically assigned to the batch partition, whereas interactive jobs are automatically assigned to the interactive partition.

Partition name	Default execution time limit	Maximum execution time limit
batch	1 hour	72 hours
interactive	1 hour	10 hours
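The table can be read as a small lookup: the job type selects the partition, and the partition supplies a default time limit and a maximum. The following is a minimal sketch of that lookup; the helper names are hypothetical (they are not part of HAL's tooling), and the sketch only flags a request above the maximum without modelling what the scheduler does with it.

```python
from typing import Optional

# Execution time limits in hours, transcribed from the partition table above.
PARTITIONS = {
    "batch": {"default_hours": 1, "max_hours": 72},
    "interactive": {"default_hours": 1, "max_hours": 10},
}


def partition_for(interactive: bool) -> str:
    """Interactive jobs go to 'interactive', batch jobs go to 'batch'."""
    return "interactive" if interactive else "batch"


def effective_time_limit(interactive: bool,
                         requested_hours: Optional[float] = None) -> float:
    """Time limit (hours) a job gets: the partition default when nothing
    is requested, otherwise the requested value."""
    if requested_hours is None:
        return PARTITIONS[partition_for(interactive)]["default_hours"]
    return requested_hours


def exceeds_partition_limit(interactive: bool, requested_hours: float) -> bool:
    """True when the request is above the partition's maximum time limit."""
    return requested_hours > PARTITIONS[partition_for(interactive)]["max_hours"]
```

For example, a 12-hour notebook session exceeds the interactive limit but fits comfortably in the batch partition.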

Tips

If the execution time of your Jupyter Notebook exceeds the execution time limit of the interactive partition, you can run your notebook as a batch job instead, thanks to the jupyter-nbconvert command. More information about jupyter-nbconvert is available on this page.
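As an illustration, the command line that a batch job would run to execute a notebook could be built as below. This is a sketch: the function name and notebook file names are hypothetical, while `--to notebook`, `--execute`, `--output` and `--ExecutePreprocessor.timeout` are standard nbconvert options. Setting the timeout to -1 disables nbconvert's per-cell timeout, so the batch partition's time limit is the only bound.

```python
import shlex
from typing import List, Optional


def nbconvert_command(notebook: str, output: Optional[str] = None) -> List[str]:
    """Build a jupyter-nbconvert command that executes a notebook
    non-interactively and saves the executed copy as a notebook."""
    cmd = [
        "jupyter", "nbconvert",
        "--to", "notebook",   # keep the .ipynb format for the result
        "--execute",          # run every cell before writing the output
        # -1 disables the per-cell timeout; the partition limit still applies.
        "--ExecutePreprocessor.timeout=-1",
    ]
    if output is not None:
        cmd += ["--output", output]  # name of the executed copy
    cmd.append(notebook)
    return cmd


if __name__ == "__main__":
    # Prints the line to place in the script submitted to the batch partition:
    # jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=-1 --output analysis_executed.ipynb analysis.ipynb
    print(shlex.join(nbconvert_command("analysis.ipynb", "analysis_executed.ipynb")))
```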

Job states

A job can be:

submitted/accepted: the job has been accepted by the job manager. An accepted job is either running or pending.
running: the submitted job is running on the requested resources of the cluster.
pending: the submitted job is waiting for cluster resources to become available, or waiting for other jobs (pipelining).
rejected: the specifications of the job exceeded the hardware and/or the session limits.

HAL hardware limits

Jobs are subject to the following hardware limits:

Node names	GPUs per node
Hal1-6	2
Hal7-10	1

Warning

If any of these limits is exceeded, the job submission is rejected.

Warning

A job whose specifications are within the hardware limits runs only when a node has the requested resources available. If the nodes that can meet the request are busy, the job is pending (but not rejected). You can run slqueue and check-cluster, then downsize the requested resources if needed.
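The two warnings above amount to a three-way decision for a single-node job: rejected, pending or running. The following is a minimal sketch, assuming GPUs are the only constrained resource (CPU, RAM and session limits are ignored). The function name and its `free_gpus` input are hypothetical, and the individual node names are a hypothetical expansion of the Hal1-6 and Hal7-10 ranges.

```python
from typing import Dict

# GPUs per node, transcribed from the hardware table above.
# "Hal1".."Hal10" is a hypothetical expansion of the table's ranges.
GPUS_PER_NODE: Dict[str, int] = {
    **{f"Hal{i}": 2 for i in range(1, 7)},
    **{f"Hal{i}": 1 for i in range(7, 11)},
}


def submission_outcome(requested_gpus: int, free_gpus: Dict[str, int]) -> str:
    """Return 'rejected', 'pending' or 'running' for a single-node job.

    requested_gpus: GPUs asked for by the job.
    free_gpus: GPUs currently idle on each node (hypothetical input).
    """
    # Rejected: no node could ever provide this many GPUs.
    if requested_gpus > max(GPUS_PER_NODE.values()):
        return "rejected"
    # Nodes whose hardware can satisfy the request at all.
    capable = [n for n, total in GPUS_PER_NODE.items() if total >= requested_gpus]
    # Running: at least one capable node has enough idle GPUs right now.
    if any(free_gpus.get(n, 0) >= requested_gpus for n in capable):
        return "running"
    # Pending: the request fits the hardware, but the capable nodes are busy.
    return "pending"
```

For instance, a 2-GPU job submitted while Hal1-6 are all busy stays pending even if Hal7-10 are idle, since those nodes only have one GPU each.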

HAL session limits

Session limits apply per user and complement the hardware limits.

Warning

If any of these limits (hardware and/or session) is exceeded, the job submission is rejected.

Internal users (members of IPSL)

These limits are cumulative over the jobs already submitted (the resources allocated to all accepted jobs are added up):

GPU = 2 (same node)
Node = 1
Max running jobs = 2 (only if the first job does not reach any of the above limits)

Warning

A job submission is rejected when it exceeds one of the above session limits. If one of the session limits has already been reached by your accepted jobs, the next job submission will be pending.

Warning

A job whose specifications are within the session limits runs only when a node has the requested resources available. If the nodes that can meet the request are busy, the job is pending (but not rejected). You can run slqueue and check-cluster, then downsize the requested resources if needed.

External users (rest of the world)

GPU = 1
Node = 1
Max running jobs = 1
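The internal and external session limits can be compared side by side in a small checker. This is a minimal sketch of one reading of the warnings above: a job that exceeds a limit on its own is rejected, and a job that would push the user's accepted jobs past a limit is pending. The names are hypothetical, the requirement that both GPUs sit on the same node is not modelled, and every accepted job is counted toward the running-job limit as a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SessionLimits:
    gpus: int          # GPUs cumulated over the user's accepted jobs
    nodes: int         # nodes a job may request
    running_jobs: int  # maximum number of running jobs


# Values transcribed from the internal and external lists above.
INTERNAL = SessionLimits(gpus=2, nodes=1, running_jobs=2)
EXTERNAL = SessionLimits(gpus=1, nodes=1, running_jobs=1)


def session_check(limits: SessionLimits, accepted_gpus: List[int],
                  new_gpus: int, new_nodes: int = 1) -> str:
    """Classify a new submission against the per-user session limits.

    accepted_gpus: GPUs requested by each of the user's accepted jobs.
    Returns 'rejected', 'pending' or 'ok'; 'ok' still requires a node
    with free resources (hardware limits are checked separately).
    """
    # A job that exceeds a session limit on its own is rejected.
    if new_gpus > limits.gpus or new_nodes > limits.nodes:
        return "rejected"
    # Limits already reached by the accepted jobs: the new job waits.
    if (len(accepted_gpus) >= limits.running_jobs
            or sum(accepted_gpus) + new_gpus > limits.gpus):
        return "pending"
    return "ok"
```

Under this reading, an internal user whose first job holds both GPUs sees the next submission pending, while a first job with a single GPU leaves room for a second 1-GPU job.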

HAL session defaults

By default, a job is given the following specifications:

GPU = 0
Node = 1
RAM and CPU resources of a node are shared between jobs executed on the node.
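A job request with these defaults might be represented as below; note that a job that asks for nothing gets no GPU. `JobSpec` is a hypothetical name, and the 1-hour time default is taken from the partition table. CPU and RAM are left out because they are shared between the jobs on a node rather than listed as per-job defaults.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Resource request for one job; defaults mirror the HAL session
    defaults listed above."""
    gpus: int = 0            # no GPU unless one is requested explicitly
    nodes: int = 1           # a single node by default
    time_hours: float = 1.0  # default execution time limit of both partitions


# A GPU job must ask for it explicitly; the other fields keep their defaults.
gpu_job = JobSpec(gpus=1)
```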