Job partitions
Read the general information about job partitions in this section.
HAL partitions
On the HAL cluster, batch jobs are automatically assigned to the batch partition, whereas interactive jobs are automatically assigned to the interactive partition.
Partition name | Default execution time limit | Maximum execution time limit |
---|---|---|
batch | 1 hour | 72 hours |
interactive | 1 hour | 10 hours |
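The table can be read as a small lookup: a job that does not request a time limit gets the partition default, and a request may not exceed the partition maximum. The Python sketch below is illustrative only. It assumes HAL runs Slurm, which the partition terminology suggests but this page does not state. `PARTITION_LIMITS`, `slurm_time` and `effective_walltime` are hypothetical names, and raising an error for an over-long request is an assumption, since this page does not say how HAL handles it. `slurm_time` produces the days-hours:minutes:seconds form accepted by Slurm's `--time` option.

```python
from datetime import timedelta
from typing import Optional

# Limits copied from the partition table above.
PARTITION_LIMITS = {
    "batch": {"default": timedelta(hours=1), "max": timedelta(hours=72)},
    "interactive": {"default": timedelta(hours=1), "max": timedelta(hours=10)},
}


def slurm_time(duration: timedelta) -> str:
    """Format a duration as days-hours:minutes:seconds, a form Slurm's --time accepts."""
    total = int(duration.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def effective_walltime(partition: str, requested: Optional[timedelta] = None) -> timedelta:
    """Return the walltime a job would get in `partition` (hypothetical helper)."""
    limits = PARTITION_LIMITS[partition]
    if requested is None:
        # No explicit time limit requested: the partition default applies.
        return limits["default"]
    if requested > limits["max"]:
        # Assumption: treat an over-long request as an error.
        raise ValueError(
            f"{slurm_time(requested)} exceeds the {partition} limit "
            f"of {slurm_time(limits['max'])}"
        )
    return requested
```

For example, `slurm_time(effective_walltime("batch", timedelta(hours=30)))` gives `"1-06:00:00"`, while an 11-hour request on the interactive partition raises an error.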
Tips
If the execution time of your Jupyter Notebook exceeds the execution time limit of the interactive partition, you can run your notebook as a batch job thanks to the jupyter-nbconvert command. More information about jupyter-nbconvert is available on this page.
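As a sketch of that approach, the Python function below writes the text of a batch script that executes a notebook with `jupyter nbconvert --to notebook --execute`, which saves the executed copy as `<name>.nbconvert.ipynb` next to the original. This is a hypothetical helper, not an official HAL script: it assumes HAL uses Slurm `#SBATCH` directives, and environment activation is site-specific and left as a placeholder comment.

```python
import shlex


def nbconvert_batch_script(notebook: str, hours: int = 24) -> str:
    """Return the text of a hypothetical sbatch script that executes a notebook."""
    if not 1 <= hours <= 72:
        raise ValueError("hours must be between 1 and 72 (the batch partition limit)")
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=batch",
        f"#SBATCH --time={hours}:00:00",
        "# Activate your Python/Jupyter environment here (site-specific, not shown).",
        # --execute runs all cells; --to notebook saves the executed copy
        # as <name>.nbconvert.ipynb next to the original notebook.
        f"jupyter nbconvert --to notebook --execute {shlex.quote(notebook)}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could be saved, for example, as `run_notebook.sh` and submitted with `sbatch run_notebook.sh`. `shlex.quote` keeps notebook names containing spaces intact.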
Job states
A job can be:
- submitted/accepted: the job is accepted by the job manager. An accepted job is either running or pending.
- running: the submitted job is running on the requested resources of the cluster.
- pending: the submitted job is waiting for cluster resources to become available, or it is waiting for other jobs to finish (pipelining).
- rejected: the specifications of the job exceed the hardware and/or the session limits.
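The life cycle above can be summarised as a small decision function. This is a simplified Python model, not how the job manager is implemented: `JobState` and `submission_outcome` are hypothetical names, and the boolean inputs stand in for checks that the scheduler performs itself.

```python
from enum import Enum


class JobState(Enum):
    """States a submitted job can end up in, as listed above."""
    RUNNING = "running"
    PENDING = "pending"
    REJECTED = "rejected"


def submission_outcome(
    within_limits: bool,
    resources_free: bool,
    waits_for_other_jobs: bool = False,
) -> JobState:
    """Hypothetical model: decide what happens to a job right after submission."""
    if not within_limits:
        # Specifications exceed hardware and/or session limits: never accepted.
        return JobState.REJECTED
    if resources_free and not waits_for_other_jobs:
        # Accepted and the requested resources are available now.
        return JobState.RUNNING
    # Accepted, but waiting for resources or for other jobs (pipelining).
    return JobState.PENDING
```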
HAL hardware limits
Jobs are subject to the following hardware limits:
Node name | GPUs per node |
---|---|
Hal1-6 | 2 |
Warning
If one of these limits is exceeded, the job submission is rejected.
Warning
A job whose specifications are below the hardware limits runs only if a node has the requested resources available. If all the nodes that can meet the request are busy, the job is pending (but not rejected). In that case, you can run slqueue and check-cluster to inspect the queue and the cluster, and reduce the requested resources if needed.
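The two warnings above distinguish a request that no node can ever satisfy (rejected) from one that only has to wait (pending). The Python sketch below models this for GPUs only; CPU and RAM are ignored. The node names `hal1` to `hal6` are an assumed spelling of "Hal1-6", `hardware_outcome` is a hypothetical helper, and `free_gpus` is a mapping you would fill in yourself (for example from what check-cluster shows, whose output format is not described here).

```python
from typing import Mapping

GPUS_PER_NODE = 2  # from the hardware table above
NODE_NAMES = [f"hal{i}" for i in range(1, 7)]  # assumed naming of Hal1-6


def hardware_outcome(gpus: int, free_gpus: Mapping[str, int]) -> str:
    """Return 'rejected', 'pending' or 'running' for a single-node GPU request.

    free_gpus maps a node name to its currently idle GPUs (supplied by the caller).
    """
    if gpus < 0 or gpus > GPUS_PER_NODE:
        # No node could ever satisfy this request.
        return "rejected"
    if any(free_gpus.get(node, 0) >= gpus for node in NODE_NAMES):
        # At least one node has enough idle GPUs right now.
        return "running"
    # Valid request, but every suitable node is busy.
    return "pending"
```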
HAL session limits
Session limits apply per user and complement the hardware limits.
Warning
If one of these limits (hardware and/or session) is exceeded, the job submission is rejected.
Internal users (members of IPSL)
The following limits take into account the jobs already submitted (the resources allocated to all accepted jobs are added up):
- GPU = 2 (same node)
- Node = 1
- Max running jobs = 2, provided the first job does not reach any of the above limits.
Warning
A job submission is rejected when it exceeds one of the above session limits. If one of the session limits is already reached, the next job submission will be pending.
Warning
A job whose specifications are below the session limits runs only if a node has the requested resources available. If all the nodes that can meet the request are busy, the job is pending (but not rejected). In that case, you can run slqueue and check-cluster to inspect the queue and the cluster, and reduce the requested resources if needed.
External users (rest of the world)
- GPU = 1
- Node = 1
- Max running jobs = 1
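The session rules for internal and external users can be sketched together in Python. This is a simplified, hypothetical model: `SessionLimits`, `Job` and `session_outcome` are illustrative names, `running` stands for the user's jobs already accepted, and node placement (the "same node" condition for internal users) is not modelled. Following the warnings above, a job that alone exceeds a limit is rejected, while a job that would push the user past a limit, or arrives once a limit is already reached, is pending.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SessionLimits:
    gpus: int
    nodes: int
    max_running_jobs: int


@dataclass(frozen=True)
class Job:
    gpus: int = 0
    nodes: int = 1


# Values copied from the internal and external user lists above.
INTERNAL = SessionLimits(gpus=2, nodes=1, max_running_jobs=2)
EXTERNAL = SessionLimits(gpus=1, nodes=1, max_running_jobs=1)


def session_outcome(limits: SessionLimits, running: Sequence[Job], new: Job) -> str:
    """Hypothetical per-user check returning 'rejected', 'pending' or 'ok'."""
    # A job that alone exceeds a session limit can never run: rejected.
    if new.gpus > limits.gpus or new.nodes > limits.nodes:
        return "rejected"
    used_gpus = sum(job.gpus for job in running)
    # Limits are cumulative over the user's jobs: if a limit is already
    # reached, or would be exceeded, the new job waits.
    if (
        len(running) >= limits.max_running_jobs
        or used_gpus >= limits.gpus
        or used_gpus + new.gpus > limits.gpus
    ):
        return "pending"
    # Within session limits; it may still pend if no node is free.
    return "ok"
```

For instance, an internal user whose first job holds 2 GPUs sees any second job pending, whereas two 1-GPU jobs are both within the session limits.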
HAL session defaults
The default job specifications on the cluster are as follows:
- GPU = 0
- Node = 1
- RAM and CPU resources of a node are shared between jobs executed on the node.
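To make the defaults explicit in a job script, a helper can merge them with what you override and emit the corresponding directives. The Python sketch below is hypothetical: `DEFAULTS` and `sbatch_header` are illustrative names, it assumes HAL uses Slurm, and it assumes the GPUs are requested as a generic resource named `gpu` (`--gres=gpu:N`), which this page does not confirm. CPU and RAM are not set because they are shared on the node.

```python
from typing import Dict

# Defaults listed above. CPU and RAM are shared on the node, so they are not set here.
DEFAULTS: Dict[str, int] = {"gpus": 0, "nodes": 1}


def sbatch_header(**overrides: int) -> str:
    """Return hypothetical #SBATCH lines for the defaults plus any overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    spec = {**DEFAULTS, **overrides}
    lines = [f"#SBATCH --nodes={spec['nodes']}"]
    if spec["gpus"] > 0:
        # Assumes the GPUs are exposed as a generic resource named "gpu".
        lines.append(f"#SBATCH --gres=gpu:{spec['gpus']}")
    return "\n".join(lines)
```

With no overrides, only the node directive is produced, matching the default of 0 GPUs.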