HAL job manager
Slurm
Once connected to the head node, hal0-ng.obsq.uvsq.fr, you can submit interactive or batch jobs to access the computational resources. General information about Slurm is given here.
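As an illustration of the two submission modes, here is a minimal Python sketch that assembles the command line for an interactive job (`srun ... --pty bash`) or a batch job (`sbatch script`). The helper name `build_slurm_command` and the script name `train_job.sh` are hypothetical. Only the standard Slurm option `--gpus` is used; any partition, account or time-limit options that HAL may require are not covered here.

```python
import shlex


def build_slurm_command(mode, gpus=0, script=None):
    """Return the argument list for an interactive or batch Slurm submission.

    mode:   "interactive" (srun with a pseudo-terminal) or "batch" (sbatch).
    gpus:   number of GPU cards to reserve; 0 leaves --gpus out entirely.
    script: path to the batch script, required in batch mode.
    """
    if mode == "interactive":
        cmd = ["srun"]
        tail = ["--pty", "bash"]
    elif mode == "batch":
        if script is None:
            raise ValueError("batch mode needs a script path")
        cmd = ["sbatch"]
        tail = [script]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if gpus > 0:
        cmd.append(f"--gpus={gpus}")
    return cmd + tail


if __name__ == "__main__":
    # Interactive shell on a compute node with one GPU card.
    print(shlex.join(build_slurm_command("interactive", gpus=1)))
    # Batch job without any GPU reservation (hypothetical script name).
    print(shlex.join(build_slurm_command("batch", script="train_job.sh")))
```

The printed strings can be pasted into a shell on the head node, or the list can be passed to `subprocess.run` to submit the job from Python.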
Warning
If you use a development IDE such as Spyder, you must close the application to actually release the memory of the GPU card.
Info
During the development phase of your code, you do not need your entire dataset or a GPU card. You can therefore omit the GPU card reservation from your job submissions (remove the option --gpus), which avoids tying up scarce resources that others would like to use. You can also develop your code on the other CPU clusters of the IPSL computing centre, as the AI modules are also available on these clusters. Finally, you can sub-sample your data beforehand.
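Sub-sampling can be as simple as drawing a reproducible random subset of sample indices and loading only those. The sketch below uses only the Python standard library. The function name `subsample_indices`, the 5 % fraction and the dataset size are illustrative assumptions, not values prescribed for HAL.

```python
import random


def subsample_indices(n_samples, fraction, seed=0):
    """Pick a reproducible random subset of sample indices, sorted ascending."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one sample, even for very small fractions.
    k = max(1, round(n_samples * fraction))
    rng = random.Random(seed)  # fixed seed: same subset on every run
    return sorted(rng.sample(range(n_samples), k))


if __name__ == "__main__":
    # Keep 5 % of a dataset of 10,000 samples (illustrative numbers).
    idx = subsample_indices(10_000, 0.05)
    print(len(idx), "samples kept for development runs")
```

Fixing the seed keeps development runs comparable from one execution to the next; the full dataset can be restored later by setting the fraction to 1.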
GPU Monitoring
The following GPU monitoring commands can be run only on the hal[1-6] machines (not on hal0):
- nvtop: the GPU counterpart of the top and htop commands. A well-made text-mode interface for monitoring GPU cards in real time (computation activity, memory, data transfer, etc.).
- nvidia-smi: a text-mode command for monitoring GPU cards that can be used in an automated process.
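To show how nvidia-smi might fit into an automated process, here is a minimal Python sketch that queries it in CSV mode and parses the result. It relies on the nvidia-smi options `--query-gpu` and `--format=csv,noheader,nounits`. The function names and the sample values are made up for illustration, and the parser assumes every field is numeric (some cards may report a field as not available, which this sketch does not handle).

```python
import subprocess

QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse the 'csv,noheader,nounits' output of nvidia-smi into dicts."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, util, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "utilization_pct": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi on the current node and return one dict per GPU card."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Made-up sample so the parser can be tried without a GPU node.
    sample = "0, 87, 10240, 16384\n1, 0, 3, 16384\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu)
```

On a hal[1-6] machine, `query_gpus()` can be called from a training script or a periodic logger to record utilization and memory use alongside the job's own output.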
Tips
Monitoring the activity of the GPU cards will help you considerably in optimizing your computations.