AI modules & packages
AI Modules
AI module base
The IPSL mesocenter offers a set of software for designing and training Machine Learning models. This software is packaged as modules. Each AI module is based on an enhanced Anaconda distribution, a consistent set of binaries and Python packages. You can find the list of its elements here (choose the latest version of Python for 64-bit Linux).
We also add useful Python packages (e.g., opencv, hyperopt, cartopy) to this Anaconda distribution so as to produce an enhanced base. When you load an AI module, you therefore automatically get a large set of Python packages, some of which are categorized below.
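As a sketch, loading an AI module and inspecting its contents might look like the following (the module name and version are illustrative; run `module avail` on the cluster to see the exact names):

```shell
# List the available modules (names below are illustrative)
module avail

# Load an AI module, e.g. a PyTorch distribution
module load pytorch/2.1.2

# Show every Python package shipped with the module
pip list
```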
Info
Assuming that an AI module is loaded, you can get the complete list of its packages with the command `pip list`.
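Equivalently, a short Python sketch using only the standard library enumerates the same information programmatically:

```python
from importlib.metadata import distributions

# Build a {package: version} mapping of everything installed in the
# currently loaded module's environment (same information as `pip list`).
installed = {
    dist.metadata["Name"]: dist.version
    for dist in distributions()
    if dist.metadata["Name"]  # skip broken metadata entries
}

# Show the first few entries, sorted case-insensitively
for name in sorted(installed, key=str.lower)[:10]:
    print(f"{name}=={installed[name]}")
```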
Info
If you are missing one or more Python packages, you can suggest their integration by following the procedure described here. This procedure requires an account on the IN2P3 GitLab and the creation of a GitLab issue. The AI modules are updated once a year, in January-February. In the meantime, you can also extend these modules by following this procedure.
AI module list
The AI modules can be categorized following the ML method families.
Decision tree modules (boosting methods)
- CatBoost
- LightGBM (without GPU support)
- XGBoost
Deep Learning modules
- JAX
- PyTorch
- PyTorch Lightning, which is embedded in the PyTorch module
- PyTorch Ignite, which is embedded in the PyTorch module
- TensorFlow
PyTorch and TensorFlow come with specific packages (e.g., captum for PyTorch; keras-tuner for TensorFlow).
Modules compatibility
Not all versions of the AI modules are compatible with all GPU architectures, especially the most recent ones. Incompatible here does not mean a mere loss of GPU acceleration, but the inability to run code written with these modules on the GPUs concerned. In the table below, "OK" means the module is compatible with the GPU and "KO" means it is not.
Info
The compatibility issues concern only the modules running on GPU; all modules executed on CPU work without exception.
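A quick way to check a module/GPU pairing in practice is to ask the framework itself whether it sees a GPU. The sketch below probes PyTorch and TensorFlow without requiring either to be installed; on a "KO" combination you would typically see an import failure or no visible GPU device:

```python
def gpu_visibility():
    """Report which frameworks can see a GPU (None = framework not loaded)."""
    report = {}
    try:
        import torch
        # True when at least one CUDA device is usable by this PyTorch build
        report["pytorch"] = torch.cuda.is_available()
    except ImportError:
        report["pytorch"] = None
    try:
        import tensorflow as tf
        # Non-empty list when TensorFlow detects at least one GPU
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        report["tensorflow"] = None
    return report

if __name__ == "__main__":
    print(gpu_visibility())
```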
Tips
Slurm, the HAL cluster's job manager, offers an option to choose the GPU architecture on which your code will be executed: the `--gpus=<ampere or turing>:<1 or 2>` option for the `srun` command, or the `#SBATCH --gpus=<ampere or turing>:<1 or 2>` instruction in a batch script for the `sbatch` command. For example, `--gpus=turing:1` or `#SBATCH --gpus=turing:1` allocates one Nvidia® GeForce® RTX 2080 Ti GPU card. Run `squeue` and `sinfo` to check the availability of the cluster nodes. Note that if the GPU architecture is not specified, Slurm chooses randomly between Turing and Ampere.
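Putting this together, a minimal batch script requesting one Turing GPU might look like the following (the job name, time limit, module version, and script path are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=train-gpu     # illustrative job name
#SBATCH --gpus=turing:1          # one RTX 2080 Ti; use ampere:1 for an RTX A5000
#SBATCH --time=01:00:00          # illustrative time limit

# Module name/version and training script are illustrative
module load pytorch/2.1.2
srun python train.py
```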
Tips
The AI modules of the Jean Zay supercomputer have compatibility problems of the same kind: its V100 cards use the Volta architecture and its A100 cards the Ampere architecture. In the table, replace the RTX 2080 Ti card with the V100 and the RTX A5000 card with the A100 to obtain the compatibility of the modules on Jean Zay.
Name | Version | Year | RTX 2080 Ti (hal1-4) - Turing | RTX A5000 (hal5-6) - Ampere |
---|---|---|---|---|
catboost | 0.24.4 | 2021 | OK | OK |
catboost | 1.0.4 | 2022 | OK | OK |
catboost | 1.1.1 | 2023 | OK | OK |
catboost | 1.2.2 | 2024 | OK | OK |
jax | 0.4.1 | 2023 | OK | OK |
jax | 0.4.23 | 2024 | OK | OK |
pytorch | 1.7.1 | 2021 | OK | KO |
pytorch | 1.8.2-lts | 2022 | OK | OK |
pytorch | 1.10.1 | 2022 | OK | OK |
pytorch | 1.13.1 | 2023 | OK | OK |
pytorch | 2.1.2 | 2024 | OK | OK |
pytorch-ignite | 0.4.3 | 2021 | OK | KO |
pytorch-ignite | 0.4.8 | 2022 | OK | OK |
pytorch-ignite | 0.4.10 | 2023 | OK | OK |
pytorch-ignite | 0.4.13 | 2024 | OK | OK |
pytorch-lightning | 1.1.8-gpu | 2021 | OK | KO |
pytorch-lightning | 1.5.10 | 2022 | OK | OK |
pytorch-lightning | 1.8.6 | 2023 | OK | OK |
pytorch-lightning | 2.1.3 | 2024 | OK | OK |
tensorflow | 2.2.0 | 2021 | OK | KO |
tensorflow | 2.4.1 | 2022 | OK | OK |
tensorflow | 2.6.3 | 2022 | OK | OK |
tensorflow | 2.7.0 | 2022 | OK | OK |
tensorflow | 2.9.1 | 2023 | OK | OK |
tensorflow | 2.11.0 | 2023 | OK | OK |
tensorflow | 2.15.0 | 2024 | OK | OK |
xgboost | 1.5.2 | 2022 | OK | OK |
xgboost | 1.7.1 | 2023 | OK | OK |
xgboost | 2.0.3 | 2024 | OK | OK |
Python AI package classification
The following sections are an attempt to give you hints about Python packages useful in Machine Learning. The packages are grouped into convenient categories; the list is not exhaustive.
Development
- bandit
- black
- conda-lock
- conda-pack
- dask-jobqueue
- filprofiler
- flake8
- gitpython
- glances
- isort
- memory_profiler
- nvidia-ml-py
- pre-commit-hooks
- pyaml
- pycodestyle
- pympler
- pynvml
- radon
- ruff
- scalene
Data engineering packages
- cfgrib
- dask
- netcdf4
- pandas
- polars
- xarray
- zarr
- zfp
Data handling packages
- imbalanced-learn
Data Visualization packages
- graphviz + python-graphviz
- pydot
Decision trees packages
- catboost
- lightgbm
- xgboost
Deep learning frameworks
- jax
- pytorch
- tensorflow
- deepxde
- transformers
eXplainable AI packages
- shap
GPU distributed computation packages
- horovod
- mpi4py
- nccl
Image processing packages
- mahotas
- opencv + opencv-python
JAX-specific packages
- dm-haiku
- equinox
Machine Learning experiment tracking
- clearml
- comet_ml
- mlflow
- neptune-client, neptune-optuna, neptune-sklearn
- wandb
Optimization packages
- hyperopt
- keras-tuner
- optuna
- ray-tune
- bayesian-optimization
PyTorch-specific packages
Abstraction layers
- pytorch-lightning
- ignite
Data engineering
- webdataset
eXplainable AI packages
- captum
Model handling packages
- segmentation-models-pytorch
- torchinfo
- torchmetrics
- kornia
Profiling packages
- torch-tb-profiler (tensorboard plugin)
Signal processing packages
- pytorchvideo
- torchaudio
- torchvision
Other packages
- gpytorch
TensorFlow-specific packages
TensorFlow extension packages
- tensorflow-addons
- tensorflow-datasets
- tensorflow-io
- tensorflow-io-gcs-filesystem
- tensorflow-metadata
TensorFlow dataset packages
- tensorflow-datasets
Train monitoring & experiment handling packages
- tensorboard
- mlflow
- neptune
- wandb