Frequently Asked Questions

SSH

Back up your working SSH key
- Many SSH connection problems with the IPSL mesocenter can be solved by restoring a saved copy of a key that is known to work.
- Solution: keep a copy of the key on another host or in a private cloud.
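A minimal sketch of such a backup, assuming the key is ~/.ssh/id_rsa. The function name backup_ssh_key and the default backup directory are hypothetical, so adapt them to your own setup:

```shell
# Copy an SSH key pair into a backup directory with restrictive permissions.
# $1 = private key path (assumed default: ~/.ssh/id_rsa)
# $2 = backup directory (hypothetical default: ~/ssh-key-backup)
backup_ssh_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    dest="${2:-$HOME/ssh-key-backup}"
    if [ ! -f "$key" ]; then
        echo "private key not found: $key" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    chmod 700 "$dest" || return 1
    cp -p "$key" "$dest/" || return 1
    # The public half is optional: it can be regenerated with ssh-keygen -y -f KEY
    if [ -f "$key.pub" ]; then
        cp -p "$key.pub" "$dest/" || return 1
    fi
    chmod 600 "$dest/$(basename "$key")"
}

# Example: backup_ssh_key "$HOME/.ssh/id_rsa" /path/to/your/backup/dir
```

The backup directory can then be copied to another host or to your private cloud with the tool you normally use.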
I ssh to the IPSL mesocenter and get "permission denied"
- If this is the first time you connect to the IPSL mesocenter, look here: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#ssh-connection-to-ipsl-mesocenter-has-never-work-what-to-do
- If you are not on your usual computer, or your computer has been reinstalled, make sure you have restored your key on it.
  Look here: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#i-want-to-access-from-another-computer
I want to access the IPSL mesocenter from another computer

Look here: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#i-want-to-access-from-another-computer
I have copied my key to a new computer and I get the following message when connecting

"sign_and_send_pubkey: signing failed: agent refused operation"
Solution: try ssh-add
My ssh sessions to the IPSL mesocenter die many times a day

Try ssh -o ServerAliveInterval=90s, and look here to make this change permanent: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#using-config-file-on-linux-or-macos
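As a rough sketch of the permanent version, the helper below appends a Host block with ServerAliveInterval to an ssh client config file. The function name add_keepalive is hypothetical, and the host pattern in the example is only an illustration; use the host names you actually connect to:

```shell
# Append a Host block with a keep-alive setting to an ssh client config file.
# $1 = host pattern, $2 = interval in seconds (default 90),
# $3 = config file (default ~/.ssh/config)
add_keepalive() {
    pattern="$1"
    interval="${2:-90}"
    config="${3:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")" || return 1
    # Do nothing if a block for this exact pattern already exists.
    if [ -f "$config" ] && grep -qxF "Host $pattern" "$config"; then
        echo "Host $pattern already present in $config; edit it by hand" >&2
        return 0
    fi
    printf '\nHost %s\n    ServerAliveInterval %s\n' "$pattern" "$interval" >> "$config"
    chmod 600 "$config"
}

# Example (illustrative pattern):
# add_keepalive 'ciclad*' 90
```

The helper never rewrites an existing Host block, so a config file you already maintain by hand is left untouched.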
I am connected by ssh to an IPSL cluster from my host and after 15 to 20 minutes remote graphics stop working (especially on macOS)

Try ssh -o ForwardX11Timeout=168h -X user@host, and look here to make this change permanent: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#using-config-file-on-linux-or-macos
When I add a public key to my $HOME/.ssh/authorized_keys on a cluster, it seems to be removed shortly afterwards
- This is not a problem, it is a feature of our clusters, and there are several reasons for it:
- It corrects, without any intervention, errors you may make in your authorized_keys.
- It ensures that you can log in with your key on the 3 clusters, which have different homes (ciclad/climserv/hal).
- Security: a key added by somebody else does not stay.
I updated my Linux distribution and ssh to ciclad or climserv doesn't work anymore

Messages like: Unable to negotiate with xxx.xxx.xxx.xxx port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss

Very recent distributions with OpenSSH 8.8 (Fedora 33/34, latest Manjaro, Arch) have disabled the ssh-rsa algorithm by default. To correct this problem on your host, create a $HOME/.ssh/config (or modify it if you already have one) with:

 Host ciclad*
     HostkeyAlgorithms +ssh-rsa
     PubkeyAcceptedAlgorithms +ssh-rsa
 Host camelot*
     HostkeyAlgorithms +ssh-rsa
     PubkeyAcceptedAlgorithms +ssh-rsa
 Host loholt*
     HostkeyAlgorithms +ssh-rsa
     PubkeyAcceptedAlgorithms +ssh-rsa
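
To check whether your client is affected, you can look at the version banner printed by ssh -V. The sketch below parses such a banner; the function name openssh_disables_rsa is hypothetical, and it assumes the usual "OpenSSH_MAJOR.MINOR..." banner format (it returns 2 when the banner does not match):

```shell
# Return 0 if the given `ssh -V` banner reports OpenSSH 8.8 or later,
# 1 if it reports an older version, 2 if the banner cannot be parsed.
openssh_disables_rsa() {
    banner="$1"
    # Extract "MAJOR MINOR" from e.g. "OpenSSH_8.8p1, OpenSSL 1.1.1l"
    version=$(printf '%s\n' "$banner" |
        sed -n 's/^OpenSSH_\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$version" ] || return 2
    major=${version% *}
    minor=${version#* }
    if [ "$major" -gt 8 ]; then
        return 0
    fi
    if [ "$major" -eq 8 ] && [ "$minor" -ge 8 ]; then
        return 0
    fi
    return 1
}

# Example (ssh -V writes its banner to stderr):
# openssh_disables_rsa "$(ssh -V 2>&1)" && echo "add the ssh-rsa lines to ~/.ssh/config"
```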

Python

Installing Anaconda on the IPSL mesocenter (Ciclad/Climserv) gives a GLIBC error

Since September 2021, recent versions of Anaconda and Miniconda are no longer compatible with our systems. There is no fix for the installer itself.

The last installer versions known to work are:

Anaconda3 https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh

Miniconda https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh

For a broken environment there is now a fix.

Look here: https://documentations.ipsl.fr/MESO_User/Python/python_version.html#fixing-glibc-error
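
For a fresh installation, a possible way to fetch and run the last Miniconda installer known to work is sketched below. It assumes curl is available; the function name install_miniconda and the default prefix are hypothetical. The -b (batch) and -p (prefix) options are those of the installer script:

```shell
# Last Miniconda installer known to work on these systems (URL from this FAQ).
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh"

# Download the installer and run it without prompts.
# $1 = installation prefix (hypothetical default: ~/miniconda3)
install_miniconda() {
    prefix="${1:-$HOME/miniconda3}"
    installer="${TMPDIR:-/tmp}/$(basename "$MINICONDA_URL")"
    curl -fL -o "$installer" "$MINICONDA_URL" || return 1
    # -b: batch mode (no questions asked), -p: where to install
    bash "$installer" -b -p "$prefix"
}

# Example: install_miniconda "$HOME/miniconda3"
```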

Data

On ciclad I have removed files in /xxx by mistake

If they were in /home, there is a mirror in /backupfs/home/ (the copy starts every day at 5 AM). We also have an incremental backup of home files, from which we can retrieve files deleted up to six months ago.
To retrieve files from the incremental backup, ask support. On any other filesystem, sorry: there is no backup.
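A small sketch for finding the mirror copy of a file, assuming the mirror keeps the same directory layout as /home (an assumption, check it on the cluster). The function name mirror_path and the user name alice are hypothetical:

```shell
# Map a path under /home to its location in the daily mirror.
# Assumes /backupfs/home mirrors the layout of /home.
mirror_path() {
    case "$1" in
        /home/*)
            printf '/backupfs%s\n' "$1"
            ;;
        *)
            echo "not under /home, so there is no mirror copy: $1" >&2
            return 1
            ;;
    esac
}

# Example: copy the mirrored version back next to the original location.
# cp -p "$(mirror_path /home/alice/run/namelist)" /home/alice/run/namelist.restored
```

Since the mirror is refreshed every day, copy the file back before the next run at 5 AM, otherwise the deletion may be mirrored too.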
With CMIP6 data I get an HDF5 error when reading some files

This problem seems to occur only with IPSL CMIP6 data (hosted at TGCC). It is not a problem with the HDF5 library but with the filesystem.
It can happen on one node and not on others.
We don't know why this happens, but we know how to correct it: just contact support and say on which node it is happening.
"cmip6 hdf error on node xxxx" would be a good subject ;-)
I try to access /xxxx and it says permission denied

Some datasets are protected, and you need to be in a special group to access them. Ask support; we will check whether we can give you access to the dataset.

Jobs

I submit a batch job and it stays in the queue

The job may be blocked by a limit; in this case
showq -b can help you find out why your job is blocked.
Or there may not be enough resources to run your job; use
showq -i to see idle jobs.
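
To look only at your own jobs, the two commands can be wrapped as in the sketch below. The function name my_queued_jobs is hypothetical, and filtering with grep assumes that the user name appears on each job line of the showq output:

```shell
# Show the blocked and idle jobs of one user.
# $1 = user name (defaults to the current user)
my_queued_jobs() {
    user="${1:-$USER}"
    echo "== blocked jobs =="
    showq -b | grep -w -- "$user" || echo "(none)"
    echo "== idle jobs =="
    showq -i | grep -w -- "$user" || echo "(none)"
}

# Example: my_queued_jobs
```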

qdel does not stop my job

This can happen when there is a problem on the node where your job was running. Run qstat -rn1 "Numjob" (your job number) to see which node it was running on, then check-cluster to see whether the node status is down. If it is, mail support with the subject "qdel problem on node xxx".
My shell script works on the command line, but not with qsub

This could be due to memory requirements; see the documentation. The default memory per job (mem) is 3G and the default virtual memory per job (vmem) is 4G. On head nodes, mem is 8G and vmem is 12G.

In the job output, look at the resources used. Sample: (Resources Used: cput=00:07:23,mem=5688kb,vmem=40568kb,walltime=00:08:21)
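To compare what a job really used with the limits above, a field can be pulled out of the "Resources Used" line as sketched here. The function name resource_used is hypothetical, and the line format is assumed to be the one in the sample:

```shell
# Print one field (for example mem or vmem) of a "Resources Used" line.
# $1 = the line from the job output, $2 = field name
resource_used() {
    printf '%s\n' "$1" |
        sed 's/^.*Resources Used: *//; s/)*$//' |
        tr ',' '\n' |
        sed -n "s/^$2=//p"
}

# Example with the sample line from this FAQ:
# resource_used "(Resources Used: cput=00:07:23,mem=5688kb,vmem=40568kb,walltime=00:08:21)" vmem
#
# If the job needs more than the defaults, a larger limit can be requested in
# the job script, e.g. (assuming Torque-style resource names and sizes that
# suit your job):
#   #PBS -l mem=6gb,vmem=8gb
```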
The same job sometimes works and sometimes does not

Look at the output files to see on which node the job ran when it worked and on which node when it did not
(it could be a problem on one node: hardware, a full filesystem or a missing library).
In the job output, the first and last lines give you the "Running Host: host name".
If you submit the problem to meso-support@ipsl.fr, please give us the job number,
the location of the script you launched and the location of your job output (without this, we cannot do anything).
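To compare several output files quickly, something like the following sketch can list the host of each run. The function name job_hosts is hypothetical, and it assumes the host line looks exactly like "Running Host: host name":

```shell
# Print "file: host" for each job output file given as argument.
job_hosts() {
    for f in "$@"; do
        # Keep only the first "Running Host:" line of the file.
        host=$(sed -n 's/^.*Running Host:[[:space:]]*//p' "$f" | head -n 1)
        printf '%s: %s\n' "$f" "${host:-unknown}"
    done
}

# Example (hypothetical file names): job_hosts myjob.o1234 myjob.o1240
```

If the failing runs all share one host, include that host name in your mail to support.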
Code compiled with PGI gives an "illegal instruction" error on some compute nodes

By default, the PGI compiler detects the processor of the node where you compile and targets it; we can't change this,
so it is better to compile your code with pgfortran -tp x64.
All libraries that we compiled with PGI are compiled with -tp x64 (openmpi, netcdf ...).