Frequently Asked Questions
ssh
-
Back up your working SSH key
- Many problems connecting to the IPSL mesocenter over SSH can be solved if you have a backup of your key.
- Solution: keep a copy of the key on another host or in a private cloud.
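As an illustration, the backup can be as simple as copying both halves of the key pair to a safe directory. This is a minimal sketch: the function name backup_ssh_key and the paths in the comments are hypothetical, and it assumes the public key sits next to the private key with a .pub suffix.

```shell
# Copy a private key and its public half into a backup directory.
# Usage: backup_ssh_key KEY_FILE BACKUP_DIR
backup_ssh_key() {
    key=$1
    dest=$2
    # Keep the backup directory private to the current user.
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    # -p keeps the restrictive permissions and the timestamps.
    cp -p "$key" "$key.pub" "$dest"/
}

# Example calls (hypothetical paths and host):
#   backup_ssh_key "$HOME/.ssh/id_rsa" /media/usbkey/ssh-backup
#   scp -p "$HOME/.ssh/id_rsa" "$HOME/.ssh/id_rsa.pub" user@otherhost:ssh-backup/
```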
-
I ssh to the IPSL mesocenter and get "permission denied"
- If this is the first time you connect to the IPSL mesocenter, look here: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#ssh-connection-to-ipsl-mesocenter-has-never-work-what-to-do
- If you are not on your usual computer, or your computer has been reinstalled,
make sure you have restored your key on it;
look here: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#i-want-to-access-from-another-computer
-
I want to access the IPSL mesocenter from another computer
-
I have copied my key to a new computer and get the following message when connecting:
"sign_and_send_pubkey: signing failed: agent refused operation". Solution: try
ssh-add
-
My ssh sessions to the IPSL mesocenter die several times a day
try
ssh -o ServerAliveInterval=90s
and look here to make this change permanent: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#using-config-file-on-linux-or-macos
-
I'm connected by ssh to an IPSL cluster from my host, and after 15 to 20 minutes remote graphics stop working (especially on macOS)
try :
ssh -o ForwardX11Timeout=168h -X user@host
and look here to make this change permanent: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#using-config-file-on-linux-or-macos
-
When I add a public key to my $HOME/.ssh/authorized_keys on a cluster, it seems to be removed shortly afterwards
- This is not a problem, it is a feature of our clusters, and there are several reasons for it:
- It corrects, without intervention, any errors you may make in your authorized_keys
- It makes sure you can log in with your key on the 3 clusters, which have different homes (ciclad/climserv/hal)
- Security: a key added by somebody else does not stay
-
I updated my Linux distribution and ssh to ciclad or climserv doesn't work anymore
with messages like:
Unable to negotiate with xxx.xxx.xxx.xxx port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss
All very recent distributions with OpenSSH 8.8 (Fedora 33/34, latest Manjaro, Arch) have deactivated the RSA algorithm (ssh-rsa) by default. To correct this problem on your host, create a $HOME/.ssh/config, or modify the one you already have, with:
Host ciclad*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host camelot*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host loholt*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
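To check that the settings above are picked up, ssh -G prints the options the client would apply to a host without connecting. The following is only a sketch: the function names and the host name ciclad are illustrative, and OpenSSH releases older than 8.5 call the second option PubkeyAcceptedKeyTypes, so the filter accepts both spellings.

```shell
# Keep only the host key and public key algorithm lines of "ssh -G" output.
filter_rsa_settings() {
    grep -i -E '^(hostkeyalgorithms|pubkeyaccepted(algorithms|keytypes)) '
}

# Print the effective settings for one host (no connection is made).
show_rsa_settings() {
    ssh -G "$1" | filter_rsa_settings
}

# Example calls (illustrative host name):
#   ssh -V                      # client version
#   show_rsa_settings ciclad    # ssh-rsa should appear in both lists
```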
python
- Installing Anaconda on the IPSL mesocenter (Ciclad/Climserv) gives a GLIBC error
Since September 2021, recent versions of Anaconda and Miniconda are no longer compatible with our systems. There is no fix for the installer itself.
The last installer versions known to work are:
- Anaconda3 https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
- Miniconda https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
For broken environments there is now a fix.
data
-
On ciclad I have deleted files in /xxx by mistake
If they were on /home, there is a mirror in /backupfs/home/ (the mirroring starts every day at 5 AM), and we also have an incremental backup of home files, from which we can retrieve files deleted up to six months ago.
To retrieve files from the incremental backup, you have to ask support. For other filesystems: SORRY, NO BACKUP.
-
On CMIP6 data I get an HDF5 error when reading some files
This problem seems to occur only on IPSL CMIP6 data (hosted at TGCC). It is not a problem with the HDF5 library but with the filesystem;
it can happen on one node and not on others.
We don't know why this happens, but we know how to correct it: just contact support and say on which node it is happening.
"cmip6 hdf error on node xxxx" would be a good subject ;-)
-
I try to access /xxxx and get "permission denied"
Some datasets are protected, and you need to be in a special group to access them. Ask support; we will check whether we can give you access to the dataset.
Jobs
- I submitted a batch job and it stays in the queue
The job may be blocked by a limit; in this case
showq -b
can help you find out why your job is blocked.
Or there may not be enough resources to run your job; use
showq -i
to see idle jobs.
-
qdel doesn't want to stop my job
This can happen when there is a problem on the node where your job was running. Run
qstat -rn1 "Numjob"
to see on which node it was running, then
check-cluster
to see whether the node status is down. If it is, mail support with a subject such as "qdel problem on node xxx".
-
My shell script works on the command line but not with qsub
This could be a memory requirement; see the documentation. The default memory per job (mem) is 3G and the default virtual memory per job (vmem) is 4G. On head nodes, mem is 8G and vmem is 12G.
In the job output, look at the resources used. Sample: (Resources Used: cput=00:07:23,mem=5688kb,vmem=40568kb,walltime=00:08:21)
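To compare what a job really used with the limits above, the mem and vmem values can be pulled out of the "Resources Used" line. This is a minimal sketch that assumes the line looks like the sample above and that grep supports -o (GNU and BSD grep do). The function name, the file name and the values in the qsub example are hypothetical, and the qsub line assumes the Torque-style resource list syntax.

```shell
# Print the mem= and vmem= fields of "Resources Used" lines read on stdin.
job_memory_used() {
    grep 'Resources Used' | grep -o -E 'v?mem=[0-9]+kb'
}

# Example (hypothetical output file name):
#   job_memory_used < myjob.o12345
# If the job needs more than the defaults, request more at submission
# (assumed Torque-style syntax, example values):
#   qsub -l mem=6gb,vmem=8gb myscript.sh
```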
-
The same job sometimes works and sometimes doesn't
Look in the output files to see on which node the job ran when it worked and on which node when it did not
(it could be a problem on one node: hardware, full filesystem or missing library).
In the job output, the first and last lines give you the "Running Host: host name".
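When there are many output files, the nodes can be listed in one go. This is a small sketch that assumes the output files contain the text "Running Host" as described above and that grep supports -H (GNU and BSD grep do). The function name and the file names are hypothetical.

```shell
# List the "Running Host" lines of the given job output files,
# each prefixed with its file name, so good and bad runs can be compared.
running_hosts() {
    grep -H 'Running Host' "$@"
}

# Example (hypothetical file names):
#   running_hosts myjob.o*
```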
When you report the problem to meso-support@ipsl.fr, please give us the job number,
the location of the script you launched and the location of your job output (without this, we can't do anything).
-
PGI-compiled code gives an "illegal instruction" error on some compute nodes
By default, the PGI compiler detects the processor of the node on which you compile and targets it, and we can't change this,
so it is better to compile your code with pgfortran -tp x64
All the libraries we compile with PGI are built with -tp x64 (openmpi, netcdf ...)