Frequently Asked Questions
SSH Connections
Back up your working SSH key
- Many problems connecting to the ESPRI computing center can be solved by one simple precaution: copy your SSH keys to another host or to a private cloud.
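Below is a minimal sketch of such a copy from a Linux or macOS shell. It assumes your key pairs are the id_* files in ~/.ssh. The helper backup_ssh_keys, the archive name and the host backup-host are hypothetical placeholders. Because the archive contains private keys, it is created readable only by you.

```shell
# Hypothetical helper: archive the SSH key files (id_*) of a directory
# (default ~/.ssh) into a tar.gz file that only you can read.
# Usage (backup-host is a placeholder for your other machine):
#   backup_ssh_keys && scp ~/ssh-keys-backup.tar.gz backup-host:
# Restore on a new computer:
#   mkdir -p ~/.ssh && chmod 700 ~/.ssh && tar -xzf ssh-keys-backup.tar.gz -C ~/.ssh
backup_ssh_keys() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local archive="${2:-$HOME/ssh-keys-backup.tar.gz}"
  local f
  set --    # reuse the positional parameters as the list of files to archive
  for f in "$ssh_dir"/id_*; do
    # Keep regular files only (private keys and their .pub counterparts).
    [ -f "$f" ] && set -- "$@" "${f##*/}"
  done
  if [ "$#" -eq 0 ]; then
    echo "no id_* key files found in $ssh_dir" >&2
    return 1
  fi
  # umask 077 in a subshell: the archive is created with mode 600.
  (umask 077 && tar -czf "$archive" -C "$ssh_dir" "$@")
}
```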
I connect with ssh to the ESPRI computing center and receive a "permission denied" message
- See here for how to send us debug information: https://documentations.ipsl.fr/spirit/ssh/about_ssh_key.html#permission-denied-issue
- If you are using a different computer, or your computer has been reinstalled, make sure you have restored your SSH keys on it. See: https://documentations.ipsl.fr/spirit/ssh/about_ssh_key.html#keys-replication
- If you change the permissions of your home directory on the cluster (write access for group or others), your SSH keys can no longer be used until you remove these permissions: by default, the SSH server refuses key authentication when the home directory is writable by group or others.
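Here is a minimal sketch of removing those rights, to be run on the cluster. It only clears the group/other write bits on the home directory, ~/.ssh and ~/.ssh/authorized_keys. These are the paths that OpenSSH's StrictModes check looks at, assuming the cluster uses that default setting. The function name fix_home_write_perms is hypothetical.

```shell
# Hypothetical helper: remove group/other write permission from a home
# directory and its SSH files. Read/execute bits are left untouched.
# Usage on the cluster: fix_home_write_perms "$HOME"
fix_home_write_perms() {
  local home="${1:-$HOME}"
  local p
  for p in "$home" "$home/.ssh" "$home/.ssh/authorized_keys"; do
    # Skip paths that do not exist; fail if chmod itself fails.
    if [ -e "$p" ]; then
      chmod go-w "$p" || return 1
    fi
  done
}
```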
Error message "Unable to negotiate with xxx.xxx.xxx.xxx port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss"
- Recent Linux distributions, macOS, and MobaXterm with OpenSSH version >= 8.7 have disabled the ssh-rsa algorithm (RSA signatures with SHA-1) in their default configuration. To find the SSH version on your laptop/workstation/server, type:
ssh -V
- If your version is >= 8.7, apply the following solution: create a $HOME/.ssh/config file on your host (or modify it if it already exists) with:
Host ciclad*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host camelot*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host loholt*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host forge*
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
Host spirit*
    PubkeyAcceptedAlgorithms +ssh-rsa
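If you manage several machines, you can script the version test. This is a minimal sketch. It assumes the usual ssh -V output (printed on stderr), such as "OpenSSH_9.6p1 ...", and the >= 8.7 threshold given above. needs_rsa_workaround is a hypothetical name. It returns 0 when the Host blocks are needed, 1 when they are not, and 2 when the version cannot be parsed.

```shell
# Hypothetical helper: from the output of ssh -V, tell whether the ssh-rsa
# Host blocks above are needed, using this FAQ's ">= 8.7" threshold.
# Returns 0 (needed), 1 (not needed) or 2 (version not recognised).
# Usage: needs_rsa_workaround "$(ssh -V 2>&1)" && echo "add the Host blocks"
needs_rsa_workaround() {
  local ver="${1#*OpenSSH_}"       # e.g. "9.6p1 Ubuntu-3ubuntu13, OpenSSL ..."
  ver="${ver#for_Windows_}"        # Windows builds print "OpenSSH_for_Windows_8.1p1"
  ver="${ver%%[!0-9.]*}"           # keep the leading "major.minor", e.g. "9.6"
  local major="${ver%%.*}"
  local minor="${ver#*.}"
  minor="${minor%%.*}"
  case $major in ''|*[!0-9]*) return 2 ;; esac
  case $minor in ''|*[!0-9]*) return 2 ;; esac
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 7 ]; }
}
```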
A public key added to the $HOME/.ssh/authorized_keys file on the cluster is removed shortly afterwards
- This is not a problem but a feature of our clusters, for several reasons:
- It automatically corrects any errors you might make in your SSH authorized_keys file
- It guarantees that you can log on to the various ESPRI clusters, which have different homes (spirit/spiritx/ciclad/climserv/hal), with your SSH keys
- Security: a key added by someone else does not persist
I want to access the ESPRI computing center from another computer
Error message "sign_and_send_pubkey: signing failed: agent refused operation"
- If you have copied your key to a new computer:
- Solution: try
ssh-add
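One frequent cause after copying keys is that the copied private key files are readable by other users. This is our assumption; the FAQ does not state it. OpenSSH refuses to use such a key, and ssh-add reports that its permissions are too open. This is a minimal sketch that tightens the permissions before you run ssh-add. The function name secure_ssh_dir is hypothetical.

```shell
# Hypothetical helper: restrict permissions of an SSH directory after copying
# keys into it, so that ssh and ssh-add accept the private keys.
# Usage: secure_ssh_dir ~/.ssh && ssh-add
secure_ssh_dir() {
  local dir="${1:-$HOME/.ssh}"
  local f
  [ -d "$dir" ] || { echo "$dir is not a directory" >&2; return 1; }
  chmod 700 "$dir" || return 1
  for f in "$dir"/id_*; do
    [ -f "$f" ] || continue
    case $f in
      *.pub) chmod 644 "$f" ;;   # public keys may stay world-readable
      *)     chmod 600 "$f" ;;   # private keys: owner read/write only
    esac
  done
}
```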
My SSH sessions die several times a day
- Try:
ssh -o ServerAliveInterval=90s user@host
and look here to make this change permanent: https://documentations.ipsl.fr/spirit/ssh/about_ssh_key.html#ssh-client-configuration
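This is a minimal sketch that makes the keep-alive permanent by appending a Host block to your client configuration. It assumes the spirit* and ciclad* host patterns used elsewhere in this FAQ; adapt them to the names you actually type. The helper add_keepalive is hypothetical. It does nothing if the file already sets ServerAliveInterval, so running it twice is harmless.

```shell
# Hypothetical helper: append a keep-alive Host block to an ssh client config
# file (default ~/.ssh/config), unless ServerAliveInterval is already set there.
# The host patterns reuse cluster names from this FAQ; adapt them to yours.
# Usage: add_keepalive
add_keepalive() {
  local cfg="${1:-$HOME/.ssh/config}"
  mkdir -p -m 700 "$(dirname "$cfg")" || return 1
  # Keywords are case-insensitive in ssh_config, hence grep -i.
  if [ -f "$cfg" ] && grep -qi '^[[:space:]]*ServerAliveInterval' "$cfg"; then
    return 0
  fi
  cat >> "$cfg" <<'EOF'

Host spirit* ciclad*
    ServerAliveInterval 90
EOF
  # ssh rejects a config file that other users can write to.
  chmod 600 "$cfg"
}
```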
SSH graphical session does not work or stops working
- Check the quota of your IPSL Mesocenter home with the command
quotas
Graphical sessions stop working when you are over quota on /home.
- On Mac, check that you have installed XQuartz.
- On Linux and Mac, try starting ssh with the -X option.
- On Windows, use MobaXterm.
- If none of the above solves your problem, the issue lies with your laptop/workstation and you need to contact your local support team.
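I'm in an SSH session on the ESPRI computing center and after 15-20 minutes, remote graphics stop working (especially on macOS)
With ssh -X (untrusted X11 forwarding), the ssh client refuses new X11 connections once ForwardX11Timeout has elapsed, and this timeout defaults to 20 minutes. Try raising it: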
ssh -o ForwardX11Timeout=168h -X user@host
and look here to make this change permanent: https://documentations.ipsl.fr/MESO_User/SSH/About_SSH_key.html#using-config-file-on-linux-or-macos
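The same settings can also go in $HOME/.ssh/config. This is a minimal sketch that prints a Host block you can append to that file. It assumes the spirit* and ciclad* host patterns from this FAQ. The function x11_config_block is hypothetical. ForwardX11 yes is the configuration-file equivalent of the -X option.

```shell
# Hypothetical helper: print an ssh_config block enabling X11 forwarding
# (ForwardX11 yes is the config-file equivalent of -X) with a 7-day timeout.
# $1: host patterns (default taken from the host names used in this FAQ).
# Usage: x11_config_block >> ~/.ssh/config
x11_config_block() {
  local patterns="${1:-spirit* ciclad*}"
  cat <<EOF
Host $patterns
    ForwardX11 yes
    ForwardX11Timeout 168h
EOF
}
```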
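scp to the IPSL Mesocenter no longer works
Do not put any echo command (or anything else that prints text) in your .bashrc: the remote side of scp is started through your shell, which reads .bashrc, and any extra output breaks the transfer.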
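If you still want a message at login, a common pattern is to print it only in interactive shells. Here is a sketch for ~/.bashrc; the welcome text is just an example.

```shell
# Sketch for ~/.bashrc: only print messages in interactive shells.
# The remote side of scp runs in a non-interactive shell, where "$-" does not
# contain "i", so nothing is printed and the transfer is not disturbed.
case $- in
  *i*) echo "Welcome on $(hostname)" ;;   # example message; adapt or remove
esac
```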
Head node
- Error message: -bash: fork: Resource temporarily unavailable
You have too many processes on that head node (the maximum is 512 per user). The only solution is to ask support to kill all your processes on this node. This is often caused by accumulated VS Code server sessions.
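You can count your processes on a node to notice a build-up before you reach the limit. This is a minimal, Linux-only sketch that scans /proc and uses no external tool other than grep. count_my_processes is a hypothetical name. The optional pattern, for example vscode-server, is an assumption about how those processes appear. Run it before the limit is hit: once fork fails, the shell cannot start it either.

```shell
# Hypothetical helper (Linux only): count the processes you own on the current
# node by scanning /proc. Optional $1: only count processes whose command line
# contains this string, e.g. "vscode-server" (an assumed marker).
# Usage: count_my_processes; count_my_processes vscode-server
count_my_processes() {
  local pattern="${1:-}"
  local n=0
  local d
  for d in /proc/[0-9]*; do
    [ -O "$d" ] || continue    # -O: owned by you (a process may also vanish meanwhile)
    if [ -n "$pattern" ]; then
      # cmdline separates arguments with NUL bytes; grep -a treats it as text.
      grep -qaF -- "$pattern" "$d/cmdline" 2>/dev/null || continue
    fi
    n=$((n + 1))
  done
  echo "$n"
}
```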
Filesystem access and backup
- On spirit, I accidentally deleted files in /xxx
If they were on /home, there is a mirror in /backupfs/home/ (refreshed every day at 5 AM), and we also keep an incremental backup of home files from which we can retrieve files deleted up to six months ago.
To restore from the incremental backup, you have to ask support. If the files were on any other filesystem, SORRY, there is NO BACKUP.
- On CMIP6 data, I get an HDF5 error when reading some files
This problem seems to occur only with IPSL CMIP6 data (hosted at TGCC). It is not a problem with the HDF5 library but with the filesystem.
It can happen on one node and not on others.
We don't know why this happens, but we know how to fix it: just contact support and tell us on which node it is happening.
"cmip6 hdf error on node xxxx" would be a good subject ;-)
- I try to access /xxxx and get
Permission denied
Some datasets are protected and you need to be in a special group to access them. Ask support: we can check whether we can give you access to this dataset.
- I try to write a file on /xxxx and get
Read-only file system
You are trying to write on a remote filesystem: all remote filesystems are READ-ONLY.
Examples: writing to /scratchx or /homedata on the spirit cluster, or writing to /scratchu or /data on the spiritX cluster.
Solution: work from the right cluster or write to the right filesystem. To learn more: https://documentations.ipsl.fr/spirit/spirit_clusters/user_spaces.html
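For files deleted from /home, here is a minimal sketch for copying them back from the mirror yourself. It assumes the mirror reproduces the /home tree, so that $HOME/a/b is found at /backupfs$HOME/a/b; please check this on the cluster first. The mirror is refreshed at 5 AM, so it holds the version from that morning. restore_from_mirror is a hypothetical name, and it refuses to overwrite an existing file.

```shell
# Hypothetical helper: copy a file or directory back from the /home mirror.
# ASSUMPTION: the mirror reproduces the /home tree, so $HOME/a/b is found at
# /backupfs$HOME/a/b; check this on the cluster before relying on it.
# $1: path relative to your home; $2/$3: home and mirror dirs (for testing).
# Usage: restore_from_mirror work/run_model.sh
restore_from_mirror() {
  if [ -z "${1:-}" ]; then
    echo "usage: restore_from_mirror RELATIVE_PATH [HOME_DIR] [MIRROR_DIR]" >&2
    return 2
  fi
  local rel="$1"
  local home_dir="${2:-$HOME}"
  local mirror_dir="${3:-/backupfs$HOME}"
  local src="$mirror_dir/$rel"
  local dst="$home_dir/$rel"
  if [ ! -e "$src" ]; then
    echo "not found in mirror: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing $dst" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")" || return 1
  cp -a "$src" "$dst"    # -a keeps permissions and timestamps
}
```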
Jobs
- I submitted a batch job: why does it stay in the queue?
A Slurm job can be blocked by a limit; in this case
slqueue -b
can help you find out why your job is blocked. See also: https://documentations.ipsl.fr/spirit/spirit_clusters/slurm.html#slurm-user-limits
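Besides slqueue -b, the standard Slurm squeue command can show the reason Slurm gives for each of your pending jobs. This is a minimal sketch. The function name pending_reasons is hypothetical, while -u, -t and -o are regular squeue options (%r prints the pending reason).

```shell
# Hypothetical helper: list your pending jobs with the reason reported by Slurm.
# squeue options: -u user, -t state filter, -o output format where
# %i = job id, %j = job name, %T = state, %r = reason.
# Usage: pending_reasons            (or pending_reasons <login>)
pending_reasons() {
  squeue -u "${1:-$USER}" -t PENDING -o '%.12i %.30j %.10T %r'
}
```

As we understand Slurm's reason codes, Priority or Resources mean the job is simply waiting its turn. Reasons whose name mentions a limit (such as QOSMaxJobsPerUserLimit) correspond to the user limits described on the page above.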
- The same job sometimes works and sometimes does not
Look at the output files to see on which node the job ran when it worked and on which node when it did not (it could be a problem on one node: hardware, full filesystem, or missing library).
In the job output, the first and last lines give you the "Running Host: host name".
When you report the problem to meso-support@ipsl.fr, please give us the job number, the location of the submitted script, and the location of your job outputs (without this information, we cannot do anything).
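This is a minimal sketch that lists the node of each run from its output file. It assumes the line looks exactly like "Running Host: <hostname>", as described above. job_hosts is a hypothetical name. For jobs still recorded in Slurm's accounting database, sacct -j <jobid> -o JobID,State,ExitCode,NodeList gives similar information.

```shell
# Hypothetical helper: print "<output file> <host>" for each job output file,
# taken from the first "Running Host: <hostname>" line (assumed format).
# Usage: job_hosts slurm-*.out
job_hosts() {
  local f host
  for f in "$@"; do
    host=$(sed -n 's/.*Running Host:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$f" | head -n 1)
    printf '%s %s\n' "$f" "${host:-unknown}"
  done
}
```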