Hi,
Recently I found that our login nodes are being abused by users running heavy Node.js sessions (from Visual Studio Code). I set up some memory limits for users and the OOM killer is triggered, but it cannot kill the users' processes because all of them have oom_score_adj=-1000. The value is inherited from the parent sshd server process, which also runs with oom_score_adj=-1000.
As a workaround I'm stopping the traditional ssh server and starting ssh.socket at boot (via an rc.boot script). Now oom_score_adj=0, as it should be. This is also important for compute nodes with ssh connections enabled.
Regards Rolandas
P.S. We are still on Qlustar 13, so I don't know the situation on Qlustar 14, but it could be different because ssh connections in Ubuntu 24.04 LTS are already activated via the ssh socket.
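To check whether a node is affected, one could list the processes that the OOM killer will never select. The following is only a minimal sketch, not something from the thread: it assumes a Linux /proc filesystem, and the function name unkillable_processes and its proc_root parameter are made up for illustration. It relies on the kernel behavior that oom_score_adj=-1000 exempts a process from OOM killing and that the value is inherited by child processes.

```python
from pathlib import Path


def unkillable_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs whose oom_score_adj is -1000."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            adj = int((entry / "oom_score_adj").read_text().strip())
            comm = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile or is not readable
        if adj == -1000:
            found.append((int(entry.name), comm))
    return sorted(found)
```

Calling unkillable_processes() on a login node and finding user processes (for example node) in the result would match the symptom described above.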
Hi again,
thanks for reporting this. In light of what you're seeing, setting oom_score_adj=-1000 for sshd indeed seems unnecessary in Qlustar 14, as socket activation will restart sshd if it is killed by OOM activity. We'll do some further testing on this and will remove the setting in the next security/bugfix release if no issues pop up. We'll also check whether the same behavior can be implemented in Qlustar 13 without too much effort.
Best,
Roland
On 11/10/25 14:07, Rolandas Naujikas via Qlustar General wrote:
On 11/11/25 22:46, Roland Fehrenbacher via Qlustar General wrote:
Hi again,
thanks for reporting this. In light of what you're seeing, setting oom_score_adj=-1000 for sshd indeed seems unnecessary in Qlustar 14, as socket activation will restart sshd if it is killed by OOM activity.
No sshd server process exists; the ssh socket is handled by init (systemd). Only the sshd processes for the users' connections exist. Their configuration can be customized via ssh@.service overrides. I don't see any side effects; e.g. fail2ban still works OK.
I also implemented ssh socket activation on the compute nodes.
Regards Rolandas
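As an illustration of the ssh@.service overrides mentioned above, here is a rough sketch of how a systemd drop-in could be generated. It is an assumption rather than the setup used in this thread: the helper names render_override and write_override, the file name override.conf, and the choice of pinning OOMScoreAdjust= explicitly are all hypothetical. OOMScoreAdjust= itself is a standard systemd service setting, and drop-ins for a unit live in a directory named after the unit with a .d suffix.

```python
from pathlib import Path


def render_override(oom_score_adjust=0):
    """Build the text of a drop-in that sets OOMScoreAdjust= for a unit."""
    if not -1000 <= oom_score_adjust <= 1000:
        raise ValueError("OOMScoreAdjust must be between -1000 and 1000")
    return f"[Service]\nOOMScoreAdjust={oom_score_adjust}\n"


def write_override(unit="ssh@.service", base="/etc/systemd/system",
                   name="override.conf", oom_score_adjust=0):
    """Write the drop-in below <base>/<unit>.d/ and return its path."""
    dropin_dir = Path(base) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(render_override(oom_score_adjust))
    return path
```

After writing such a file, systemd only picks it up once systemctl daemon-reload has been run. Passing a scratch directory as base allows a dry run without touching /etc.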
Qlustar General mailing list -- qlustar-general@qlustar.org To unsubscribe send an email to qlustar-general-leave@qlustar.org
Good to know. We need to experiment with this a bit. Thanks for the additional input.
Best,
Roland
On 11/12/25 07:19, Rolandas Naujikas via Qlustar General wrote: