Hi again,
thanks for reporting this. In light of what you're seeing, setting oom_score_adj=-1000 for sshd does indeed seem unnecessary in Qlustar 14, since socket activation will restart sshd if it gets killed by the OOM killer. We'll do some further testing and will remove the setting in the next security/bugfix release if no issues pop up. We'll also check whether the same behavior can be implemented in Qlustar 13 without too much effort.
Best,
Roland
On 11/10/25 14:07, Rolandas Naujikas via Qlustar General wrote:
Hi,
Recently I found that our login nodes are being abused by users running heavy Node.js sessions (from Visual Studio Code). I set up some memory limits for users and the OOM killer kicks in, but it cannot kill the users' processes, because all of them have oom_score_adj=-1000. They inherit it from the parent sshd server, which also runs with oom_score_adj=-1000.
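For anyone who wants to check their own nodes, here is a minimal sketch that lists processes whose oom_score_adj is at or below -1000. It assumes a Linux /proc layout; the function name find_oom_protected and its parameters are made up for illustration and are not part of Qlustar.

```python
from pathlib import Path


def find_oom_protected(proc_root="/proc", threshold=-1000):
    """Return sorted (pid, name, oom_score_adj) tuples for processes
    whose oom_score_adj is at or below the threshold.

    Hypothetical helper for illustration; proc_root is a parameter
    so the function can be pointed at a test directory.
    """
    hits = []
    for entry in Path(proc_root).iterdir():
        # Per-process directories in /proc are named by numeric PID.
        if not entry.name.isdigit():
            continue
        try:
            adj = int((entry / "oom_score_adj").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            # The process may have exited, or the files are unreadable.
            continue
        if adj <= threshold:
            hits.append((int(entry.name), name, adj))
    return sorted(hits)
```

Calling find_oom_protected() on a login node and printing the result should show whether user processes (for example node) appear next to sshd in the list.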
As a workaround, I'm stopping the traditional ssh server and starting ssh.socket at boot (via an rc.boot script). Now oom_score_adj=0, as it should be. This also matters for compute nodes with ssh connections enabled.
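A rough sketch of that switch, for illustration only: the unit name ssh.service is an assumption based on Debian/Ubuntu naming (only ssh.socket is named above), and the helper switch_to_socket_activation is hypothetical. By default it only returns the commands it would run.

```python
import subprocess


def switch_to_socket_activation(service="ssh.service", socket="ssh.socket", dry_run=True):
    """Stop the standalone ssh server and start the socket unit.

    The unit names are assumptions (Debian/Ubuntu naming); adjust as needed.
    With dry_run=True nothing is executed and only the plan is returned.
    """
    plan = [
        ["systemctl", "stop", service],
        ["systemctl", "start", socket],
    ]
    if not dry_run:
        for cmd in plan:
            # check=True raises CalledProcessError if systemctl fails.
            subprocess.run(cmd, check=True)
    return plan
```

Running it with dry_run=False requires root, and only affects the running system until the next boot unless it is invoked from a boot script.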
Regards Rolandas
P.S. We are still on Qlustar 13, so I don't know the situation on Qlustar 14, but it could be different because ssh connections in Ubuntu 24.04 LTS are already activated via the ssh socket.