Qlustar 10.1 installation failed on an HP DL320 G6 server. The server has 4 x 1 TB disks, but the installer reported 'no unused disks'. I recall that the server has the on-board P410 RAID controller. I tried different RAID configurations, but with no luck.
Maybe this is 'too old' hardware for Qlustar?
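One thing worth checking before blaming the hardware: an installer often reports a disk as unusable when the kernel either does not see the P410's logical drive at all, or sees leftover partition/RAID signatures on it. A minimal diagnostic sketch (the device name `/dev/sda` is an assumption; on very old kernels a cciss device name may appear instead):

```shell
#!/bin/sh
# Sketch: check whether the RAID controller's logical drive is visible
# and whether stale metadata might make it look "in use".

# List the block devices the kernel sees; a P410 logical drive usually
# appears as a single disk such as /dev/sda.
lsblk -o NAME,SIZE,TYPE 2>/dev/null || echo "lsblk not available"

# Dry-run scan for leftover filesystem/RAID signatures that can make an
# installer treat the disk as already used (-n = no-act, nothing is erased).
for dev in /dev/sda; do            # assumed device name
    [ -b "$dev" ] && wipefs -n "$dev"
done
probe_done=yes
echo "disk probe finished"
```

If signatures show up, wiping them (running `wipefs` without `-n`, after double-checking the device) may let the installer see the disk as unused again.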
I am testing Singularity and have run into the problem that /var/lib is not synchronized with the chroot environment on the node.
I am now looking into the unionfs-fuse part of the init.qlustar script, but I cannot find any problem there.
Is there a configuration file somewhere for the chroot environment?
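While digging into this, it can help to see on the booted node which union mounts actually back /var/lib. A small inspection sketch (what the mount table contains is obviously specific to the image, so this only shows where to look, not what you will find):

```shell
#!/bin/sh
# Sketch: inspect union/overlay mounts on a node to see how /var/lib is backed.

# Show any unionfs-fuse / overlay mounts currently active:
mount 2>/dev/null | grep -Ei 'unionfs|overlay|fuse' \
    || echo "no union/overlay mounts visible here"

# Check whether /var/lib is its own mount point or part of the root fs:
mountpoint -q /var/lib \
    && echo "/var/lib is a separate mount" \
    || echo "/var/lib is part of the root filesystem"
inspect_done=yes
```

A path that is a separate (or union) mount on the node but not inside the chroot would explain why changes on one side are not seen on the other.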
Something went wrong when adding new nodes: the configuration file was somehow not fully written. So I wrote it again and restarted Slurm on all nodes. All nodes were affected, but the problem seems to be resolved now.
The error read as follows:
K> [2018-10-10T09:14:43.306] error: _slurm_rpc_node_registration node=beo-02: Invalid argument
K> [2018-10-10T09:14:43.306] error: Node beo-01 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
K> [2018-10-10T09:14:43.306] error: Node beo-01 has low socket*core*thread count (16 < 32)
K> [2018-10-10T09:14:43.306] error: Node beo-01 has low cpu count (16 < 32)
K> [2018-10-10T09:14:43.306] error: _slurm_rpc_node_registration node=beo-01: Invalid argument
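The rewrite-and-restart step that fixed this can be sketched roughly as follows. The node list, the config path, and the use of pdsh/pdcp are assumptions about this cluster, not something Qlustar prescribes:

```shell
#!/bin/sh
# Sketch: push one identical slurm.conf to all nodes, then restart the
# daemons so slurmctld and every slurmd agree on the config hash again.

NODES="beo-01,beo-02"            # assumed node list
CONF=/etc/slurm/slurm.conf       # assumed config path

if command -v pdsh >/dev/null 2>&1; then
    pdcp -w "$NODES" "$CONF" "$CONF"            # copy the same file everywhere
    pdsh -w "$NODES" systemctl restart slurmd   # restart the compute daemons
    systemctl restart slurmctld                 # restart the controller last
else
    echo "pdsh not installed; copy $CONF to each node and restart slurmd by hand"
fi
sync_done=yes
```

As the log suggests, `DebugFlags=NO_CONF_HASH` would only silence the mismatch warning; actually distributing an identical file is the real fix.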
Moreover, it seems that when a node is added, Slurm detects the RAM etc. incorrectly at first boot. This can be resolved by simply restarting Slurm on the newly added node. Perhaps the update would indeed resolve this issue.
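To see this mismatch directly, one can compare what slurmd detects on the node with what slurm.conf declares; a difference here is exactly what produces errors like "low cpu count (16 < 32)". A small sketch (the config path is an assumption):

```shell
#!/bin/sh
# Sketch: compare detected vs. configured node hardware.

if command -v slurmd >/dev/null 2>&1; then
    slurmd -C                                # print the NodeName line slurmd detects
    grep '^NodeName' /etc/slurm/slurm.conf   # print the configured NodeName line(s)
else
    echo "slurmd not available on this machine"
fi
compare_done=yes
```

If the `slurmd -C` output and the configured line disagree right after first boot but agree after a slurmd restart, that matches the behaviour described above.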