Hi,
Something went wrong when adding the new nodes. The
configuration file was somehow not fully written, so I wrote
it again and restarted Slurm on all nodes. All nodes were
affected, but the problem seems to be resolved now.
The error read as follows:
K> [2018-10-10T09:14:43.306] error: _slurm_rpc_node_registration node=beo-02: Invalid argument
K> [2018-10-10T09:14:43.306] error: Node beo-01 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
K> [2018-10-10T09:14:43.306] error: Node beo-01 has low socket*core*thread count (16 < 32)
K> [2018-10-10T09:14:43.306] error: Node beo-01 has low cpu count (16 < 32)
K> [2018-10-10T09:14:43.306] error: _slurm_rpc_node_registration node=beo-01: Invalid argument
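For anyone who wants to check for this kind of mismatch before restarting the daemons, here is a rough Python sketch. It assumes each node's slurm.conf has already been copied to a local path; the function names and the use of SHA-256 are my own choices for illustration and are not the hash that Slurm itself computes. It only tells you whether the files are byte-for-byte identical.

```python
import hashlib
from pathlib import Path


def conf_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_copies(reference, copies):
    """Return the labels whose file differs from the reference file.

    `copies` maps a label (for example a node name) to a local path
    holding that node's copy of slurm.conf (hypothetical layout).
    """
    expected = conf_digest(reference)
    return sorted(
        name for name, path in copies.items()
        if conf_digest(path) != expected
    )
```

Calling `mismatched_copies` with the controller's file as `reference` and a dict of node name to copied file returns the nodes whose copy is not identical, which would include a file that was only partially written.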
Moreover, it seems that when a node is added, Slurm detects the
RAM etc. wrongly at first boot. This can be resolved by simply
restarting Slurm on the newly added node. Perhaps the update would
indeed resolve this issue.
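To see what a node detected versus what is configured, one could compare two "Key=Value" lines, for example the NodeName line from slurm.conf and the line printed by `slurmd -C` on the node. The Python sketch below is only an illustration under that assumption: the helper names are hypothetical, and the RealMemory value in the usage note is a placeholder, not a value from our cluster.

```python
def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict of strings."""
    return dict(
        token.split("=", 1) for token in line.split() if "=" in token
    )


def differing_fields(configured, detected, keys=("CPUs", "RealMemory")):
    """Return {key: (configured, detected)} for keys whose values differ."""
    conf = parse_node_line(configured)
    seen = parse_node_line(detected)
    return {
        key: (conf.get(key), seen.get(key))
        for key in keys
        if conf.get(key) != seen.get(key)
    }
```

With a configured line of "NodeName=beo-01 CPUs=32 RealMemory=64000" and a detected line of "NodeName=beo-01 CPUs=16 RealMemory=64000", `differing_fields` reports only the CPUs entry, matching the "16 < 32" error above.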
Best regards,
Kwinten