This is a single node (called gpu-11) with 2 x Titan Xp cards, which originally showed up as TITAN-Xp in the gres groups; I then changed the type to simply "titan".
When I make the change, the preview of gres.conf looks good, but the logs look like this when I apply the write:
[2019-04-05T08:54:15.636] error: Node gpu-11 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
[2019-04-05T08:54:15.636] error: Setting node gpu-11 state to DRAIN
[2019-04-05T08:54:15.637] drain_nodes: node gpu-11 state set to DRAIN
[2019-04-05T08:54:15.637] error: _slurm_rpc_node_registration node=gpu-11: Invalid argument
The error about the differing slurm.conf shows up for all nodes whenever a config change occurs, but I'm assuming (maybe incorrectly) this is because the slurm.conf is on an NFS mount and the error can be ignored?
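One way to test that assumption might be to compare the slurm.conf as seen from the head node with the one seen from gpu-11. Below is a minimal sketch, assuming both copies can be read from one machine (the paths in the usage comment are hypothetical). It compares raw bytes only, which is not necessarily how Slurm computes its own config hash, so a match here is a hint rather than proof:

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_config(path_a, path_b):
    """Return True if both files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)


# Hypothetical usage (both paths are assumptions, adjust to your setup):
#   same_config("/etc/slurm/slurm.conf", "/tmp/slurm.conf.from-gpu-11")
```

If the two copies are identical, the NFS explanation looks more plausible; if they differ, the warning is probably worth taking seriously.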
Thanks, Bryan
On Fri, Apr 5, 2019 at 7:44 AM Bryan Hill bhill@ucsd.edu wrote:
Hi Goswin:
When I do as you did, all of my GPU nodes show up as "Wrongly configured" and the nodes set themselves to "drained". I'll have a closer look at the log files.
On Fri, Apr 5, 2019 at 7:39 AM Goswin von Brederlow brederlo@q-leap.de wrote:
Hi Bryan,
I'm not sure I understand the problem or where it is. I created a GPU gres group on our test system, set it to have 2 geforce-rtx-2080-ti GPUs, and assigned it to a host. I then edited the GPU type by double-clicking it and shortening it to 2080ti. When I then click Preview, the gres.conf reads:
NodeName=beo-204 Name=gpu Type=2080ti Count=2
So there doesn't seem to be a problem in qluman-qt that prevents the gres type from being simplified. After clicking "Write", the change should be written to the slurm config and slurm restarted to recognise it. That is all there should be to it.
If the generated gres.conf looks right on your end too, then please check the log files of slurm itself to see whether it complained about something on restart that would make it ignore the specified gres group. If you notice anything odd, please attach the generated slurm config and the log file, and include the full command that failed and the exact error it produced.
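To narrow the log down to the relevant entries, something like the following might help. It is only a sketch: it assumes log lines of the form "[timestamp] message", with errors prefixed by "error:", and the log file location in the usage comment is an assumption that may differ on your system.

```python
import re

# Matches lines like "[2019-04-05T08:54:15.636] error: some message"
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def node_errors(lines, node):
    """Yield (timestamp, message) for error lines that mention `node`."""
    node_pattern = re.compile(rf"\b{re.escape(node)}\b")
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like log entries
        msg = match.group("msg")
        if msg.startswith("error:") and node_pattern.search(msg):
            yield match.group("ts"), msg


# Hypothetical usage (the path is an assumption):
#   with open("/var/log/slurm/slurmctld.log") as fh:
#       for ts, msg in node_errors(fh, "gpu-11"):
#           print(ts, msg)
```

The word-boundary match keeps a search for one node name from also picking up longer names that merely start with it.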
Greetings, Goswin von Brederlow