Hi,
In my cluster I have two exotic IBM Power servers with NVIDIA cards. I succeeded in installing Ubuntu 18.04 and got Slurm working with the Qlustar 11 server, but I cannot find any way to manually add the NVIDIA cards as resources to the global Slurm config (gres.conf is managed by Qlustar, and there is no way to add anything manually).
Regards Rolandas
Hi Rolandas,
> In my cluster I have two exotic IBM Power servers with NVIDIA cards. I succeeded in installing Ubuntu 18.04 and got Slurm working with the Qlustar 11 server, but I cannot find any way to manually add the NVIDIA cards as resources to the global Slurm config (gres.conf is managed by Qlustar, and there is no way to add anything manually).
Is there any particular reason you wish to configure this manually? We have a variety of NVidia cards here, and GRES configuration via Qlustar works quite well.
A.
On 25/05/2021 11:18, Ansgar Esztermann-Kirchner wrote:
> Hi Rolandas,
>> In my cluster I have two exotic IBM Power servers with NVIDIA cards. I succeeded in installing Ubuntu 18.04 and got Slurm working with the Qlustar 11 server, but I cannot find any way to manually add the NVIDIA cards as resources to the global Slurm config (gres.conf is managed by Qlustar, and there is no way to add anything manually).
> Is there any particular reason you wish to configure this manually? We have a variety of NVidia cards here, and GRES configuration via Qlustar works quite well.
On the x86_64 nodes managed by Qlustar, the cards work as well, but the IBM Power nodes are not supported by Qlustar because of the different CPU architecture.
Regards Rolandas
> A.
On Fri, May 28, 2021 at 08:14:43AM +0300, rolnas@gmail.com wrote:
> On 25/05/2021 11:18, Ansgar Esztermann-Kirchner wrote:
>> Is there any particular reason you wish to configure this manually? We have a variety of NVidia cards here, and GRES configuration via Qlustar works quite well.
> On the x86_64 nodes managed by Qlustar, the cards work as well, but the IBM Power nodes are not supported by Qlustar because of the different CPU architecture.
I see. slurm.conf does have limited support for GRES, but as far as I know you cannot give any file names there. I am not sure if that would prevent GPU configuration entirely, or if it would just limit Slurm's GPU management (e.g. if Slurm would still be able to give GPUs to jobs, but not tie those to a concrete device).
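To illustrate the split (a sketch only, not taken from an actual Qlustar config; the node name is borrowed from later in this thread): slurm.conf can declare GRES types and per-node counts, while the device-file bindings normally live in gres.conf:

    # slurm.conf: GRES type and per-node count, but no device files
    GresTypes=gpu
    NodeName=node-40 Gres=gpu:4

    # gres.conf: ties each GRES to a concrete device (the part Qlustar manages)
    Name=gpu File=/dev/nvidia0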
This leads to a slightly different question (to the list in general): is it possible to exclude Slurm configuration from Qlustar management? We had to do that on an old cluster of ours (with a different management solution) because our SGE config was more involved than what the management software could handle. The vendor advised us to set a certain variable, and we'd manage SGE manually.
A.
"R" == rolnas rolnas@gmail.com writes:
Hi Rolandas
R> On 25/05/2021 11:18, Ansgar Esztermann-Kirchner wrote:
>> Hi Rolandas,
>>> In my cluster I have two exotic IBM Power servers with NVIDIA
>>> cards. I succeeded in installing Ubuntu 18.04 and got Slurm working
>>> with the Qlustar 11 server, but I cannot find any way to manually
>>> add the NVIDIA cards as resources to the global Slurm config
>>> (gres.conf is managed by Qlustar, and there is no way to add
>>> anything manually).
>> Is there any particular reason you wish to configure this manually?
>> We have a variety of NVidia cards here, and GRES configuration via
>> Qlustar works quite well.

R> On the x86_64 nodes managed by Qlustar, the cards work as well, but
R> the IBM Power nodes are not supported by Qlustar because of the
R> different CPU architecture.
QluMan doesn't care that your nodes are IBM Power machines and cannot boot Qlustar images. Just register them as you would x86 servers and do the Slurm config for them. You cannot use the GPU wizard, but apart from that everything will work. The Slurm config is written out as flat files under /etc/qlustar/common (exported via NFS), so if you mount that on your Power nodes, you should have the correct config.
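For example, a minimal sketch of such a mount, assuming the head-node is reachable as "headnode" (the host name and mount options here are assumptions, not from this thread):

    # /etc/fstab on a Power node: mount the Qlustar-managed config tree read-only
    headnode:/etc/qlustar/common  /etc/qlustar/common  nfs  ro,hard  0  0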
Concerning Ansgar's question: Slurm management can be disabled either completely, by uninstalling the Slurm packages on the head-nodes, or selectively, by not defining any configs in QluMan (and never writing out files for Slurm). Currently, these are the only options.
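As a hedged illustration of the first option (the exact package names depend on the distribution and Qlustar release and are an assumption here):

    # on an Ubuntu-based head-node; verify names first with: dpkg -l | grep slurm
    apt-get remove slurmctld slurm-wlm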
Best,
Roland
Hi,
I finally got it working with a local gres.conf file on those nodes, containing:
Name=gpu Type=tesla-v100-sxm2-32gb File=/dev/nvidia0
Name=gpu Type=tesla-v100-sxm2-32gb File=/dev/nvidia1
Name=gpu Type=tesla-v100-sxm2-32gb File=/dev/nvidia2
Name=gpu Type=tesla-v100-sxm2-32gb File=/dev/nvidia3
and by adding the Gres=gpu:4 parameter to the specific Node Groups; the resulting slurm.conf entry is
NodeName=node-[40-41] CoresPerSocket=16 Gres=gpu:4 MemSpecLimit=1024 RealMemory=1036547 Sockets=2 ThreadsPerCore=4
On those nodes I'm automounting /etc/qlustar/common and reusing most of the Slurm configuration via symlinks.
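For example (a sketch: the target path /etc/slurm-llnl is the default Slurm config directory on Ubuntu 18.04, and the source path under /etc/qlustar/common is an assumption about the Qlustar layout):

    ln -s /etc/qlustar/common/slurm/slurm.conf /etc/slurm-llnl/slurm.conf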
With nvidia-smi I can confirm that access to the NVIDIA cards is limited by Slurm.
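A quick way to verify this (assuming cgroup device constraints are enabled in Slurm):

    # inside an allocation with one GPU, only that GPU should be visible
    srun --gres=gpu:1 nvidia-smi -L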
Thanks Rolandas