Dear all,
On our cluster we have some nodes with InfiniBand, which I am currently trying to set up.
After some time of struggling, I have worked out the following:

1) The hardware is found:

    # lspci | grep fini
    04:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)

2) The kernel modules(?) seem to be loaded:

    # lsmod | grep ib_
    ib_ipoib              122880   0
    ib_cm                  53248   2 rdma_cm,ib_ipoib
    ib_uverbs              90112   1 rdma_ucm
    ib_umad                28672   0
    ib_core               217088   7 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm
    ipv6                  405504  59 rdma_cm,ib_ipoib,ib_core

    # lsmod | grep verbs
    ib_uverbs              90112   1 rdma_ucm
    ib_core               217088   7 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm

3) But 'ibv_devices' returns an empty list, and so do ibstat etc. Maybe this is helpful:

    # ibnodes
    ibwarn: [3220] mad_rpc_open_port: can't open UMAD port ((null):0)
    src/ibnetdisc.c:784; can't open MAD port ((null):0)
    /usr/sbin/ibnetdiscover: iberror: failed: discover failed
    ibwarn: [3225] mad_rpc_open_port: can't open UMAD port ((null):0)
    src/ibnetdisc.c:784; can't open MAD port ((null):0)
    /usr/sbin/ibnetdiscover: iberror: failed: discover failed
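The symptoms above fit a common pattern: the generic RDMA core modules (ib_core, ib_uverbs, ib_umad) are loaded, but no hardware driver has registered a device, so there is nothing under /sys/class/infiniband and ibv_devices, ibstat and ibnodes have no port to open. As far as I know, the in-kernel driver for the QLogic IBA7322 is the qib driver (module ib_qib), and it does not show up in the `lsmod | grep ib_` output above. The two checks can be sketched as POSIX shell functions; the function names are hypothetical, and the sysfs path is the standard Linux location, assumed here rather than taken from the Qlustar docs:

```shell
#!/bin/sh
# Sketch: distinguish "core IB modules loaded" from "HCA driver loaded and
# device registered". Function names are hypothetical helpers.

# Succeeds if module $1 appears in the first column of an lsmod-style
# listing read from stdin, e.g. `lsmod | module_loaded ib_qib`.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

# Prints how many RDMA devices the kernel has registered under the given
# sysfs class directory (default: /sys/class/infiniband). A missing or
# empty directory means ibv_devices/ibstat have nothing to show.
count_ib_devices() {
    dir="${1:-/sys/class/infiniband}"
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # One entry per registered device, e.g. "qib0".
    ls -1 "$dir" | wc -l | tr -d ' '
}
```

On an affected node, `lsmod | module_loaded ib_qib || echo "ib_qib not loaded"` together with `count_ib_devices` printing 0 would point at a missing HCA driver rather than at the OpenSM or QluMan configuration.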
From the Qlustar documentation (https://qlustar.com/book/docs/qluman-guide#Configuring-IB) I know that I need to set up the nodes that run OpenSM (Set Generic Property -> OpenSM Ports -> ALL); I have done this, and the service is running. Also, "the pre-defined hardware property IB Adapter with a value of true must be assigned to a host, to explicitly enable IB for it". But my hardware properties offer no "IB Adapter" (only #CPU cores, #CPU sockets, HW Type, Size of RAM and Chassis Color). I suspect this is the issue, but what is wrong there?
Any help is highly appreciated.
Best regards,
Tobias Moehle
"T" == Tobias Moehle <tobias.moehle@uni-rostock.de> writes:
Hi Tobias,
Please upgrade to the just-released 10.1.1.4-b509f1240. Support for the IBA7322 was missing and has been re-added.
T> Dear all,
T>
T> On our cluster, we have some nodes with infiniband
T> which I am trying to set up at the moment.
T>
T> After some time of struggling, I finally understand:
T> 1) The hardware is found:
T> #lspci | grep fini
T> 04:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
T> 2) The kernel-modules(?) seem to be loaded:
..................
T> Also, "the pre-defined hardware property IB Adapter with a value
T> of true must be assigned to a host, to explicitly enable IB for
T> it"
This is no longer necessary; the docs need to be fixed.
T> But in my hardware properties no "IB Adapter" is available (only
T> #CPU cores, #CPU sockets, HW Type, Size of RAM and Chassis
T> Color). I expect this to be the issue; but what is wrong there?
This HW property is obsolete. Just follow
https://docs.qlustar.com/en-US/Qlustar_Cluster_OS/10.1/html-single/QluMan_Gu...
to configure the IB network.
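Once the driver is in place, one way to confirm that OpenSM is actually managing the fabric is to read the port state from sysfs: a port with a physical link stays in INIT until a subnet manager activates it, and only then reports ACTIVE (ibstat shows this as "State: Active"). A minimal sketch, assuming the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<n>/state; the function name is hypothetical and not part of the Qlustar tooling:

```shell
#!/bin/sh
# Print the state of every port of every registered IB device, e.g.
# "qib0 port 1: 4: ACTIVE". A port stuck at "2: INIT" has a link but has
# not been brought up by a subnet manager (OpenSM).
port_states() {
    root="${1:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/state; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        port_dir=$(dirname "$f")                    # .../<dev>/ports/<n>
        dev_dir=$(dirname "$(dirname "$port_dir")") # .../<dev>
        printf '%s port %s: %s\n' \
            "$(basename "$dev_dir")" "$(basename "$port_dir")" "$(cat "$f")"
    done
}
```

Called without arguments on a compute node, `port_states` prints nothing if no device is registered, and one line per port otherwise.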
Best,
Roland