Dear all,
On our cluster, we have some nodes with InfiniBand, which I am currently trying to set up.
After some struggling, this is what I have established so far:

1) The hardware is found:

#lspci | grep fini
04:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)

2) The kernel modules(?) seem to be loaded:

#lsmod | grep ib_
ib_ipoib 122880 0
ib_cm 53248 2 rdma_cm,ib_ipoib
ib_uverbs 90112 1 rdma_ucm
ib_umad 28672 0
ib_core 217088 7 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm
ipv6 405504 59 rdma_cm,ib_ipoib,ib_core

#lsmod | grep verbs
ib_uverbs 90112 1 rdma_ucm
ib_core 217088 7 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm

3) But 'ibv_devices' gives an empty list, as do ibstat etc. This may be helpful:

#ibnodes
ibwarn: [3220] mad_rpc_open_port: can't open UMAD port ((null):0)
src/ibnetdisc.c:784; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
ibwarn: [3225] mad_rpc_open_port: can't open UMAD port ((null):0)
src/ibnetdisc.c:784; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
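
In case it helps anyone repeat these checks on another node, here is a minimal sketch that runs all three in one pass. It assumes a standard Linux RDMA stack in which registered HCAs appear under /sys/class/infiniband. The function name list_ib_devices is made up for illustration and is not part of any Qlustar or OFED tool.

```shell
#!/bin/sh
# Sketch only: repeats the three checks (PCI device, kernel modules, verbs devices).
# Assumption: registered HCAs show up as entries under /sys/class/infiniband.

# Hypothetical helper: print the entry names in the given directory,
# or "none" if the directory is missing or empty.
list_ib_devices() {
    dir="${1:-/sys/class/infiniband}"
    found=0
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        basename "$entry"
        found=1
    done
    [ "$found" -eq 1 ] || echo "none"
}

echo "== PCI devices =="
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i infiniband || echo "none"
else
    echo "lspci not available"
fi

echo "== loaded ib_/rdma_ modules =="
if [ -r /proc/modules ]; then
    # First column of /proc/modules is the module name.
    grep -E '^(ib_|rdma_)' /proc/modules | cut -d' ' -f1
fi

echo "== verbs devices =="
list_ib_devices
```

On the node described above I would expect the last section to print "none", which would match the empty 'ibv_devices' list.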
From the Qlustar documentation (https://qlustar.com/book/docs/qluman-guide#Configuring-IB) I know that I need to set up nodes that run OpenSM (Set Generic Property->OpenSM Ports->ALL). I have done this, and the service is running. The documentation also says that "the pre-defined hardware property IB Adapter with a value of true must be assigned to a host, to explicitly enable IB for it". However, no "IB Adapter" property is available in my hardware properties (only #CPU cores, #CPU sockets, HW Type, Size of RAM and Chassis Color). I suspect this is the issue, but what is wrong there?
Any help is highly appreciated.
Best regards,
Tobias Moehle