Hi all,
I've set up a new cluster with Qlustar 13.1 (fresh install). The hardware currently consists of 3 HP DL380 machines, each with an InfiniBand interface, all connected to the same InfiniBand switch: 1 head node (with OpenSM enabled during installation), the FE login node as a VM, and 2 compute nodes.
I noticed that the InfiniBand interface on the head node is down. Investigating further, I found the following: in the Qlustar management interface, the network config for the head node shows ib0, but on the head node's command line "ip address" does not list any ib0 interface. Instead I see ibo49d1 (letter o, not zero):

    9: ibo49d1: <BROADCAST,MULTICAST> mtu 2044 qdisc fq_codel state DOWN group default qlen 256
        link/infiniband 80:00:02:25:fe:80:00:00:00:00:00:00:04:09:73:ff:ff:e4:f4:22 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        altname ibp4s0d1
In /etc/network/interfaces, the interface is still named ib0.
When I change the head node's network config in the Qlustar management interface from ib0 to ibo49d1 and then reboot, the interface stays down and has no IP address. I also noticed that this change in the Qlustar management interface apparently does not require writing files(?): /etc/network/interfaces still mentions ib0, whereas I expected it to be updated to ibo49d1. Manually changing ib0 to ibo49d1 in /etc/network/interfaces did not work either.
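To see which name the kernel actually assigned to the IPoIB interface (ib0 vs. ibo49d1), independent of what the management interface shows, one can list the network devices whose link type is InfiniBand. A minimal Python sketch, assuming the standard Linux sysfs layout where /sys/class/net/<name>/type holds the ARP hardware type (32, ARPHRD_INFINIBAND, for IPoIB devices) and dev_port holds the zero-based port index; the function names are hypothetical:

```python
import os

ARPHRD_INFINIBAND = 32  # link type of IPoIB netdevs, as defined in <linux/if_arp.h>


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def list_ipoib_interfaces(sysfs_net="/sys/class/net"):
    """Return (name, dev_port, operstate) for every InfiniBand-type netdev."""
    result = []
    if not os.path.isdir(sysfs_net):
        return result  # not a Linux system with sysfs mounted
    for name in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, name)
        # Skip Ethernet, loopback, etc.; keep only InfiniBand link types.
        if read_attr(os.path.join(base, "type")) != str(ARPHRD_INFINIBAND):
            continue
        result.append((name,
                       read_attr(os.path.join(base, "dev_port")),
                       read_attr(os.path.join(base, "operstate"))))
    return result


if __name__ == "__main__":
    for name, dev_port, state in list_ipoib_interfaces():
        print(f"{name}: dev_port={dev_port} operstate={state}")
```

On the head node described here, this would be expected to report ibo49d1 rather than ib0, and that is the name /etc/network/interfaces has to use.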
The InfiniBand port is active (ibv_devinfo -d mlx4_0):

    hca_id: mlx4_0
        transport:      InfiniBand (0)
        fw_ver:         2.42.5700
        node_guid:      0409:73ff:ffe4:f420
        sys_image_guid: 0409:73ff:ffe4:f423
        vendor_id:      0x02c9
        vendor_part_id: 4103
        hw_ver:         0x0
        board_id:       HP_1380110017
        phys_port_cnt:  2
            port: 1
                state:      PORT_DOWN (1)
                max_mtu:    4096 (5)
                active_mtu: 1024 (3)
                sm_lid:     0
                port_lid:   0
                port_lmc:   0x00
                link_layer: Ethernet
            port: 2
                state:      PORT_ACTIVE (4)
                max_mtu:    4096 (5)
                active_mtu: 4096 (5)
                sm_lid:     1
                port_lid:   1
                port_lmc:   0x00
                link_layer: InfiniBand
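In this output port 1 has an Ethernet link layer and is down, while port 2 is the active InfiniBand port (sm_lid 1). That is consistent with the systemd name ending in d1, since the IPoIB netdev sits on the card's second port and dev_port counts from 0. The Python sketch below picks out such ports from ibv_devinfo text; it assumes the default (non-verbose) output with one key: value pair per line and each port introduced by a port: line, and the function names and the trimmed SAMPLE are illustrative only:

```python
def parse_ibv_devinfo(text):
    """Parse default ibv_devinfo output into {port_number: {key: value}}."""
    ports = {}
    current = None  # port being read; None while still in the HCA-level section
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines and lines without a key: value pair
        key, value = key.strip(), value.strip()
        if key == "port":
            current = int(value)
            ports[current] = {}
        elif current is not None:
            ports[current][key] = value  # HCA-level keys before the first port are ignored
    return ports


def active_ib_ports(ports):
    """Return sorted port numbers that are PORT_ACTIVE with an InfiniBand link layer."""
    return [num for num, attrs in sorted(ports.items())
            if attrs.get("state", "").startswith("PORT_ACTIVE")
            and attrs.get("link_layer") == "InfiniBand"]


# Trimmed from the ibv_devinfo -d mlx4_0 output posted above.
SAMPLE = """\
hca_id: mlx4_0
    transport: InfiniBand (0)
    phys_port_cnt: 2
        port: 1
            state: PORT_DOWN (1)
            link_layer: Ethernet
        port: 2
            state: PORT_ACTIVE (4)
            sm_lid: 1
            link_layer: InfiniBand
"""

print(active_ib_ports(parse_ibv_devinfo(SAMPLE)))  # prints [2]
```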
Running ibnetdiscover on the head node lists both compute nodes and the switch.
Obviously I'm missing something. How do I get IPoIB up and running on the head node?
Note that this applies to the head node only; the compute nodes do have an ib0 interface with an IP address.
Hi Nicolai,
Since systemd now also uses persistent interface names for IB adapters, the installer often doesn't pick up the new name correctly. You should definitely use ibo49d1 in /etc/network/interfaces and additionally add a line like the one below to the ibo49d1 stanza there:
    pre-up /sbin/modprobe ib_ipoib

Hope this helps,
Roland

On 1/30/24 09:50, wo.nicolai@vuykrotterdam.com wrote:
Thanks Roland,
Changing the following in /etc/network/interfaces did the trick:
Change “auto ib0” to “auto ibo49d1” (top of file)
Change “iface ib0 inet static” to “iface ibo49d1 inet static”
Add “pre-up /sbin/modprobe ib_ipoib” to the iface ibo49d1 stanza
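These three edits amount to renaming the interface in the auto and iface lines and adding a pre-up line to its stanza. A minimal Python sketch of that as a text transformation; the helper name is hypothetical, it assumes a simple ifupdown file where stanza options are indented lines following "iface <name> ...", it indents the new line with four spaces, and it does not treat allow-hotplug, source directives or multiple stanzas for the same interface specially:

```python
import re

PRE_UP = "pre-up /sbin/modprobe ib_ipoib"


def rename_ipoib_iface(text, old, new, pre_up=PRE_UP, indent="    "):
    """Rename `old` to `new` in the auto/iface lines of ifupdown config text.

    The pre-up line is inserted right after the renamed iface line, unless the
    text already contains it, so running the function twice changes nothing.
    """
    # Match the interface name as a whole word right after "auto" or "iface".
    pattern = re.compile(r"^(\s*(?:auto|iface)\s+)" + re.escape(old) + r"\b")
    out = []
    for line in text.splitlines():
        new_line = pattern.sub(lambda m: m.group(1) + new, line)
        out.append(new_line)
        if new_line != line and new_line.lstrip().startswith("iface") and pre_up not in text:
            out.append(indent + pre_up)
    return "\n".join(out) + "\n"


# Placeholder stanza; 192.0.2.1 is a documentation address, not the real one.
ORIGINAL = """\
auto ib0
iface ib0 inet static
    address 192.0.2.1
    netmask 255.255.255.0
"""

print(rename_ipoib_iface(ORIGINAL, "ib0", "ibo49d1"), end="")
```

Writing the result back (after keeping a copy of the original file) and then running ifup ibo49d1 or rebooting would correspond to the manual steps listed above.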
On 1/30/24 13:18, Nicolai, Wouter wrote:
Very good, glad it worked. This will be solved more intelligently in the Qlustar 14 installer, so that it will work out of the box again.