Hi Mike,
You can apply the following patch to your local installation. It sets the DHCP lease time to 60 seconds instead of 1 hour, which should let you swap hardware frequently while reusing IPs. Please note that you will have to reapply the patch after each qluman update and restart qlumand (service qluman-server) afterwards. Since this use case is highly exotic, we will refrain from adding a configuration option for the lease time to QluMan at this stage.
Best,
Roland
diff -u /usr/lib/python3/dist-packages/qluman-13/server/cfgman/genconfs.py.orig /usr/lib/python3/dist-packages/qluman-13/server/cfgman/genconfs.py
--- /usr/lib/python3/dist-packages/qluman-13/server/cfgman/genconfs.py.orig 2024-07-30 19:20:36.171186609 +0200
+++ /usr/lib/python3/dist-packages/qluman-13/server/cfgman/genconfs.py 2024-07-30 19:20:45.281466382 +0200
@@ -1226,7 +1226,7 @@
                 boot_nets.add(net.id)
             if (net.cfg_by == CT.Net.CFG_BY.DHCP) and mac and ip:
                 logger.debug(__(" config by DHCP"))
-                res += "dhcp-host={0},{1},{2},3600\n".format(mac, names[0], ip)
+                res += "dhcp-host={0},{1},{2},60\n".format(mac, names[0], ip)
         return res
 
     # add hosts
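
For illustration only, here is a minimal sketch of the line the patched code emits for each DHCP-configured host. The helper name dhcp_host_line and the example MAC, hostname and IP are hypothetical and not part of QluMan; only the format string is taken from the patch. The last field of a dnsmasq dhcp-host entry is the per-host lease time, which dnsmasq reads as seconds when no unit suffix is given.

```python
def dhcp_host_line(mac, name, ip, lease_seconds=60):
    """Build one dnsmasq dhcp-host entry with a per-host lease time.

    Hypothetical helper: it mirrors the format string from the patch
    above, with the lease time (in seconds) as a parameter.
    """
    return "dhcp-host={0},{1},{2},{3}\n".format(mac, name, ip, lease_seconds)


if __name__ == "__main__":
    # Made-up example values, not taken from a real cluster.
    print(dhcp_host_line("aa:bb:cc:dd:ee:ff", "node01", "192.168.52.10"), end="")
```

With the default of 60 this reproduces the patched behaviour; passing 3600 reproduces the original one-hour lease.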
On 7/19/24 17:30, hereiam--- via Qlustar General wrote:
Dear Qlustar Users,
I'm running what I think may be the world's worst supercomputer (for fun), composed entirely of laptops with differing specs. This was a pandemic project gone amok, just to see what is possible. I've used the opportunity to figure out how to do some strange things, one of which is replacing nodes quickly and frequently on a Qlustar-based system, as laptops fail fairly often when used as HPC nodes.
The Problem: When I delete a host from Qluman (13.6.0, and many previous versions) and try to assign a new physical machine to the exact same hostname (an attempt at drop-in replacement), the new machine never acquires a DHCP lease. Everything looks good from Qluman's side, but the new node never gets a DHCP lease, so it cannot start loading/booting Qlustar. It does get one after the head node is restarted, but I'd rather not do that after every drop-in replacement.
The Solution: I found that each host has an entry in /var/lib/misc/dnsmasq.leases, which prevents the replacement machine from truly being assigned as a new host. If I instead first delete that entry from dnsmasq.leases, restart the dnsmasq service by typing 'service dnsmasq restart', and also rewrite the Slurm config file (causing the slurmctld service to restart as well), it works!
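
The manual cleanup described above can be sketched as follows. This is a minimal sketch, not a QluMan feature: the function name drop_host_leases is hypothetical, and it assumes the usual dnsmasq lease-file layout of whitespace-separated fields with the hostname in the fourth column (expiry time, MAC, IP, hostname, client ID). It only filters text; writing the result back to the lease file and restarting dnsmasq are left as separate manual steps.

```python
def drop_host_leases(lease_text, hostname):
    """Return (remaining_text, removed_lines) for a dnsmasq lease file.

    Assumption: each lease line is
    "<expiry> <mac> <ip> <hostname> <client-id>",
    so the hostname is the fourth whitespace-separated field.
    """
    kept = []
    removed = []
    for line in lease_text.splitlines(keepends=True):
        fields = line.split()
        if len(fields) >= 4 and fields[3] == hostname:
            removed.append(line)   # stale lease for the replaced node
        else:
            kept.append(line)      # leave every other lease untouched
    return "".join(kept), removed


if __name__ == "__main__":
    # Made-up lease entries for illustration only.
    sample = (
        "1721400000 aa:bb:cc:dd:ee:01 192.168.52.10 node01 *\n"
        "1721400000 aa:bb:cc:dd:ee:02 192.168.52.11 node02 *\n"
    )
    remaining, dropped = drop_host_leases(sample, "node01")
    print(remaining, end="")
```

Matching on the exact hostname field rather than on a substring keeps the lease of, say, node010 in place when node01 is replaced.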
The Request: Could the Qlustar mods please consider adding a step to the "Delete Hosts" menu option in Qluman's Enclosure View that greps dnsmasq.leases for any line with that hostname, deletes it, and restarts the dnsmasq service? I think that would do the trick, but I may be mistaken. Happy to discuss more if you'd all like!
Sincerely,
-Mike