On 05.12.19 10:37, Roland Fehrenbacher wrote:
...
T> so I guess that I ran in some 'inconsistent database'-state and
T> maybe the easiest would be to start from scratch and ensure that
T> the hostname resolution is working before doing all the
T> setup/config steps? Thanks in advance, Tobias
Yes, please do so. If you don't change anything manually, hostname resolution must always work. Do you remember which manual steps you took last time that left beosrv-c unresolvable?
I don't think I took any manual steps that would have made it unresolvable; the server is known under a different name by the external DHCP server, and '/etc/hosts' contained only 'cl-login'. Is that reasonable?
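A quick way to confirm that name resolution works before rerunning the setup is a small check like the one below. This is only a sketch: the helper name `resolve_names` is made up for illustration, and the host names are the ones mentioned in this thread, so adjust them to your installation.

```python
import socket


def resolve_names(names):
    """Map each name to its IPv4 address, or None if it does not resolve."""
    results = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    # Names taken from this thread; they are assumptions about the local setup.
    to_check = [socket.gethostname(), "beosrv-c", "cl-login"]
    for name, addr in resolve_names(to_check).items():
        print(f"{name}: {addr or 'NOT RESOLVABLE'}")
```

If the head node's own name shows up as not resolvable, that would match the situation described above, where '/etc/hosts' only contained 'cl-login'.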
Best,
Roland
_______________________________________________
Qlustar-General mailing list -- qlustar-general@qlustar.org
To unsubscribe send an email to qlustar-general-leave@qlustar.org
I have now set up a new, clean installation but keep struggling with the same (or similar) problems. First of all, during the 'qlustar-initial-config' there was an error:
-- Registering VMs for demo cluster ...
Adding VM node beo-201: IP = 192.168.5.201, MAC = 02:00:99:99:99:c9
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 183, in __init__
    val = kwargs.pop(name)
KeyError: 'last_changed'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/qluman-cli.py", line 1495, in <module>
    main()
  File "/usr/lib/python3/dist-packages/qluman-11/qluman-cli.py", line 1486, in main
    bootstrap(config, db_data, cfg_gen)
  File "/usr/lib/python3/dist-packages/qluman-11/qluman-cli.py", line 1132, in bootstrap
    hardware=hardware.protobuf().SerializeToString())])[0]
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 585, in __init__
    super().__init__(*args, hardware=hardware, **kwargs)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 287, in __init__
    super().__init__(**kwargs)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 193, in __init__
    raise ValueError("NetObject.__init__(): Missing value for {0}".format(name))
ValueError: NetObject.__init__(): Missing value for last_changed
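For readers wondering why two tracebacks appear: this is Python's implicit exception chaining. The sketch below reproduces the pattern visible in the two tracebacks (a `kwargs.pop(name)` that raises `KeyError`, followed by a `ValueError` raised while handling it). The class `NetObjectSketch` and its field list are hypothetical stand-ins; the actual qluman code is not shown in this thread.

```python
class NetObjectSketch:
    # Hypothetical field list, for illustration only; not the real qluman class.
    fields = ("name", "last_changed")

    def __init__(self, **kwargs):
        for name in self.fields:
            try:
                val = kwargs.pop(name)  # raises KeyError if the field is missing
            except KeyError:
                # Raising inside the except block makes Python print both
                # tracebacks, joined by "During handling of the above exception..."
                raise ValueError(
                    "NetObject.__init__(): Missing value for {0}".format(name))
            setattr(self, name, val)


try:
    NetObjectSketch(name="beo-201")  # 'last_changed' deliberately omitted
except ValueError as exc:
    message = str(exc)
    cause = exc.__context__  # the original KeyError, attached implicitly
    print(message)
    print(repr(cause))
```

So the first traceback is only the trigger; the informative part is that the bootstrap step created the object without a 'last_changed' value.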
and later it breaks with "sacctmgr: Can't connect to slurmdbd";
in the 'qlustar-initial-config.log' there is the statement:
Executing: /lib/systemd/systemd-sysv-install enable ntp
setting 'passwd' in 'QluManDb' of '/etc/qlustar/qluman/db.cf'
running 'qluman-cli --bootstrap'
Status: 1

The non-zero status of 'qluman-cli --bootstrap' might be the important part explaining why the script failed? After this failure, restarting the script proceeded without errors, but since the actual Slurm setup failed, I fear I have ended up in a similar situation where things became inconsistent. The only manual step I took here was to add the correct nameserver in /etc/resolvconf/base in order to reach hosts outside our subnet.
I hope that this information is useful?
Best regards,
Tobias