Hello Roland,
thank you very much for the fast answer: it seems as if both daemons are running:
ps aux | grep qlumand
root 1674 0.0 0.0 15432 3848 ? Ss 10:33 0:00 /bin/bash /usr/sbin/qlumand -n
root 25368 0.0 0.0 14852 1148 pts/0 S+ 13:50 0:00 grep --color=auto qlumand
0 root@cl-login ~ # ps aux | grep qluman-router
root 1676 0.0 0.0 15432 3840 ? Ss 10:33 0:00 /bin/bash /usr/sbin/qluman-router -n
root 1688 0.0 0.1 199748 30540 ? Sl 10:33 0:00 python3 qluman-router.py -n
root 25393 0.0 0.0 14852 2736 pts/0 S+ 13:50 0:00 grep --color=auto qluman-router
(10:33 is around the time when I restarted the machine after running /usr/sbin/qlustar-initial-config.) The 'qluman-router.log' seems fine; it has two INFO entries 'Starting Qluman Router' (from the initial start and the restart after the initial config?)
and says
- Listening to: tcp://*:6001
2019-12-04 10:33:49,584 [1688] INFO Router.Router - Known servers:
* Qlumand (Public key 'R6CJ87mwl$K{q=FHC1FWAQic<)P05I})Q(oz6Kgt', flags=3)
* Slurmd (Public key 'PX41fhyQmGd)ha!FE=D5=zyIHh:?m=T}deA{xNp9', flags=1)
However, the 'qlumand.log' says:
2019-12-04 10:33:51,132 [1689] INFO __main__ - Starting Qluman main server: qlumand.
2019-12-04 10:33:51,135 [1689] INFO server.admin - Qlumand running with address beosrv-c / external cl-login
2019-12-04 10:33:51,854 [1689] INFO server.db.DBData - DbVersion = 11.0.2.3 [expected 11.0.2.3]
2019-12-04 10:33:51,856 [1689] INFO server.db.DBData Adding column last_changed to table Hosts (default=2019-12-04 10:33:51.855800)
2019-12-04 10:33:51,867 [1689] ERROR server.db.DBData Probably already have column:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060 (42S21): Duplicate column name 'last_changed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py", line 191, in add_column
    engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT NULL DEFAULT '{3}'".format(table_name, column_name, column_type, default))
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 2064, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1060 (42S21): Duplicate column name 'last_changed' [SQL: "ALTER TABLE Hosts ADD COLUMN last_changed DATETIME NOT NULL DEFAULT '2019-12-04 10:33:51.855800'"]
2019-12-04 10:33:51,945 [1689] INFO server.db.DBData Adding column status to table Hosts (default=0)
2019-12-04 10:33:51,949 [1689] ERROR server.db.DBData Probably already have column:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060 (42S21): Duplicate column name 'status'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py", line 191, in add_column
    engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT NULL DEFAULT '{3}'".format(table_name, column_name, column_type, default))
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 2064, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1060 (42S21): Duplicate column name 'status' [SQL: "ALTER TABLE Hosts ADD COLUMN status INTEGER UNSIGNED NOT NULL DEFAULT '0'"]
2019-12-04 10:33:52,440 [1689] INFO server.db.DBData default entries checked
2019-12-04 10:33:52,599 [1689] INFO server.db.DBData adding cli user
2019-12-04 10:33:52,689 [1689] ERROR server.db.DBData Critical: Can't determine IP of main head hostname 'beosrv-c' => Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:33:52,700 [1689] ERROR common.net IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS).
2019-12-04 10:33:52,701 [1689] ERROR common.daemon stopping with an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/common/daemon.py", line 221, in start
    self.run()
  File "qlumand.py", line 36, in run
    Admin(self.config).main()
  File "/usr/lib/python3/dist-packages/qluman-11/server/admin.py", line 282, in __init__
    ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
  File "/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py", line 649, in get_mcast_conf
    headnode = self.db_data.hosts.lookup(field="name", val=QLUSTAR_MAIN_HEADNODE)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 1866, in lookup
    raise KeyError
KeyError
2019-12-04 10:34:04,138 [2832] INFO __main__ - Starting Qluman main server: qlumand.
2019-12-04 10:34:04,141 [2832] INFO server.admin - Qlumand running with address beosrv-c / external cl-login
2019-12-04 10:34:04,476 [2832] INFO server.db.DBData - DbVersion = 11.0.2.8 [expected 11.0.2.3]
2019-12-04 10:34:04,899 [2832] INFO server.db.DBData default entries checked
2019-12-04 10:34:05,103 [2832] ERROR server.db.DBData Critical: Can't determine IP of main head hostname 'beosrv-c' => Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:34:05,115 [2832] ERROR common.net IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS).
2019-12-04 10:34:05,117 [2832] ERROR common.daemon stopping with an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/common/daemon.py", line 221, in start
    self.run()
  File "qlumand.py", line 36, in run
    Admin(self.config).main()
  File "/usr/lib/python3/dist-packages/qluman-11/server/admin.py", line 282, in __init__
    ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
  File "/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py", line 649, in get_mcast_conf
    headnode = self.db_data.hosts.lookup(field="name", val=QLUSTAR_MAIN_HEADNODE)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 1866, in lookup
    raise KeyError
KeyError
with many repetitions of the part after '__main__'. What looks suspicious to me is the line 'DbVersion = 11.0.2.8 [expected 11.0.2.3]', but maybe more important is 'IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS)'? Might this be related to the fact that the hostname 'cl-login' is not the name the external DHCP server has for this computer?
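The name lookup that the log complains about can be checked independently of qlumand. Below is a minimal sketch, assuming that the system resolver (which normally consults the sources listed in /etc/nsswitch.conf, such as /etc/hosts, NIS or DNS) is a reasonable stand-in for whatever lookup qlumand performs internally; that equivalence is an assumption, not something the log confirms. The helper name resolve_ipv4 is made up for illustration; the two hostnames are the ones shown in the log.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver reports for
    hostname, or an empty list if the name cannot be resolved."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # The resolver found no usable entry for this name.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostnames taken from the qlumand.log excerpt.
    for name in ("beosrv-c", "cl-login"):
        addrs = resolve_ipv4(name)
        print(name, "->", ", ".join(addrs) if addrs else "NOT RESOLVABLE")
```

If 'beosrv-c' shows up as not resolvable while 'cl-login' resolves, that would be consistent with the "Can't determine IP of main head hostname" error above.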
Many thanks in advance,
Tobias
On 04.12.19 13:02, Roland Fehrenbacher wrote:
"T" == Tobias Moehle tobias.moehle@uni-rostock.de writes:
Hi Tobias,
looks as if qlumand or qluman-router is not running. Please also check the logfiles /var/log/qluman/{qlumand,qluman-router}.log for possible errors.
Best,
Roland
T> Dear all, I am trying to setup a new cluster using the current
T> 11.0.0-3-image. I have tried already several times (also with
T> updated image) and usually the setup works fine. However, when
T> trying to create the token, I keep getting the error
T> qluman-cli --gencert
T> ERROR:client.cli.network:client.cli.network.Cluster.__init__():
T> could not connect to server ...