Hello Roland,
thank you very much for the fast answer. It seems that both daemons are running:
ps aux | grep qlumand
root 1674 0.0 0.0 15432 3848 ? Ss
10:33 0:00 /bin/bash /usr/sbin/qlumand -n
root 25368 0.0 0.0 14852 1148 pts/0 S+
13:50 0:00 grep --color=auto qlumand
0 root@cl-login ~ #
ps aux | grep qluman-router
root 1676 0.0 0.0 15432 3840 ? Ss
10:33 0:00 /bin/bash /usr/sbin/qluman-router -n
root 1688 0.0 0.1 199748 30540 ? Sl
10:33 0:00 python3 qluman-router.py -n
root 25393 0.0 0.0 14852 2736 pts/0 S+
13:50 0:00 grep --color=auto qluman-router
(10:33 is around the time when I restarted the machine after the
/usr/sbin/qlustar-initial-config).
The 'qluman-router.log' seems fine; it has two 'Starting
Qluman Router' info entries (from the initial start and the restart
after the initial config?)
and says
- Listening to: tcp://*:6001
2019-12-04 10:33:49,584 [1688] INFO Router.Router
- Known servers:
* Qlumand (Public key
'R6CJ87mwl$K{q=FHC1FWAQic<)P05I})Q(oz6Kgt', flags=3)
* Slurmd (Public key
'PX41fhyQmGd)ha!FE=D5=zyIHh:?m=T}deA{xNp9', flags=1)
However, the 'qlumand.log' says:
2019-12-04 10:33:51,132 [1689] INFO __main__
- Starting Qluman main server: qlumand.
2019-12-04 10:33:51,135 [1689] INFO
server.admin
- Qlumand running with address beosrv-c /
external cl-login
2019-12-04 10:33:51,854 [1689] INFO
server.db.DBData
- DbVersion = 11.0.2.3 [expected 11.0.2.3]
2019-12-04 10:33:51,856 [1689] INFO
server.db.DBData Adding column last_changed to table
Hosts (default=2019-12-04 10:33:51.855800)
2019-12-04 10:33:51,867 [1689] ERROR
server.db.DBData Probably already have column:
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1182, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py",
line 470, in do_execute
cursor.execute(statement, parameters)
File
"/usr/lib/python3/dist-packages/mysql/connector/cursor.py",
line 559, in execute
self._handle_result(self._connection.cmd_query(stmt))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 494, in cmd_query
result =
self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 396, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060
(42S21): Duplicate column name 'last_changed'
The above exception was the direct cause of the
following exception:
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py",
line 191, in add_column
engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT
NULL DEFAULT '{3}'".format(table_name, column_name,
column_type, default))
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 2064, in execute
return connection.execute(statement, *multiparams,
**params)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 939, in execute
return self._execute_text(object, multiparams, params)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1097, in _execute_text
statement, parameters
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1189, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1402, in _handle_dbapi_exception
exc_info
File
"/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py",
line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb,
cause=cause)
File
"/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py",
line 186, in reraise
raise value.with_traceback(tb)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1182, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py",
line 470, in do_execute
cursor.execute(statement, parameters)
File
"/usr/lib/python3/dist-packages/mysql/connector/cursor.py",
line 559, in execute
self._handle_result(self._connection.cmd_query(stmt))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 494, in cmd_query
result =
self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 396, in _handle_result
raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError:
(mysql.connector.errors.ProgrammingError) 1060 (42S21):
Duplicate column name 'last_changed' [SQL: "ALTER TABLE Hosts
ADD COLUMN last_changed DATETIME NOT NULL DEFAULT '2019-12-04
10:33:51.855800'"]
2019-12-04 10:33:51,945 [1689] INFO
server.db.DBData Adding column status to table Hosts
(default=0)
2019-12-04 10:33:51,949 [1689] ERROR
server.db.DBData Probably already have column:
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1182, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py",
line 470, in do_execute
cursor.execute(statement, parameters)
File
"/usr/lib/python3/dist-packages/mysql/connector/cursor.py",
line 559, in execute
self._handle_result(self._connection.cmd_query(stmt))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 494, in cmd_query
result =
self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 396, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060 (42S21):
Duplicate column name 'status'
The above exception was the direct cause of the following
exception:
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py",
line 191, in add_column
engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT
NULL DEFAULT '{3}'".format(table_name, column_name,
column_type, default))
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 2064, in execute
return connection.execute(statement, *multiparams,
**params)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 939, in execute
return self._execute_text(object, multiparams, params)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1097, in _execute_text
statement, parameters
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1189, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1402, in _handle_dbapi_exception
exc_info
File
"/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py",
line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb,
cause=cause)
File
"/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py",
line 186, in reraise
raise value.with_traceback(tb)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py",
line 1182, in _execute_context
context)
File
"/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py",
line 470, in do_execute
cursor.execute(statement, parameters)
File
"/usr/lib/python3/dist-packages/mysql/connector/cursor.py",
line 559, in execute
self._handle_result(self._connection.cmd_query(stmt))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 494, in cmd_query
result =
self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File
"/usr/lib/python3/dist-packages/mysql/connector/connection.py",
line 396, in _handle_result
raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError:
(mysql.connector.errors.ProgrammingError) 1060 (42S21):
Duplicate column name 'status' [SQL: "ALTER TABLE Hosts ADD
COLUMN status INTEGER UNSIGNED NOT NULL DEFAULT '0'"]
2019-12-04 10:33:52,440 [1689] INFO
server.db.DBData default entries checked
2019-12-04 10:33:52,599 [1689] INFO
server.db.DBData adding cli user
2019-12-04 10:33:52,689 [1689] ERROR
server.db.DBData Critical: Can't determine IP of main
head hostname 'beosrv-c'
=> Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:33:52,700 [1689] ERROR common.net IP
address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice
(NIS).
2019-12-04 10:33:52,701 [1689] ERROR common.daemon
stopping with an exception
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/qluman-11/common/daemon.py",
line 221, in start
self.run()
File "qlumand.py", line 36, in run
Admin(self.config).main()
File
"/usr/lib/python3/dist-packages/qluman-11/server/admin.py",
line 282, in __init__
ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
File
"/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py",
line 649, in get_mcast_conf
headnode = self.db_data.hosts.lookup(field="name",
val=QLUSTAR_MAIN_HEADNODE)
File
"/usr/lib/python3/dist-packages/qluman-11/common/types.py",
line 1866, in lookup
raise KeyError
KeyError
2019-12-04 10:34:04,138 [2832] INFO __main__
- Starting Qluman main server: qlumand.
2019-12-04 10:34:04,141 [2832] INFO server.admin
- Qlumand running with address beosrv-c / external
cl-login
2019-12-04 10:34:04,476 [2832] INFO server.db.DBData
- DbVersion = 11.0.2.8 [expected 11.0.2.3]
2019-12-04 10:34:04,899 [2832] INFO
server.db.DBData default entries checked
2019-12-04 10:34:05,103 [2832] ERROR
server.db.DBData Critical: Can't determine IP of main
head hostname 'beosrv-c'
=> Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:34:05,115 [2832] ERROR common.net IP
address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice
(NIS).
2019-12-04 10:34:05,117 [2832] ERROR common.daemon
stopping with an exception
Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/qluman-11/common/daemon.py",
line 221, in start
self.run()
File "qlumand.py", line 36, in run
Admin(self.config).main()
File
"/usr/lib/python3/dist-packages/qluman-11/server/admin.py",
line 282, in __init__
ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
File
"/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py",
line 649, in get_mcast_conf
headnode = self.db_data.hosts.lookup(field="name",
val=QLUSTAR_MAIN_HEADNODE)
File
"/usr/lib/python3/dist-packages/qluman-11/common/types.py",
line 1866, in lookup
raise KeyError
KeyError
with many repetitions of the part after '__main__'.
What looks suspicious to me is the line 'DbVersion = 11.0.2.8 [expected 11.0.2.3]', but maybe more important is the message 'IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS)'. Might this be related to the fact that the hostname 'cl-login' is not the name the external DHCP server has for this machine?
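To narrow down the name lookup problem, a quick check like the following could be run on the head node. This is only a sketch using the Python standard library: the helper names `resolve_host` and `report` are made up for illustration, the host names are the ones from the log above, and it is an assumption that qlumand resolves the name through the same system resolver (files, NIS, DNS) that `socket.gethostbyname` uses.

```python
import socket


def resolve_host(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for `name`, or None if the lookup fails.

    The lookup goes through the system resolver, so /etc/hosts, NIS and DNS
    are consulted in whatever order the system is configured to use.
    """
    try:
        return resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError
        return None


def report(hosts, resolver=socket.gethostbyname):
    """Return one human-readable line per host name (hypothetical helper)."""
    lines = []
    for host in hosts:
        ip = resolve_host(host, resolver)
        lines.append(f"{host}: {ip if ip is not None else 'not resolvable'}")
    return lines
```

Calling `print("\n".join(report(["beosrv-c", "cl-login"])))` would then show whether 'beosrv-c' resolves at all. If it does not, that would match the 'Can't determine IP of main head hostname' error in the log.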
Many thanks in advance,
Tobias
"T" == Tobias Moehle <tobias.moehle@uni-rostock.de> writes:

Hi Tobias,

looks as if qlumand or qluman-router is not running. Please also check the logfiles /var/log/qluman/{qlumand,qluman-router}.log for possible errors.

Best,
Roland

T> Dear all, I am trying to setup a new cluster using the current
T> 11.0.0-3-image. I have tried already several times (also with
T> updated image) and usually the setup works fine. However, when
T> trying to create the token, I keep getting the error
T> qluman-cli --gencert
T> ERROR:client.cli.network:client.cli.network.Cluster.__init__():
T> could not connect to server ...