Hello Roland,

Thank you very much for the fast answer. It seems that both daemons are running:

ps aux | grep qlumand
root      1674  0.0  0.0  15432  3848 ?        Ss   10:33   0:00 /bin/bash /usr/sbin/qlumand -n
root     25368  0.0  0.0  14852  1148 pts/0    S+   13:50   0:00 grep --color=auto qlumand
 0  root@cl-login  ~  #
ps aux | grep qluman-router
root      1676  0.0  0.0  15432  3840 ?        Ss   10:33   0:00 /bin/bash /usr/sbin/qluman-router -n
root      1688  0.0  0.1 199748 30540 ?        Sl   10:33   0:00 python3 qluman-router.py -n
root     25393  0.0  0.0  14852  2736 pts/0    S+   13:50   0:00 grep --color=auto qluman-router

(10:33 is around the time when I restarted the machine after running /usr/sbin/qlustar-initial-config.)
The 'qluman-router.log' seems fine; it has two 'Starting Qluman Router' info entries (from the initial start and the restart after the initial config?)

and says

     - Listening to: tcp://*:6001
2019-12-04 10:33:49,584 [1688] INFO     Router.Router
     - Known servers:
       * Qlumand      (Public key 'R6CJ87mwl$K{q=FHC1FWAQic<)P05I})Q(oz6Kgt', flags=3)
       * Slurmd       (Public key 'PX41fhyQmGd)ha!FE=D5=zyIHh:?m=T}deA{xNp9', flags=1)

However, the 'qlumand.log' says:

2019-12-04 10:33:51,132 [1689] INFO     __main__
     - Starting Qluman main server: qlumand.
2019-12-04 10:33:51,135 [1689] INFO     server.admin
     - Qlumand running with address beosrv-c / external cl-login
2019-12-04 10:33:51,854 [1689] INFO     server.db.DBData
     - DbVersion = 11.0.2.3 [expected 11.0.2.3]
2019-12-04 10:33:51,856 [1689] INFO     server.db.DBData          Adding column last_changed to table Hosts (default=2019-12-04 10:33:51.855800)
2019-12-04 10:33:51,867 [1689] ERROR    server.db.DBData          Probably already have column:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060 (42S21): Duplicate column name 'last_changed'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py", line 191, in add_column
    engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT NULL DEFAULT '{3}'".format(table_name, column_name, column_type, default))
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 2064, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1060 (42S21): Duplicate column name 'last_changed' [SQL: "ALTER TABLE Hosts ADD COLUMN last_changed DATETIME NOT NULL DEFAULT '2019-12-04 10:33:51.855800'"]
2019-12-04 10:33:51,945 [1689] INFO     server.db.DBData          Adding column status to table Hosts (default=0)
2019-12-04 10:33:51,949 [1689] ERROR    server.db.DBData          Probably already have column:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1060 (42S21): Duplicate column name 'status'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/server/db/DBData.py", line 191, in add_column
    engine.execute("ALTER TABLE {0} ADD COLUMN {1} {2} NOT NULL DEFAULT '{3}'".format(table_name, column_name, column_type, default))
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 2064, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 559, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 494, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 396, in _handle_result
    raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1060 (42S21): Duplicate column name 'status' [SQL: "ALTER TABLE Hosts ADD COLUMN status INTEGER UNSIGNED NOT NULL DEFAULT '0'"]
2019-12-04 10:33:52,440 [1689] INFO     server.db.DBData        default entries checked
2019-12-04 10:33:52,599 [1689] INFO     server.db.DBData        adding cli user
2019-12-04 10:33:52,689 [1689] ERROR    server.db.DBData        Critical: Can't determine IP of main head hostname 'beosrv-c'
  => Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:33:52,700 [1689] ERROR    common.net      IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS).
2019-12-04 10:33:52,701 [1689] ERROR    common.daemon   stopping with an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/common/daemon.py", line 221, in start
    self.run()
  File "qlumand.py", line 36, in run
    Admin(self.config).main()
  File "/usr/lib/python3/dist-packages/qluman-11/server/admin.py", line 282, in __init__
    ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
  File "/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py", line 649, in get_mcast_conf
    headnode = self.db_data.hosts.lookup(field="name", val=QLUSTAR_MAIN_HEADNODE)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 1866, in lookup
    raise KeyError
KeyError
2019-12-04 10:34:04,138 [2832] INFO     __main__
     - Starting Qluman main server: qlumand.
2019-12-04 10:34:04,141 [2832] INFO     server.admin
     - Qlumand running with address beosrv-c / external cl-login
2019-12-04 10:34:04,476 [2832] INFO     server.db.DBData
     - DbVersion = 11.0.2.8 [expected 11.0.2.3]
2019-12-04 10:34:04,899 [2832] INFO     server.db.DBData        default entries checked
2019-12-04 10:34:05,103 [2832] ERROR    server.db.DBData        Critical: Can't determine IP of main head hostname 'beosrv-c'
  => Check your host info databases (NIS, /etc/hosts, etc.)
2019-12-04 10:34:05,115 [2832] ERROR    common.net      IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS).
2019-12-04 10:34:05,117 [2832] ERROR    common.daemon   stopping with an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/qluman-11/common/daemon.py", line 221, in start
    self.run()
  File "qlumand.py", line 36, in run
    Admin(self.config).main()
  File "/usr/lib/python3/dist-packages/qluman-11/server/admin.py", line 282, in __init__
    ql_mcastd_conf = self.cfg_gen.get_mcast_conf()
  File "/usr/lib/python3/dist-packages/qluman-11/server/cfgman/genconfs.py", line 649, in get_mcast_conf
    headnode = self.db_data.hosts.lookup(field="name", val=QLUSTAR_MAIN_HEADNODE)
  File "/usr/lib/python3/dist-packages/qluman-11/common/types.py", line 1866, in lookup
    raise KeyError
KeyError

with many repetitions of the part after '__main__'. What looks suspicious to me is the line 'DbVersion = 11.0.2.8 [expected 11.0.2.3]', but maybe more important is the message 'IP address of QLUSTAR_MAIN_HEADNODE is not defined in nameservice (NIS)'? Might this be related to the fact that the hostname 'cl-login' is not the name under which the external DHCP server knows this computer?
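
In case it helps to narrow this down, a minimal check of whether a name resolves through the system resolver (/etc/hosts, NIS, DNS, depending on nsswitch) could look like the sketch below. The helper name resolve_ipv4 is my own and not part of Qluman, and I am only assuming that Qluman's lookup goes through the same resolver:

```python
import socket
from typing import Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return the IPv4 address the system resolver reports for hostname.

    Returns None if the name cannot be resolved. This is a hypothetical
    helper for illustration, not a Qluman function.
    """
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None
```

Calling resolve_ipv4('beosrv-c') on the head node and getting None back would be consistent with the "Can't determine IP of main head hostname" error in the log, but I have not verified that Qluman performs its lookup this way.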

Many thanks in advance,
Tobias


On 04.12.19 13:02, Roland Fehrenbacher wrote:
"T" == Tobias Moehle <tobias.moehle@uni-rostock.de> writes:
Hi Tobias,

looks as if qlumand or qluman-router is not running. Please also check
the logfiles /var/log/qluman/{qlumand,qluman-router}.log for possible
errors.

Best,

Roland

    T> Dear all, I am trying to setup a new cluster using the current
    T> 11.0.0-3-image. I have tried already several times (also with
    T> updated image) and usually the setup works fine.  However, when
    T> trying to create the token, I keep getting the error

    T> qluman-cli --gencert
    T> ERROR:client.cli.network:client.cli.network.Cluster.__init__():
    T> could not connect to server ...
_______________________________________________
Qlustar-General mailing list -- qlustar-general@qlustar.org
To unsubscribe send an email to qlustar-general-leave@qlustar.org