QluMan containerization
by Hernán Morales Durand
Hello,
I have a node where I've installed QluMan 10.1.1 so I can configure
the cluster using the GUI. This node is NOT part of the cluster
(i.e. it is not currently booting any OS image), so it's somewhat
"wasted". As far as I understand, the OS images run from RAM on the
nodes. So in theory I could boot this node with an OS image and later
mount the filesystem to try qluman-10.1.1.19-singularity.sqsh.
However, I don't see how that could work because of library
compatibility. Is my understanding correct, and is this feasible?
I also wonder whether there is a container image already built
specifically to run the qluman.sqsh application. I couldn't find one
so far, and I wanted to ask before starting to explore this path.
What do you recommend?
Cheers,
Hernán
1 year, 9 months
HP Proliant DL360 G5 issues
by bscook999@gmail.com
Hello. I'm having a lot of trouble getting the Qlustar 10.0 installer to recognize the hard drives. During the installation, an error message reports that there are no available hard drives, and the FAI stops. The server is an HP Proliant DL360 G5, and its Smart Array P400i controller is working: I did a standard Ubuntu 18.04.2 Server LTS installation and the hard drives were recognized just fine. I'm not sure how to get Qlustar to recognize them. Is it possible to install Qlustar from within Ubuntu? I would prefer the automated installation process to avoid errors, but I would consider that route if I can't get the installer to work. Please advise, thanks.
1 year, 10 months
Nodes won't boot
by Ansgar Esztermann-Kirchner
Hello List,
recently, I've moved /apps to its own device. This invalidates NFS
handles on the nodes, of course, so I started to reboot them.
To my surprise, they don't come up again. The nodes complain about a
time-out, "Failed to request QluMan node config in time", and ask me
to check qlumand and qluman-route on the head.
These two processes are indeed running. I've checked the logs, but
couldn't find anything helpful (to me) in there.
qlumand seems to see the node briefly:
2019-03-05 11:45:06,615 [29219] INFO server.admin - Identifying node from '00-25-90-d9-08-86'
2019-03-05 11:45:06,617 [29219] INFO server.admin - Registering Execd 'node31-35'
2019-03-05 11:45:39,645 [29219] INFO server.admin - Execd '00-25-90-d9-08-86' disconnected
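To put a number on "briefly", the interval between registration and disconnect can be read off the timestamps. The following is only a minimal Python sketch: the line format is inferred from the three lines quoted here (it is not a documented qlumand log format), each entry is assumed to sit on a single line, and the names LOG, LINE_RE, parse and connection_seconds are hypothetical helpers made up for this illustration.

```python
import re
from datetime import datetime

# The three qlumand lines quoted in this message, one entry per line.
LOG = """\
2019-03-05 11:45:06,615 [29219] INFO server.admin - Identifying node from '00-25-90-d9-08-86'
2019-03-05 11:45:06,617 [29219] INFO server.admin - Registering Execd 'node31-35'
2019-03-05 11:45:39,645 [29219] INFO server.admin - Execd '00-25-90-d9-08-86' disconnected
"""

# Assumed layout: timestamp, [pid], level, logger name, " - ", message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(\d+)\] (\w+) (\S+)\s+- (.*)$"
)


def parse(text):
    """Return a list of (timestamp, message) tuples for matching lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(5)))
    return events


def connection_seconds(events):
    """Seconds between the first 'Registering Execd' and the first disconnect."""
    start = next(ts for ts, msg in events if msg.startswith("Registering Execd"))
    end = next(ts for ts, msg in events if msg.endswith("disconnected"))
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(f"{connection_seconds(parse(LOG)):.3f} s")
```

For the excerpt above this gives roughly 33 seconds between registration and disconnect.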
I've attached the router log.
There's one thing that's conspicuous, but it doesn't seem to be
correlated with the nodes booting: a stack trace when accessing the
database.
I'll be grateful for any pointers.
Thanks,
A.
--
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
http://www.mpibpc.mpg.de/grubmueller/esztermann
1 year, 10 months
NVidia driver
by Bryan Hill
Hello All!
Is there a way to get a newer nvidia driver working with the current
Qlustar 10.1.1? The nvidia module in the Qlustar repo is 390, and we
just received some nodes with RTX2080 Ti cards, which require >= 410.
I've tried installing it inside a chroot, but it doesn't work.
Thanks,
Bryan
1 year, 10 months
IPMI Power
by Ansgar Esztermann-Kirchner
Hello List,
what is the preferred way to control the power of nodes? IPMI setup
is done by qluman, but there does not seem to be a way to control
nodes from the GUI. I've found ql-remote-control, but it complains
about a missing config file, and I could not find an example file
that explains its syntax.
Thanks,
A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
1 year, 10 months