I have set up a small cluster with a Master and 2 compute nodes. The Master has public and private Ethernet interfaces. The public interface is the connection to the internet. The private interface is the connection with the compute nodes.
I installed Qlustar 11.0.1-0 on the Master, which completed with no issue. I had to restart the network services with
systemctl restart systemd-resolved.service
after the second reboot during installation, and then proceeded with the First Steps checklist. I did not configure a Demo Cluster (Step 1.3) and I did not install additional software (Step 1.5). First Steps went fine up until Step 1.6, "Running the Cluster Manager QluMan". Using the Head Node, I obtained a one-time token and saved it to /media/token (Step 1.6.1). When I attempted to install QluMan (Step 1.6.2), the package review shown on the monitor looked normal, but the installation of each package failed. The failure indication was the same for each package: a notice that started with "Failed to fetch http://repo.qlustar.com/repo/ubuntu/pool/ ..." and ended with "... temporary failure resolving 'repo.qlustar.com'".
I did a quick connectivity check of the public network: ping www.google.com produced "Temporary failure in name resolution", while ping 8.8.8.8 produced a steady stream of reports of "64 bytes from 8.8.8.8: icmp_seq=<n> ttl=116 time=78.2 ms", with <n> incrementing each time.
I also note that when I power up the 2 compute nodes after the successful Qlustar installation on the Master, there is no booting going on, which I presume may be related to whatever has made the Master unable to resolve network ip addresses and names.
I am a CS student working on a project that requires a small cluster computer, so I have no particular skill with the technology here. I spent the better part of the day trying to figure out what I'm doing wrong, but I have failed. The issue is reproducible: I have done multiple installs from scratch and always get the same result. I note that the Qlustar OS installation and the early First Steps require access to the qlustar.com repo, and that the "Final Reboot" at Step 1.2 appears to be when the Master node loses its ability to resolve network addresses and names and becomes unable to access the qlustar.com repo. Do any of the kind listeners recognize my error, or could you perhaps recommend a sequence of steps I might take to isolate the problem?
Kim
"K" == Kim Peterson kimjohnpeterson@gmail.com writes:
Hi Kim,
K> Failure indication was the same for each package to be installed; There was a notice that started with "Failed to fetch http://repo.qlustar.com/repo/ubuntu/pool/ ..." and ended with "... temporary failure resolving "repo.qlustar.com".
K> I did a little communication check of the public network: ping www.google.com produced "Temporary failure in name resolution" ping 8.8.8.8 produced a steady stream of reports of "64 bytes from 8.8.8.8: icmp_seq=<n> ttl=116 time = 78.2 ms", with <n> incrementing each time.
This indicates that your basic networking is fine, but DNS is not working. Are you sure that the DNS server IP you entered during installation is correct? Can you ping the IP of the DNS server you want to use?
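The check suggested here can be sketched in shell. This is only an illustration, not a Qlustar tool: the function names list_nameservers and check_nameservers are made up for this example, and the sketch assumes the configured DNS servers appear as "nameserver" lines in a resolv.conf-style file.

```shell
# list_nameservers [FILE]: print each "nameserver" address found in a
# resolv.conf-style file (hypothetical helper; defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# check_nameservers [FILE]: send one ping to each configured DNS server
# and report whether it answered.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 1 -W 2 "$ns" >/dev/null 2>&1; then
            echo "$ns reachable"
        else
            echo "$ns NOT reachable"
        fi
    done
}
```

Running check_nameservers with no argument reads /etc/resolv.conf. On systems that use systemd-resolved, that file may list only the local stub address 127.0.0.53; in that case the upstream servers are usually found in /run/systemd/resolve/resolv.conf, which can be passed as the argument instead.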
K> I also note that when I power up the 2 compute nodes after the successful Qlustar installation on the Master, there is no booting going on, which I presume may be related to whatever has made the Master unable to resolve network ip addresses and names.
Nodes other than a pre-selected FE VM or Demo VMs cannot boot without manually registering them.
Best,
Roland
Roland
Thank you for the quick and excellent reply!
Regarding your comment, "Are you sure that the DNS server IP you entered during installation is correct?": I have prior experience installing Ubuntu 18.04 LTS and knew from that experience that the network connection simply works without requiring manual setup, so I set up the Public Network during Qlustar installation using DHCP, specifically so it would automatically be correct. Perhaps my installation approach here is wrong? One of the goals of my academic project is to demonstrate that a small, personal-use HPC cluster can be set up and managed with ease. I was delighted to find Qlustar during my initial investigation, because it looks like the perfect platform for that part of my project. It appears from my experience here that my approach of installing Qlustar directly on the Head node of a small cluster is not the simplest way of installing and using Qlustar. The Install Guide has no step-by-step instructions for manually setting up the Public Network, and reading through the Install Guide again, I see that the text and notes in Chapter 3 Step 9 "Additional Settings" appear to indicate that setting up a Virtual Front End as a VM on the Head node is the preferred setup. Can you advise me on the simplest way to set up Qlustar on a physical arrangement of 1 Head node and 4 Compute nodes? I would be eager to try whatever you might suggest.
Kim
Roland
I have attempted to investigate the reason for DNS not working. In my original description of the problem, I mentioned that after installing Qlustar I had to restart the network with
systemctl restart systemd-resolved.service
I did not recall seeing a step in the Install Guide for restarting the network service like that, so I went back and installed Qlustar again to examine what I was doing that might have made that instruction necessary. Here is what I found: the first instruction in the First Steps Guide (Chapter 1, Section 1.1) is to log in as root and start the post-install configuration with the command
/usr/sbin/qlustar-initial-config
When I do that I get an error message that says, "The hostname cl-head is not externally resolvable. You might want to register it with your DNS server. Adding it to /etc/hosts." I see that error message described exactly in the First Steps Guide right after that initial setup instruction. The Guide says, "If your chosen hostname can't be resolved via DNS, you will see a non-fatal error message reminding you that the hostname should be registered in some (external) name service (typically DNS)." So it looks like all of the troubles I experience later on are caused by me incorrectly forcing the network to restart, rather than fixing the hostname registration at that point in the First Steps procedure. I am unfamiliar with setting up or managing networks generally, and I assume that someone who has some expertise in that area would know instantly what to do when they see that error message, but I do not. My attempts to find a procedure for that on the internet, and execute it, have been unsuccessful. Can you describe a step-by-step procedure, similar to the other procedures described in the Install Guide and First Steps Guide, that would get me through registering a hostname in an external DNS?
Regards, Kim
P.S. I have also tried installing Qlustar with a virtual Front End head node, and (obviously) experienced similar network failure difficulties with those attempts as well. - KP
"K" == Kim Peterson kimjohnpeterson@gmail.com writes:
Hi Kim,
K> /usr/sbin/qlustar-initial-config
K> When I do that I get an error message that says, "The hostname cl-head is not externally resolvable. You might want to register it with your DNS server. Adding it to /etc/hosts."
That message is normal and doesn't make the rest of the installation fail. Registering the head-node with an external DNS is nice but not mandatory. You'll have to look further; this is not your problem.
Before you run qlustar-initial-config, can you resolve DNS addresses?
Test with: $ dig repo.qlustar.com
If yes, and you get failures when installing packages during the qlustar-initial-config phase, you might be sitting behind an http proxy and need to tell the installer your proxy data.
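The two checks mentioned here (does name resolution work, and is a proxy in play) can be sketched as small shell helpers. The function names are made up for this illustration. The sketch uses getent rather than dig so that it exercises the same system resolver that package tools use, and it only looks at the common http_proxy/https_proxy environment variables. How the Qlustar installer itself expects proxy data to be supplied is not covered by this sketch.

```shell
# resolves NAME: succeed if NAME can be resolved through the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# proxy_settings: print any http/https proxy variables set in the current
# environment (upper or lower case), or a note if none are set.
proxy_settings() {
    env | grep -i -E '^(http|https)_proxy=' || echo "no proxy variables set"
}
```

For example, `resolves repo.qlustar.com && echo "DNS ok" || echo "DNS failing"` gives a quick yes/no answer before qlustar-initial-config is run, and `proxy_settings` shows whether the shell already knows about a proxy.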
Best,
Roland
Roland,
I agree that registering the hostname is not the problem. I have monitored the control GUI for my router and witnessed the Qlustar Head node name being automatically registered by the router when the computer reboots after Qlustar install. I tested with dig repo.qlustar.com as you suggest. First test is immediately after a fresh install and login as root. Result is:
connection timed out; no servers could be reached
The /usr/sbin/qlustar-initial-config script fails on every call to the repo. If I then start the network service with
service systemd-resolved start
then the dig repo.qlustar.com test succeeds. Here is an excerpt from the output:
Got Answer: Header opcode: Query, status=NoError, id: 57624 . . . repo.qlustar.com 0 IN A 192.168.55.113 Server: 127.0.0.53#53(127.0.0.53) . Msg Size: recvd: 61
So, that communication looks fine, and if I then proceed with the initial config script, it runs to completion with no errors and no faults. There is a message at the end of the initial config script that the computer needs to be rebooted to proceed. I have learned from experience that if I reboot the computer at that point, there is no internet connection after the reboot, and I have not been able to restore it. I can't proceed with getting the one-time token and installing qluman-qt. I will read up on HTTP proxies, and if I find evidence of one being in place, I'll include that info in a re-install and see what we get. I will also see if I can use Ctrl-C to exit out of the "Press Return to Reboot" prompt at the end of the initial-config script, then try getting the one-time token and installing the QluMan GUI while I still have an internet connection, and only then do the reboot. Maybe that will get me a little further down the road. Thank you for your patience with whatever it is that is putting a wrench in the works here. You are kind to hang in there with me.
Kim
"K" == Kim Peterson kimjohnpeterson@gmail.com writes:
Hi Kim,
K> Roland, I agree that registering the hostname is not the problem. I have monitored the control GUI for my router and witnessed the Qlustar Head node name being automatically registered by the router when the computer reboots after Qlustar install. I tested with dig repo.qlustar.com as you suggest. First test is immediately after a fresh install and login as root. Result is:
K> connection timed out; no servers could be reached
Without DNS working, there is no point in going any further with the Qlustar installation than this.
$ dig repo.qlustar.com
has to work without any further action at this stage. You'll have to investigate why it's not. Can you ping the IP of the DNS server you specified during installation at this stage?
Best,
Roland
Roland
Earlier attempts to install Qlustar using a manual setup method all failed, so I have been installing Qlustar with the Public Network set up using DHCP for the past 2 weeks, and I have not specified a DNS server. To be able to answer your question, I just performed another install with a manual setup for the Public Network using: IP address 192.168.0.200 (having learned through trial and error how to avoid the address range that is used by the router for DHCP addressing), Network Mask 255.255.255.0, Gateway 192.168.0.1 (address of my router, which acts as Gateway), DNS 192.168.0.1 (address of my router, which provides DNS service when DHCP is selected). This attempt has succeeded. I proceeded to run the /usr/sbin/qlustar-initial-config script, which ran to completion with no error. I performed the reboot as instructed in the First Steps Guide, and proceeded with retrieving a One Time Token (Section 1.6), which executed successfully. I installed the QluMan GUI (Section 1.6.2) using the instruction
apt install qluman-qt
which executed without error. I attempted to open the QluMan GUI with the instruction qluman-qt and the instruction qluman-qt &, but both of those instructions result in the response:
cannot connect to x server
I quickly scanned the QluMan Guide to see if there was an instruction there for how to open the QluMan GUI, but didn't see one. How about one more lifeline and tell me how I open the QluMan GUI that is installed on a Master node and Front End for a small 4-node cluster, or, if you can tell that I have installed it incorrectly, tell me how to install the QluMan GUI and then how do I open it? Your patience has served me very well. Thank you for this help. I think it is a little unfortunate that installing Qlustar with DHCP selected for the Public network doesn't work for my hardware setup. I will definitely try again when Qlustar releases v12 using Ubuntu 20.04. You noted in your first response that I will need to manually register the nodes in order to get them to boot. I will attempt to do that once I get a functioning GUI to work from. If I have trouble, I'll open another mail discussion. My academic project is to create a Singularity container that sets up an HPC cluster from a small number of Ethernet connected computers, and allows an MPI enabled program to run (also from inside the container). I'll initially try to do that as simply as possible from scratch, and then building on that, I'll attempt to construct a container that has Qlustar as the container OS. Do you have advice on how I might go about that, or is it not possible to do? Thanks again for the help. I look forward to learning how to use GluMan (starting with getting it to open up!).
regards, Kim
Edit: change "...how to use GluMan..." in the last sentence to "...how to use the QluMan GUI..." --KP
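As a side note on the manual Public Network settings reported in this message (IP address 192.168.0.200, Network Mask 255.255.255.0, Gateway 192.168.0.1), a static address only works if the gateway lies in the same subnet as the host. The arithmetic can be sketched in shell; the function names ip_to_int and same_subnet are hypothetical, and the sketch assumes plain dotted-quad IPv4 addresses without leading zeros.

```shell
# ip_to_int A.B.C.D: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1    # split the address on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 MASK: succeed if both addresses share the same
# network part under the given netmask.
same_subnet() {
    local a b m
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

With the values above, `same_subnet 192.168.0.200 192.168.0.1 255.255.255.0` succeeds, since both addresses reduce to the network 192.168.0.0 under that mask. The check says nothing about whether the address collides with the router's DHCP range; that still has to be read from the router's configuration.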
Roland
In my attempt to resolve the "cannot connect to x server" issue that I get when attempting to open qluman-qt, I have re-read the Install Guide to see if I might be doing something wrong. I see the comment in Chapter 3 Step 9: "... we suggest to run only system-related tasks directly on the cluster's head-node(s) and have physical or virtual front-end (FE) nodes for user access/activity. In this dialog you can choose to setup a virtual FE node as a virtual machine running on the head-node. If you decide not to do so, we advise to assign a physical machine as a FE node at a later stage."
I have attempted to implement the suggested approach of setting up a physical Front End (I have prepared both a desktop and a laptop computer, in my case). I have followed the instructions in the First Steps Guide Section 1.6.3, and then specifically Section 1.6.3.2. I have installed Singularity v3.6.0 on my desktop and laptop. I have downloaded the most current QluMan GUI Singularity container (appears to be version 11.0.4.2), and verified that it runs on both the laptop and desktop using the command singularity exec qluman-11.0.4.2-singularity.sqsh qluman-qt
or, when operating from the directory where I have the container saved, with the command ./qluman-11.0.4.2-singularity.sqsh
The QluMan GUI opens and presents the Qlustar Management Interface window, with a Connect Cluster dialog open and active. That all looks great, but I expected to see a way to enter the One Time Token, and then have access to the Management Interface to allow me to go through the remainder of the First Steps Guide. I note that there is no description that I have seen for connecting the physically separate Front End to the cluster. Please verify that the Front End computer should just be connected to the Private Ethernet network that is set up for the cluster, and that I can make that physical connection at this point in the First Steps sequence; specifically I have the Head Node running with Qlustar OS installed and I have the Front End powered up with the QluMan GUI Singularity container installed and running.
Also, verify that I should download the One Time Token on the Front End computer and not on the Head Node computer when I am setting up to operate with a physically separate Front End.
I'll give that a try this afternoon and report what luck.
Kim
Roland
A follow up report: I was able to retrieve a One Time Token on the Head Node and move it to the Front End (my desktop running Linux Mint 20-04, with Singularity 3.6.0 installed and the qluman version 11.0.4.2 Singularity container loaded and open). I fumbled my way through creating a new cluster, providing the One Time Token when that was needed and the PIN when that was needed. The QluMan GUI (Qlustar Management Interface) detected the Head node and populated appropriate fields in the new cluster dialog without input from me. I currently have the QluMan GUI open, showing a single tab with title "admin@clementine", populated with the Enclosure View showing Rack 1/2, Blade 1/2 and then a green dot with "beosrv-c"; all arranged in a tree type depiction. The Client ID at the bottom of the open panel is populated with what appears to be a MAC address, and the Connection status shows a green dot next to "Qlumand". There is also a message box with another green dot, and indicating "10 messages". Interestingly, I believe the QluMan GUI on my desktop is connecting to my wireless router, then from there to the Head Node over its wired connection to that wireless router (the Public Network connection for the Head Node). I was planning to plug in a wired connection from my Desktop (Front End) to the Private Network switch after I had successfully made my way through the One Time Token gate, but then I found that a connection was already established. Let me know if I should change that to a wired connection from my Desktop to the Head Node on the Private Network.
I will start making my way through the remainder of The First Steps Guide (Section 1.7 and on). I am still interested to know how to run qluman-qt on the Head Node, if you happen to know, but I am content to operate the Clementine HPC cluster from my desktop using the qluman Singularity container.
I'll post a new email if I get stuck on something else. Thanks again for all the help.
Kim
"K" == Kim Peterson kimjohnpeterson@gmail.com writes:
Hi Kim,
K> Let me know if I should change that to a wired connection from my Desktop to the Head Node on the Private Network.
Not necessary.
K> I will start making my way through the remainder of The First Steps Guide (Section 1.7 and on). I am still interested to know how to run qluman-qt on the Head Node, if you happen to know, but I am content to operate the Clementine HPC cluster from my desktop using the qluman Singularity container.
Running with Singularity is the best way. If you read up on X Window System basics, you will learn that the DISPLAY variable needs to be set in order to run graphical apps like qluman-qt remotely (as on your head node).
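The DISPLAY point can be illustrated with a small shell sketch. The helper names have_display and run_gui are made up for this example; they simply refuse to start a graphical program when no X display is configured, which is the situation that produces the "cannot connect to x server" message.

```shell
# have_display: succeed if an X display is configured in the current session.
have_display() {
    [ -n "${DISPLAY:-}" ]
}

# run_gui CMD [ARGS...]: run a graphical command only when DISPLAY is set.
run_gui() {
    if have_display; then
        "$@"
    else
        echo "DISPLAY is not set; log in with X forwarding (e.g. ssh -X) or start an X session first" >&2
        return 1
    fi
}
```

One common way to get DISPLAY set on the head node is to log in from a desktop that runs an X server with `ssh -X root@cl-head` and then call `run_gui qluman-qt` in that session. This assumes the head node's SSH server permits X11 forwarding; the hostname cl-head is taken from the message quoted earlier in the thread and may differ on other setups.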
K> I'll post a new email if I get stuck on something else. Thanks again for all the help.
You're welcome.
Good luck,
Roland