Hi,
I have an HP ProLiant DL380p Gen8 server that has served as the head node for our Qlustar 10 cluster for several years. We recently decided to upgrade to Qlustar 12 (via a clean install), but the installer cannot find a disk to install to. When I boot from USB, the installer starts as expected, but once it loads I get the message "No disks found - The installer has not found any unused disks, installation cannot continue." I've done some googling and found a few similar reports, mostly related to RAID, but no solution yet. The server has a hardware RAID controller (HP P420i), and the target disk is a RAID 1 logical volume.
The odd thing is that when I drop into a shell, lsblk shows the target RAID volume and another volume on the same controller as /dev/sda1 and /dev/sdb1. The RAID partitions also appear in the output of `cat /proc/partitions`, and the output of `lsmod | grep hpsa` suggests that the hpsa module has loaded properly. Today we updated the BIOS and the RAID controller firmware, but the install still fails with the same error every time. I am tempted to test the Qlustar 11 installer, but it no longer seems to be available online. Any suggestions?
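The checks described above can also be scripted from the installer shell. A minimal sketch, assuming a POSIX shell with awk is available there: one function lists whole-disk `sd*` entries from /proc/partitions (skipping partitions such as sda1), the other reports whether a module such as hpsa appears in /proc/modules, which is the file lsmod reads. Both function names are hypothetical and are not part of the Qlustar installer.

```shell
#!/bin/sh
# Hypothetical helpers for checking what the kernel sees; not part of Qlustar.

# Print whole-disk sdX names from a partitions table (default /proc/partitions).
# The first two lines of /proc/partitions are a header and a blank line.
list_whole_disks() {
    partitions_file="${1:-/proc/partitions}"
    awk 'NR > 2 && $4 ~ /^sd[a-z]+$/ { print $4 }' "$partitions_file"
}

# Succeed if the named kernel module is listed in /proc/modules (what lsmod reads).
module_loaded() {
    module_name="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$module_name" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }' "$modules_file"
}
```

Usage: `list_whole_disks` prints one disk name per line, and `module_loaded hpsa && echo "hpsa is loaded"` replaces the lsmod/grep pipeline.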
Thanks! Dave
P.S. I won't have physical access to the cluster again until July 20, but I can answer any questions here in the meantime.
Hi,
david.whipp@helsinki.fi wrote:
> The odd thing is that when I drop into a shell, lsblk shows the target RAID volume and another volume on the same controller as /dev/sda1 and /dev/sdb1. The RAID partitions also appear in the output of `cat /proc/partitions` ...
This is a problem for the installer. The running RAID blocks the two disks and makes them unusable for the installer. While the installer is smart enough to handle some cases, such as the Linux software RAID it uses itself, it isn't smart enough to shut down every RAID implementation.
To make the installer work, you have to disable the RAID or zero out its metadata (at either the start or the end of the disk) so that no RAID is started. The disks should then appear in the installer, which needs to see plain /dev/sda, /dev/sdb, ... devices rather than /dev/cciss/c0d0 and similar.
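As a rough illustration of zeroing metadata at the start and end of a disk, here is a sketch that overwrites the first and last MiB of whatever target it is given. It is destructive, so double-check the device name before running it. An assumption worth stating: this only touches what Linux sees, e.g. the logical volume a Smart Array controller exports; the controller's own array configuration presumably lives outside that area and is not removed. `wipefs --all` from util-linux is a more targeted alternative that erases only the signatures it recognizes. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: zero the first and last MiB of a block device or image file.
# DESTRUCTIVE: partition tables, RAID/LVM/filesystem signatures and the backup GPT
# at the end of the disk are overwritten if they fall in those regions.
wipe_ends() {
    target="$1"
    if [ -b "$target" ]; then
        size_bytes=$(blockdev --getsize64 "$target")
    else
        size_bytes=$(stat -c %s "$target")
    fi
    sectors=$(( size_bytes / 512 ))
    wipe_sectors=2048                      # 2048 * 512 bytes = 1 MiB
    if [ "$sectors" -lt "$wipe_sectors" ]; then
        wipe_sectors=$sectors              # tiny target: wipe all of it
    fi
    # Start of the target. conv=notrunc stops dd from truncating image files.
    dd if=/dev/zero of="$target" bs=512 count="$wipe_sectors" conv=notrunc 2>/dev/null || return 1
    # End of the target.
    dd if=/dev/zero of="$target" bs=512 seek=$(( sectors - wipe_sectors )) \
        count="$wipe_sectors" conv=notrunc 2>/dev/null || return 1
    sync
}
```

Usage: `wipe_ends /dev/sdX`, with sdX being the disk you really intend to erase.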
Hope that helps, Goswin von Brederlow
Hi!
> This is a problem for the installer. The running RAID blocks the two disks and makes them unusable for the installer. While the installer is smart enough to handle some cases, such as the Linux software RAID it uses itself, it isn't smart enough to shut down every RAID implementation.
OK, this is good to know, and it sounds familiar from some of the other threads on this mailing list.
> To make the installer work, you have to disable the RAID or zero out its metadata (at either the start or the end of the disk) so that no RAID is started. The disks should then appear in the installer, which needs to see plain /dev/sda, /dev/sdb, ... devices rather than /dev/cciss/c0d0 and similar.
We can try this during the next few days and provide an update. As far as I understand, the disk controller (HP P420i) is RAID-only, but perhaps it is OK to simply create a single-disk RAID-0 array and see whether the installer sees that. Interestingly, the RAID-1 array currently appears in the installer shell as /dev/sda. I read a bit about more advanced steps to turn off RAID on the controller, but they seem quite involved and not something we want to try unless absolutely needed (e.g., https://systemausfall.org/wikis/howto/Disable%20HP%20Proliant%20Hardware-RAI...).
Do you happen to know whether an ISO of the Qlustar 11 installer is still available? We've installed with Qlustar 8 (I think) in the past, so I was a bit surprised that the installer now has trouble with hardware RAID arrays.
> Hope that helps,
Indeed it does. We'll update again soon.
Best, Dave
"D" == david whipp david.whipp@helsinki.fi writes:
Hi Dave,
when you work on this again, could you launch a shell while in the installer, execute the command /usr/lib/fai/fai-disk-info and post its output?
Thanks,
Roland
D> We can try this during the next few days and provide an update. As far as I understand, the disk controller (HP P420i) is RAID-only, but perhaps it is OK to simply create a single-disk RAID-0 array and see whether the installer sees that. Interestingly, the RAID-1 array currently appears in the installer shell as /dev/sda.
Hi Roland,
I'm now in the data center for the morning for some testing. Here is the output from the /usr/lib/fai/fai-disk-info command:
# /usr/lib/fai/fai-disk-info
sda sdb
This configuration is for two RAID arrays on the same controller (a RAID 1 pair for the OS disks, sda, and a RAID 6 array for scratch storage, sdb). lsblk shows both arrays as expected. Any suggestions of other things to try, or additional output that would be helpful?
Best, Dave
"D" == david whipp david.whipp@helsinki.fi writes:
D> Hi Roland,
D> I'm now in the data center for the morning for some testing. Here is the output from the /usr/lib/fai/fai-disk-info command:
D> # /usr/lib/fai/fai-disk-info
D> sda sdb
Please execute
# cat /proc/mounts
and check whether /dev/sda* or /dev/sdb* are mounted. If so, please let me know.
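The same check can be done with a filter instead of reading the whole file by eye. A small sketch (the function name is hypothetical): it prints every /proc/mounts entry whose source device matches an awk regular expression and prints nothing if none do. Active swap is listed separately in /proc/swaps, so that file is worth a glance as well.

```shell
#!/bin/sh
# Hypothetical helper: print "DEVICE on MOUNTPOINT" for every entry in a mounts
# table (default /proc/mounts) whose source device matches an awk regex.
mounted_from() {
    pattern="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v pat="$pattern" '$1 ~ pat { print $1, "on", $2 }' "$mounts_file"
}
```

Usage: `mounted_from '^/dev/sd[ab]'`; empty output means neither array is mounted.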
In any case, you can try to brute-force wipe the filesystem/partition/LVM information on the disk array by executing the following (all data on /dev/sda will be lost):
# dd if=/dev/zero of=/dev/sda bs=1M count=1024
and then try to reboot into the installer.
Hi again,
> Please execute
> # cat /proc/mounts
> and check whether /dev/sda* or /dev/sdb* are mounted. If so, please let me know.
Here is the output of cat /proc/mounts:
# cat /proc/mounts
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,noexec,size=32897996k,nr_inodes=8224499,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
/dev/sdc1 /run/initramfs/live iso9660 ro,relatime,nojoliet,check=s,map=n,blocksize=2048 0 0
/dev/mapper/live-rw / ext4 rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=608 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
No sign of sda or sdb above.
> In any case, you can try to brute-force wipe the filesystem/partition/LVM information on the disk array by executing the following (all data on /dev/sda will be lost):
> # dd if=/dev/zero of=/dev/sda bs=1M count=1024
> and then try to reboot into the installer.
We already deleted the RAID array on these drives earlier, so I suppose there is no harm in writing zeros to the whole thing. We'll try this next and report back.
Best, Dave
Hi once again,
> In any case, you can try to brute-force wipe the filesystem/partition/LVM information on the disk array by executing the following (all data on /dev/sda will be lost):
> # dd if=/dev/zero of=/dev/sda bs=1M count=1024
> and then try to reboot into the installer.
> We already deleted the RAID array on these drives earlier, so I suppose there is no harm in writing zeros to the whole thing. We'll try this next and report back.
No luck. I wrote zeros to the entire OS disk array, and the installer still detects no unused disk.
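To rule out the zeroing silently missing the volume, one could check afterwards that the start of the device really reads back as zeros. A sketch using cmp from GNU diffutils, where `-n` limits the comparison to the first N bytes; the helper name and the 1 GiB default (chosen to match the dd command earlier in the thread) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first N bytes of TARGET are all zero.
# N defaults to 1 GiB, the amount written by "dd ... bs=1M count=1024".
is_zeroed() {
    target="$1"
    nbytes="${2:-1073741824}"
    # cmp -s: silent, exit status only; -n: compare at most nbytes bytes.
    cmp -s -n "$nbytes" /dev/zero "$target"
}
```

Usage: `is_zeroed /dev/sda && echo "first 1 GiB is zero"`.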
Best, Dave
I suggest recreating the RAID 1, since breaking it didn't help. Once that's done, please post the output of the command
# /var/lib/fai/config/QlustarInstaller/StorageManager.py
Hi,
> I suggest recreating the RAID 1, since breaking it didn't help. Once that's done, please post the output of the command
> # /var/lib/fai/config/QlustarInstaller/StorageManager.py
Here is the requested output:
# /var/lib/fai/config/QlustarInstaller/StorageManager.py
Disk /dev/sda: 419.16 GiB, LOGICAL VOLUME
Disk /dev/sdb: 9.1 TiB, LOGICAL VOLUME
(note that I typed the output above in, so the formatting might differ).
Best, Dave
"D" == david whipp david.whipp@helsinki.fi writes:
D> Hi,
>> I suggest recreating the RAID 1, since breaking it didn't help. Once that's done, please post the output of the command
>> # /var/lib/fai/config/QlustarInstaller/StorageManager.py
D> Here is the requested output:
D> # /var/lib/fai/config/QlustarInstaller/StorageManager.py
D> Disk /dev/sda: 419.16 GiB, LOGICAL VOLUME
D> Disk /dev/sdb: 9.1 TiB, LOGICAL VOLUME
D> (note that I typed the output above in, so the formatting might differ).
Hmm, this is the expected result. It is used as input for the list of disks in the "Disks, Partitions and Filesystems" window of the installer. Can you please post a photo/screenshot of this window? I see no other reason why the disks shouldn't appear there.
Hi,
> Hmm, this is the expected result. It is used as input for the list of disks in the "Disks, Partitions and Filesystems" window of the installer. Can you please post a photo/screenshot of this window? I see no other reason why the disks shouldn't appear there.
Unfortunately we don’t get as far as the “Disks, Partitions and Filesystems” window, because the installer reports that no unused disks are available immediately after we try to start the install. Here are photos of each step: starting the installer, starting the install, and the message stating that installation cannot continue. I also recorded the boot process of the installer (see https://youtu.be/oniQac6bObc). The SATA link seems to go down at 21 seconds into the video, but I don’t know whether that could be the issue.
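To check whether the disks are still being brought up when the installer starts, one could compare the kernel log timestamps against the moment the installer reports no disks. A sketch that filters kernel messages on stdin down to lines from the hpsa driver, libata "SATA link up/down" messages, and the sd driver's "Attached SCSI disk" lines; the function name is hypothetical and the choice of patterns is an assumption about what matters here.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines about the HP Smart Array driver,
# SATA link state, and SCSI disks being attached. Reads the log on stdin.
disk_bringup_lines() {
    grep -E 'hpsa|SATA link (up|down)|Attached SCSI (removable )?disk'
}
```

Usage: `dmesg | disk_bringup_lines` shows when each disk was attached relative to boot.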
Best, Dave
Hi Roland,
Just FYI, I have replied via email to the last message with a few attached photos. It appears the message required moderation due to the attachments, but once it goes through you should see the photos of the installation process and where we get stuck.
Best, Dave
Hi again Roland,
> Hmm, this is the expected result. It is used as input for the list of disks in the "Disks, Partitions and Filesystems" window of the installer. Can you please post a photo/screenshot of this window? I see no other reason why the disks shouldn't appear there.
It seems the message I sent yesterday is still stuck awaiting moderator approval. However, I can provide a link to a video of the installer boot process that might be helpful.
Unfortunately we don’t get as far as the “Disks, Partitions and Filesystems” window, because the installer reports that no unused disks are available immediately after we try to start the install. I recorded the boot process of the installer (see https://youtu.be/oniQac6bObc). The SATA link seems to go down at 21 seconds into the video, but I don’t know whether that could be the issue.
Best, Dave
"D" == david whipp david.whipp@helsinki.fi writes:
Hi Dave,
D> Hi again Roland,
>> Hmm, this is the expected result. It is used as input for the list of disks in the "Disks, Partitions and Filesystems" window of the installer. Can you please post a photo/screenshot of this window? I see no other reason why the disks shouldn't appear there.
D> It seems the message I sent yesterday is still stuck awaiting moderator approval. However, I can provide a link to a video of the installer boot process that might be helpful.
D> Unfortunately we don’t get as far as the “Disks, Partitions and Filesystems” window, because the installer reports that no unused disks are available immediately after we try to start the install. I recorded the boot process of the installer (see https://youtu.be/oniQac6bObc). The SATA link seems to go down at 21 seconds into the video, but I don’t know whether that could be the issue.
Yes, that seems to be the issue. The disks are scanned when the installer starts, and they don't seem to have been initialized yet at that point. We're preparing a new installer that waits up to 60 seconds for disks to appear. I will post here once it's tested and uploaded.
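The idea of waiting for disks instead of scanning once at startup can be sketched in shell, for example to confirm from the installer shell that the array eventually shows up. This is not the Qlustar implementation: it simply polls /sys/block for a named device once per second, up to a timeout (60 seconds by default, matching the figure above). The function name is hypothetical; `udevadm settle --timeout=60`, which waits for the udev event queue to drain, is a related standard tool.

```shell
#!/bin/sh
# Hypothetical helper: wait until /sys/block/NAME exists, polling once per second.
# Returns 0 as soon as the device appears, 1 if TIMEOUT seconds pass first.
wait_for_disk() {
    name="$1"
    timeout="${2:-60}"
    sysblock="${3:-/sys/block}"
    waited=0
    while [ ! -e "$sysblock/$name" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

Usage: `wait_for_disk sda && echo "sda is present"`.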
Best,
Roland
Hi,
Just another quick update.
> To make the installer work, you have to disable the RAID or zero out its metadata (at either the start or the end of the disk) so that no RAID is started. The disks should then appear in the installer, which needs to see plain /dev/sda, /dev/sdb, ... devices rather than /dev/cciss/c0d0 and similar.
> We can try this during the next few days and provide an update. As far as I understand, the disk controller (HP P420i) is RAID-only, but perhaps it is OK to simply create a single-disk RAID-0 array and see whether the installer sees that. Interestingly, the RAID-1 array currently appears in the installer shell as /dev/sda. I read a bit about more advanced steps to turn off RAID on the controller, but they seem quite involved and not something we want to try unless absolutely needed (e.g., https://systemausfall.org/wikis/howto/Disable%20HP%20Proliant%20Hardware-...).
Deleting the RAID-1 array for the operating system disks and setting the disks up as single-disk RAID-0 arrays did not help. The installer still fails to find usable disks, even though the RAID-1 or RAID-0 volume appears as /dev/sda. Not sure where to go from here...
Best, Dave