Dear Roland,

first of all, I would like to thank you very much for your kind and detailed answer, which allowed me to understand more about this very nice Qlustar distribution and to investigate my issue further. I hope I will not bore you too much with my reply.

When I try to log in as a regular user from the front-end to a compute node (standard-node), I get a password prompt and the NFS directories are not mounted. These are the relevant lines from the output of 'journalctl -xe':

Jul 25 12:16:34 HP4 systemd[1]: data-home.automount: Got automount request for /data/home, triggered by 38058 (sshd)
Jul 25 12:16:34 HP4 systemd[1]: Mounting Mount point /data/home...
Jul 25 12:16:34 HP4 kernel: RPC: Registered rdma transport module.
Jul 25 12:16:34 HP4 kernel: RPC: Registered rdma backchannel transport module.
Jul 25 12:16:34 HP4 mount[38060]: mount.nfs: access denied by server while mounting beosrv-ib:/srv/data/home
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Mount process exited, code=exited status=32
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Failed with result 'exit-code'.
Jul 25 12:16:34 HP4 systemd[1]: Failed to mount Mount point /data/home.
Jul 25 12:16:34 HP4 systemd[1]: data-home.automount: Got automount request for /data/home, triggered by 38058 (sshd)
Jul 25 12:16:34 HP4 systemd[1]: Mounting Mount point /data/home...
Jul 25 12:16:34 HP4 mount[38065]: mount.nfs: access denied by server while mounting beosrv-ib:/srv/data/home
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Mount process exited, code=exited status=32
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Failed with result 'exit-code'.
Jul 25 12:16:34 HP4 systemd[1]: Failed to mount Mount point /data/home.
Jul 25 12:16:34 HP4 systemd[1]: data-home.automount: Got automount request for /data/home, triggered by 38058 (sshd)
Jul 25 12:16:34 HP4 systemd[1]: Mounting Mount point /data/home...
Jul 25 12:16:34 HP4 mount[38067]: mount.nfs: access denied by server while mounting beosrv-ib:/srv/data/home
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Mount process exited, code=exited status=32
Jul 25 12:16:34 HP4 systemd[1]: data-home.mount: Failed with result 'exit-code'.
Jul 25 12:16:34 HP4 systemd[1]: Failed to mount Mount point /data/home.
Jul 25 12:16:34 HP4 sshd[38058]: Rhosts authentication refused for marra: no home directory /data/home/marra
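
The "access denied by server" message makes me suspect that the export on the server only allows the boot/Ethernet subnet, so requests arriving from the node's IPoIB address are refused. As a sketch (the Ethernet subnet below is only my guess; the IB subnet 192.168.53.0/24 comes from my node's ib0 address), I imagine the server-side /etc/exports looking something like:

# /etc/exports on beosrv (hypothetical sketch, not the real file)
/srv/data/home  192.168.52.0/24(rw,no_subtree_check)
# If the IPoIB subnet is missing here, mounts via beosrv-ib would be refused:
# /srv/data/home  192.168.53.0/24(rw,no_subtree_check)

Of course I do not know whether Qlustar generates the exports file this way; it is just my reading of the error message.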

From this I understand that the NFS request goes to the InfiniBand interface of the head-node. However, my virtual front-end lacks an IB interface, and the Filesystem Exports config looks like:

Name: Home
Server: beosrv-c
Export Path: /srv/data/home
Network priorities: Boot IB

I do not know exactly how the priority is managed, but the mount command on the VM-FE shows me:

beosrv-c:/srv/data/home on /data/home type nfs ...

so I am sure that there the home directory is mounted over the Ethernet network. Is it normal to mix mount configurations between the FE and the standard nodes?

The Network FS Mounts dialog for the same directory in QluMan-qt looks like:

Resource: Home
Export Path: /srv/data/home [blank]
[  ] Override Network: (grayed Boot)
[X] Allow RDMA

The Preview config of node HP4 shows me some alerts (red dots or green/red dots) at the following points:

(RED/GREEN) /etc -> (RED) Network -> (RED) interfaces.d/qluman   (I suppose this is not relevant to my issue)
##################################################################
#-------------   File is auto-generated by Qluman!  -------------#
#-------------  Manual changes will be overwritten! -------------#
#----------------------------------------------------------------#

auto BOOT
iface BOOT inet dhcp
  metric 10

auto ib0
iface ib0 inet static
  address 192.168.53.104
  netmask 24
  pre-up /lib/qlustar/ib-initialize

(RED/GREEN) /etc -> (RED/GREEN) qlustar -> (RED) Disk config
# ZFS config for single disk (/dev/sda):
#   Zpool name: SYS
#   8GB zvol for swap (not activated)
#   Filesystems: /var (max 2GB) + /scratch - both compressed

[BASE]
ZPOOLS = SYS
ZFS = var, scratch
#ARC_LIMIT = 1024
#ZVOLS = swap

[SYS]
vdevs = V-SYS

[V-SYS]
devs = /dev/sda
type =

[swap]
zpool = SYS
size = 8G

[var]
zpool = SYS
quota = 20G
reservation = 20G
compress = lz4

[scratch]
zpool = SYS
compress = lz4

(RED/GREEN) sysconfig -> (RED/GREEN) network-scripts -> (RED) ifcfg-BOOT

##################################################################
#-------------   File is auto-generated by Qluman!  -------------#
#-------------  Manual changes will be overwritten! -------------#
#----------------------------------------------------------------#
DEVICE=BOOT
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
HWADDR=a0:d3:c1:fd:9c:a8


Maybe a solution could be to delete the IB network from the Filesystem Exports config, so as to be sure that the VM-FE and the nodes consistently use the same network for the mounts.
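
To be concrete, my idea is that the Filesystem Exports entry would then simply become something like (same fields as above, just without the IB priority):

Name: Home
Server: beosrv-c
Export Path: /srv/data/home
Network priorities: Boot

though I am not sure whether this would cost me the RDMA performance on the nodes that do have IB.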

If you have time to give me some hints, I would really appreciate your help.

Thank you and best regards,

Franco


On 24/07/2020 at 15:04, Roland Fehrenbacher wrote:
"F" == marra  <marra@irc.cnr.it> writes:
Hi Franco,

    F> Dear Qlustar experts, after a successful installation of the
    F> latest version of Qlustar (ver 11.0.1), I am facing a problem
    F> with the setup of computing nodes: they do not mount the nfs
    F> directories for apps and home.  To me, everything looks correctly
    F> set up (checking with the qluman-qt application), Filesystem
    F> Exports and Network FS Mounts look correctly defined and assigned
    F> to the configs of the nodes.

have you tried to do an ls on them? They are automount units and will
only mount once accessed. 

    F> The only signal that something in the configuration of nodes is
    F> maybe not correct is an empty /etc/fstab, but probably the mount
    F> mechanism is different from standard mounts.

Yes it is, it uses systemd mount units. You can check the systemd config
files for these mount units using the context menu (right mouse button)
of a node in the QluMan GUI's Enclosure view and select the entry "Preview
config". In the displayed filesystem tree look for the *.mount files and
select them. From their content you will be able to see the arguments of
the corresponding mount command. On the net-booted node itself you can
look into the systemd journal using 'journalctl -xe' and check what
error (if any) you have when the mount is activated.

Best,

Roland

    F> What happens is that doing ssh to a compute node nor the apps nor
    F> the home directories are found. With the previous version 10.0 I
    F> did not have (on a different older cluster without IB) this
    F> issue.

    F> Please, can you give me some hints about what I have to check to
    F> fix this issue?

    F> Thank you in advance. Best regards,

    F> Franco

-- 
+------------------------------------------------+--------------------+
| Francesco Saverio Marra, PhD                   |                    |
| Istituto di Ricerche sulla Combustione - CNR   |                    |
| via Diocleziano, 328  -  80124 Napoli, ITALY   |                    |
| tel.   +39 081 5704105  (int. 231)             |                    |
| fax    +39 081 7622915                         |                    |
| e-mail marra@irc.cnr.it                        |                    |
+------------------------------------------------+--------------------+