"F" == marra marra@irc.cnr.it writes:
Hi Franco,
F> Dear Roland, sorry for not having replied earlier: due to the COVID emergency, I was able to put my hands on the machine only today.
F> Good news: I was able to get the nfs filesystems on the nodes! Thank you so much for your help.
Glad to hear that.
F> I think it can be useful to tell you about what I have done and what did not work. You were right about the entries in /etc/exports: no path was exported on the IB network even if this network was listed in the menu of qluman-qt Filesystem exports under the submenu Network priorities. Then I tried to copy and paste all the ethernet (Boot) exports and change the IP entries to reflect the IB network. After a restart of the full cluster (I was not sure about which services needed to be restarted) I got a different behavior trying to login on the nodes: the login was unsuccessful because of a timeout on the nfs mount of the home directory. By the way, I was unable to find this modification reflected in the qluman-qt entry "Network NFS mounts". Then I checked the entry NEED_RDMA="yes" in file /etc/default/nfs-kernel-server and it was missing even if the proper check was activated in qluman-qt. I did not find any way to edit this configuration within qluman-qt.
Please note that many settings on the head-node cannot be configured through the GUI. This is one of them.
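Since these settings have to be checked by hand, here is a minimal sketch of how the two files mentioned above could be inspected. It assumes the usual `path client(options)` layout for /etc/exports and a shell-style `NAME="value"` line for NEED_RDMA. The function names and the sample addresses are hypothetical and are not taken from the cluster discussed here. Hostnames, wildcards, netgroups and backslash-continued lines are not handled.

```python
import ipaddress
import re


def shell_var(text, name):
    """Return the value of the last uncommented NAME=value line, or None."""
    value = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        m = re.match(r"^(?:export\s+)?" + re.escape(name) + r"=(.*)$", line)
        if m:
            # Drop surrounding quotes, as in NEED_RDMA="yes"
            value = m.group(1).strip().strip("\"'")
    return value


def exported_paths_for(exports_text, network):
    """List exported paths that have at least one client spec inside `network`."""
    net = ipaddress.ip_network(network)
    paths = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        for client in clients:
            spec = client.split("(", 1)[0]
            try:
                client_net = ipaddress.ip_network(spec, strict=False)
            except ValueError:
                continue  # hostname, wildcard or netgroup: skipped in this sketch
            if client_net.version == net.version and client_net.subnet_of(net):
                paths.append(path)
                break
    return paths


if __name__ == "__main__":
    # Hypothetical sample data: 192.168.52.0/24 stands in for the boot
    # network and 192.168.53.0/24 for the IB network.
    sample_exports = (
        "/srv/home 192.168.52.0/24(rw,no_root_squash)\n"
        "/srv/apps 192.168.52.0/24(ro) 192.168.53.0/24(ro)\n"
    )
    print(exported_paths_for(sample_exports, "192.168.53.0/24"))
    print(shell_var('NEED_RDMA="yes"\n', "NEED_RDMA"))
```

On the head-node, the same functions could be fed the contents of /etc/exports and /etc/default/nfs-kernel-server. An empty list for the IB network, or `None` for NEED_RDMA, would match the situation described above.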
F> At this point, I chose to avoid having nfs on IB, so I edited again the /etc/exports file deleting the entry about IB and in qluman-qt I deleted the reference to IB in the Network priorities sub-menu of the Filesystem exports menu. Et voilà, after a new restart of the cluster, users are now able to login to the compute nodes landing in their home directory.
F> Your suggestions were essential to solve this issue, thank you again. If I can be of any help for a better analysis of the reported behavior, do not hesitate to contact me.
F> A side info: I get continuous disconnections from the qluman-qt interface, while all connections to the server console or to the nodes are perfectly stable.
This could be Slurm-related. It might be fixed in an upcoming release.
Best,
Roland