Dear members,
I am currently testing the transition from a Rocks cluster to Qlustar.
One problem: our Tesla M2075 nodes won't boot with the NVIDIA driver, because the GPU is not supported by ubuntu-focal-nvidia.
<snip>
[ 60.417821] NVRM: The NVIDIA Tesla M2075 GPU installed in this system is
NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please
NVRM: visit http://www.nvidia.com/object/unix.html for more
NVRM: information. The 470.74 NVIDIA driver will ignore
NVRM: this GPU. Continuing probe...
[ 60.418062] NVRM: No NVIDIA GPU found.
[ 60.418406] nvidia-nvlink: Unregistered the Nvlink Core, major device number 245
[ 65.107363] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
<snap>
How can I assemble a new boot image with an older driver on the head node?
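For reference, the kernel message above names both the driver branch that still supports this GPU (390.xx) and the driver version that ignored it (470.74). A minimal Python sketch that extracts both values from dmesg text, assuming the log is available as a plain string; `nvrm_driver_hint` and its regular expressions are hypothetical and only matched against the wording shown in this post:

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns based on the NVRM wording quoted above.
# \s+ tolerates the line wrapping and extra spaces dmesg may insert.
LEGACY_RE = re.compile(r"supported\s+through\s+the\s+NVIDIA\s+(\S+)\s+Legacy\s+drivers")
IGNORE_RE = re.compile(r"The\s+(\S+)\s+NVIDIA\s+driver\s+will\s+ignore")


def nvrm_driver_hint(dmesg_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (recommended legacy branch, driver version that ignored the GPU).

    Either element is None if the corresponding message is not present.
    """
    legacy = LEGACY_RE.search(dmesg_text)
    ignored = IGNORE_RE.search(dmesg_text)
    return (
        legacy.group(1) if legacy else None,
        ignored.group(1) if ignored else None,
    )


if __name__ == "__main__":
    sample = (
        "[ 60.417821] NVRM: The NVIDIA Tesla M2075 GPU installed in this system is "
        "NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please "
        "NVRM: information. The 470.74 NVIDIA driver will ignore NVRM: this GPU."
    )
    print(nvrm_driver_hint(sample))
```

Matching on the message text rather than on fixed offsets keeps the sketch independent of the timestamps and of how the kernel wraps long NVRM lines.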
I tried the older driver NVIDIA-Linux-x86_64-352.79.run, but the installation fails:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Sun Feb 20 13:33:54 2022
installer version: 352.79
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
nvidia-installer command line: ./nvidia-installer
Unable to load: nvidia-installer ncurses user interface
Using built-in stream user interface
-> Detected 4 CPUs online; setting concurrency level to 4.
WARNING: You do not appear to have an NVIDIA GPU supported by the 352.79 NVIDIA Linux graphics driver installed in this system. For further details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in the README available on the Linux driver download page at www.nvidia.com.
-> License accepted.
-> Installing NVIDIA driver version 352.79.
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Kernel source path: '/lib/modules/5.4.167-ql-generic-12.0-13/build'
-> Kernel output path: '/lib/modules/5.4.167-ql-generic-12.0-13/build'
-> Performing rivafb check.
-> Performing nvidiafb check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Cleaning kernel module build directory.
executing: 'cd ./kernel; /usr/bin/make clean'...
-> Building NVIDIA kernel module:
executing: 'cd ./kernel; /usr/bin/make module SYSSRC=/lib/modules/5.4.167-ql-generic-12.0-13/build SYSOUT=/lib/modules/5.4.167-ql-generic-12.0-13/build -j4 NV_BUILD_MODULE_INSTANCES='...
NVIDIA: calling KBUILD...
make[1]: Entering directory '/usr/src/linux-headers-5.4.167-ql-generic-12.0-13'
/usr/bin/make -f ./Makefile syncconfig
/usr/bin/make -f ./scripts/Makefile.build obj=scripts/basic
rm -f .tmp_quiet_recordmcount
/usr/bin/make -f ./scripts/Makefile.build obj=scripts/kconfig syncconfig
scripts/kconfig/conf --syncconfig Kconfig
Kconfig:34: can't open file "Documentation/Kconfig"
make[3]: *** [scripts/kconfig/Makefile:73: syncconfig] Error 1
make[2]: *** [Makefile:590: syncconfig] Error 2
make[1]: *** [Makefile:696: include/config/auto.conf.cmd] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.167-ql-generic-12.0-13'
NVIDIA: left KBUILD.
nvidia.ko failed to build!
make: *** [nvidia-modules-common.mk:245: module] Error 1
-> Error.
ERROR: Unable to build the NVIDIA kernel module.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details.
You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
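In the installer output, the first failure is reported by the kernel build system in /usr/src/linux-headers-5.4.167-ql-generic-12.0-13 (the syncconfig step cannot open "Documentation/Kconfig"), not by the NVIDIA sources themselves. A minimal Python sketch for listing such failure markers from /var/log/nvidia-installer.log in order, assuming the log has been read into a string; `build_failures` and its pattern are hypothetical and tuned only to the messages in this log:

```python
import re
from typing import List

# Hypothetical pattern covering two kinds of failure markers seen in this log:
#   - Kconfig errors such as: can't open file "Documentation/Kconfig"
#   - make errors such as:    *** [Makefile:590: syncconfig] Error 2
FAILURE_RE = re.compile(r"""can't open file "[^"]+"|\*\*\* \[[^\]]+\] Error \d+""")


def build_failures(log_text: str) -> List[str]:
    """Return all failure markers in the order they appear in the log."""
    return FAILURE_RE.findall(log_text)


if __name__ == "__main__":
    with open("/var/log/nvidia-installer.log", encoding="utf-8", errors="replace") as fh:
        failures = build_failures(fh.read())
    # The first marker is usually the root cause; later ones are make
    # propagating the same error up through its recursive invocations.
    for line in failures:
        print(line)
```

Reading the markers in order matters because each recursive make level repeats the failure, so only the first entry points at the actual problem.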