1. Install the driver
- Run lspci | grep -i nvidia to identify the installed graphics card
[root@centos79-temp install]# lspci | grep -i nvidia
00:0a.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
- Open the PCI devices site (ucw.cz), enter the device ID 2204 from the previous step, and click search
The matching card is the GeForce RTX 3090.
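The device ID can also be pulled out of the lspci line with a script instead of being read by eye. The following is a minimal sketch, assuming output in the same "NVIDIA Corporation Device XXXX" form shown above; the helper name extract_device_id is made up for illustration. When the card is already known to the local PCI database, lspci prints a product name instead and this pattern will not match.

```shell
# Print the 4-hex-digit device ID from lspci lines such as
# "... NVIDIA Corporation Device 2204 (rev a1)" read on stdin.
extract_device_id() {
    sed -n 's/.*NVIDIA Corporation Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Example usage on the host (commented out; needs lspci and an NVIDIA card):
# lspci | grep -i nvidia | extract_device_id
```

Running lspci -nn instead shows the numeric vendor:device pair in brackets, which is another way to get the same ID.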
- Go to the NVIDIA website: https://www.nvidia.cn/Download/index.aspx?lang=cn
- Select the driver that matches the card and download it
- Disable nouveau
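nouveau is a third-party open-source NVIDIA driver that most Linux installations include by default. It conflicts with the official NVIDIA driver, so disable nouveau before installing the NVIDIA driver and CUDA.
Run lsmod | grep nouveau to check whether the system is currently using nouveau. If there is any output, disable it with the following commands: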
# Create a new configuration file
vi /etc/modprobe.d/blacklist-nouveau.conf
# Add the following lines
blacklist nouveau
options nouveau modeset=0
# Back up the current initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Build a new initramfs image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Reboot
reboot
# After the reboot, run the check again; no output means nouveau is disabled
lsmod | grep nouveau
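The interactive vi step can be replaced by a scripted write, which is handy when several hosts need the same change. This is only a sketch: the function name write_nouveau_blacklist is hypothetical, and the optional path argument exists so the function can be tried on a scratch file first.

```shell
# Write the two nouveau blacklist lines to the given file.
# Defaults to the path used in the steps above; overwrites the file.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Example usage as root (commented out so the sketch changes nothing by itself):
# write_nouveau_blacklist
```

The initramfs backup, dracut rebuild, and reboot still have to follow, exactly as in the manual steps.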
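- Run the installer
# Make the installer executable
chmod +x NVIDIA-Linux-x86_64-525.89.02.run
# Run the installer
./NVIDIA-Linux-x86_64-525.89.02.run -no-x-check -no-nouveau-check -no-opengl-files
-no-x-check: do not abort the installation if a running X server is detected (the option skips the check; it does not stop X)
-no-nouveau-check: skip the check for the nouveau driver (the option does not disable nouveau itself)
-no-opengl-files: install only the driver files and skip the OpenGL files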
- Verify the installation
Run nvidia-smi. If it prints output like the following, the installation succeeded.
[root@centos79-temp install]# nvidia-smi
Tue Mar 7 15:10:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0A.0 Off |                  N/A |
| 35%   30C    P0    N/A / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
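For scripted checks, the driver version can be extracted from the nvidia-smi banner rather than compared by eye. A minimal sketch, assuming the "Driver Version: X" header line shown above; the helper name driver_version is made up for illustration.

```shell
# Print the driver version found in nvidia-smi banner text read on stdin.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Example usage on the host (commented out; needs the driver installed):
# nvidia-smi | driver_version
```

nvidia-smi can also print this field directly with --query-gpu=driver_version --format=csv,noheader, which avoids parsing the table.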
At this point the NVIDIA driver is installed on the host. If you do not run your programs in docker, the GPU is ready to use.
Errors you may run into:
- gcc is not installed on the host
ERROR: Unable to find the development tool `cc` in your path; please make sure that you have the package 'gcc' installed. If gcc is installed on your system, then please check that `cc` is in your PATH.
Run yum install gcc gcc-c++ -y to install it.
- kernel-devel is missing
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on
Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the
kernel source path with the '--kernel-source-path' command line option.
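Run yum update && yum install kernel-devel -y to install it. If the error persists after installation, check whether the kernel-devel version matches the running kernel.
Compare the versions reported by yum info kernel-devel kernel-headers with the output of uname -r. After a kernel update, remember to reboot the server.
kernel-devel download: https://pkgs.org/download/kernel-devel
kernel-headers download: https://pkgs.org/download/kernel-headers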


[root@localhost home]# ls
kernel-devel-3.10.0-1160.el7.x86_64.rpm kernel-headers-3.10.0-1160.el7.x86_64.rpm NVIDIA-Linux-x86_64-535.146.02.run
# Install the packages
[root@localhost home]# rpm -ivh *.rpm --nodeps --force
Preparing... ################################# [100%]
Updating / installing...
1:kernel-headers-3.10.0-1160.el7 ################################# [ 50%]
2:kernel-devel-3.10.0-1160.el7 ################################# [100%]
[root@i-hekarfs5 packages]# yum info kernel-devel kernel-headers
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* epel: mirrors.bfsu.edu.cn
* extras: mirrors.ustc.edu.cn
* updates: mirrors.aliyun.com
Installed Packages
Name        : kernel-devel
Arch        : x86_64
Version     : 3.10.0
Release     : 1160.el7
Size        : 38 M
Repo        : installed
From repo   : updates
Summary     : Development package for building kernel modules to match the kernel
URL         : http://www.kernel.org/
License     : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
            : against the kernel package.

Name        : kernel-headers
Arch        : x86_64
Version     : 3.10.0
Release     : 1160.el7
Size        : 3.8 M
Repo        : installed
From repo   : updates
Summary     : Header files for the Linux kernel for use by glibc
URL         : http://www.kernel.org/
License     : GPLv2
Description : Kernel-headers includes the C header files that specify the interface
            : between the Linux kernel and userspace libraries and programs. The
            : header files define structures and constants that are needed for
            : building most standard programs and are also needed for rebuilding the
            : glibc package.
[root@i-hekarfs5 packages]# uname -r
3.10.0-1160.el7.x86_64
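Both errors above can be caught before the installer is launched. The sketch below is one way to do that, not part of the NVIDIA installer: the helper names have_tool and kernel_devel_matches are made up for illustration, and the usage lines assume an RPM-based host like the CentOS 7 machine in this article.

```shell
# Succeeds if a command named $1 is on PATH (the installer looks for `cc`).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the running kernel release ($1) equals one of the
# installed kernel-devel versions passed as the remaining arguments.
kernel_devel_matches() {
    running="$1"
    shift
    for installed in "$@"; do
        if [ "$installed" = "$running" ]; then
            return 0
        fi
    done
    return 1
}

# Example usage on the host (commented out so the sketch has no side effects):
# have_tool cc || echo "gcc is missing: yum install gcc gcc-c++ -y"
# kernel_devel_matches "$(uname -r)" \
#     $(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel) \
#     || echo "kernel-devel does not match $(uname -r)"
```

The rpm query format prints version, release, and architecture in the same shape as uname -r (for example 3.10.0-1160.el7.x86_64), so a plain string comparison is enough.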
2. Install nvidia-container-runtime
- Follow the official documentation (Migration Notice | nvidia-container-runtime) to add the package repository
On CentOS 7, add the repository with:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
On Ubuntu, add the repository with:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
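Both variants build the repository URL from the distribution variable, so it helps to see what that value is before running curl. The following sketch reproduces the same ID plus VERSION_ID concatenation against an arbitrary file; the function name distribution_from is hypothetical.

```shell
# Print ID followed by VERSION_ID from an os-release style file ($1).
# The file is sourced in a subshell so its variables do not leak.
distribution_from() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Example usage on the host (commented out):
# distribution_from /etc/os-release
```

On CentOS 7 this prints centos7, and on Ubuntu 20.04 it prints ubuntu20.04. If the value is empty or unexpected, the repository URL will not resolve.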
- Install the package
yum install nvidia-container-runtime
- Run a docker container to test
docker run -it --rm --gpus all centos nvidia-smi
Sure enough, just when nothing was supposed to go wrong, something did:
[root@centos79-temp install]# docker run -it --rm --gpus all centos nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
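The cause is that docker was not restarted after nvidia-container-runtime was installed. Run systemctl restart docker and test again; this time it succeeds.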
[root@centos79-temp install]# docker run -it --rm --gpus all centos nvidia-smi
Tue Mar 7 07:19:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0A.0 Off |                  N/A |
| 35%   30C    P0    N/A / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
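The restart fix can be folded into a provisioning script by looking for the exact error text shown earlier. This is a rough sketch under that assumption; the helper name gpu_driver_not_selected is made up, and other causes of a failed container start are not handled.

```shell
# Succeeds if the text on stdin contains the error docker prints when it
# cannot find a device driver for the requested GPU capability.
gpu_driver_not_selected() {
    grep -q 'could not select device driver'
}

# Example usage on the host (commented out; needs docker and a GPU):
# if docker run --rm --gpus all centos nvidia-smi 2>&1 | gpu_driver_not_selected; then
#     systemctl restart docker
# fi
```

The -it flags are left out of the usage line because the output is piped rather than shown on a terminal.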


