Using a GPU from Docker on CentOS

1. Install the driver

1. Run lspci | grep -i nvidia to see which graphics card is installed

[root@centos79-temp install]# lspci | grep -i nvidia
00:0a.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
  2. Go to the PCI devices site (ucw.cz), enter the number 2204 shown in the previous step, and click search.

    The result shows that the card is a GeForce RTX 3090.
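
The lookup starts from the four-digit device ID that lspci prints when it cannot name the card. A minimal sketch of pulling that ID out with sed follows. The function name extract_nvidia_device_id is made up for this example, and it assumes the line format shown above ("NVIDIA Corporation Device XXXX").

```shell
# Hypothetical helper: read lspci output on stdin and print the 4-digit
# hex device ID of every NVIDIA device that lspci could not name.
extract_nvidia_device_id() {
    sed -n 's/.*NVIDIA Corporation Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Sample line copied from the output above. On a real host use:
#   lspci | grep -i nvidia | extract_nvidia_device_id
echo '00:0a.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)' \
    | extract_nvidia_device_id
```

For the sample line this prints 2204, the value to type into the search box.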
  3. Go to the NVIDIA website
    https://www.nvidia.cn/Download/index.aspx?lang=cn
  4. Select the driver that matches the card and download it
  5. Disable nouveau

nouveau is a third-party open-source driver for NVIDIA cards, and most Linux distributions install it by default. It conflicts with the official NVIDIA driver, so disable nouveau before installing the NVIDIA driver or CUDA.

Run lsmod | grep nouveau to check whether the system is currently using nouveau. If there is any output, disable it with the following commands:

# Create a new configuration file
vi /etc/modprobe.d/blacklist-nouveau.conf
# Put the following two lines in it
blacklist nouveau
options nouveau modeset=0

# Back up the current initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Build a new image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Reboot
reboot
# After the reboot, run the same check again; it should print nothing
lsmod | grep nouveau
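
If you would rather not edit the file by hand in vi, the same two lines can be written from a script. This is only a sketch: the helper names write_nouveau_blacklist and nouveau_loaded are invented here, and the demo writes to a temporary file so that nothing under /etc is touched.

```shell
# Hypothetical helper: write the two blacklist lines to the file given as $1.
# On a real host the target would be /etc/modprobe.d/blacklist-nouveau.conf
# and the call needs root.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Hypothetical helper: read lsmod output on stdin and succeed only if a
# module named nouveau is listed in the first column.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Demo against a temporary file.
tmp=$(mktemp)
write_nouveau_blacklist "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On the host itself, lsmod | nouveau_loaded gives the same answer as the grep check, but as an exit status that a script can branch on.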
  6. Run the installer
# Make the installer executable
chmod +x NVIDIA-Linux-x86_64-525.89.02.run
# Run the installer
./NVIDIA-Linux-x86_64-525.89.02.run -no-x-check -no-nouveau-check -no-opengl-files

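-no-x-check: do not check whether an X server is running (by default the installer aborts if it detects one)
-no-nouveau-check: do not check whether nouveau is in use; the flag only skips the check and does not itself disable nouveau
-no-opengl-files: install only the driver files, not the OpenGL files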

  7. Check that the installation succeeded
    Run nvidia-smi. Output like the following means the driver is installed:
[root@centos79-temp install]# nvidia-smi 
Tue Mar  7 15:10:54 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0A.0 Off |                  N/A |
| 35%   30C    P0    N/A / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

At this point the NVIDIA driver is installed on the host. If you are not going to run your programs in Docker, the GPU is ready to use.
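
For a scripted check, the driver version can be read from the header line of the nvidia-smi output. The sketch below assumes the header format shown above, and driver_version_from_smi is a name chosen for this example.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# that follows "Driver Version:" in the header line.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p'
}

# Sample header line copied from the output above. On a real host use:
#   nvidia-smi | driver_version_from_smi
echo '| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |' \
    | driver_version_from_smi
```

For the sample line this prints 525.89.02. nvidia-smi can also report the value directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader, which avoids parsing the table.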

Errors you may run into:

  1. gcc is not installed on the host
ERROR: Unable to find the development tool `cc` in your path; please make sure that you have the package 'gcc'  installed.  If gcc is installed on your system, then please check that `cc` is in your PATH.  

Run yum install gcc gcc-c++ -y to install it.
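
The installer only looks for cc on the PATH, so the same lookup can be done before starting it. A small sketch, with have_tool as an invented helper name:

```shell
# Hypothetical helper: succeed if the command named in $1 is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the compiler driver the installer asks for is available.
if have_tool cc; then
    echo 'cc found'
else
    echo 'cc missing: run yum install gcc gcc-c++ -y'
fi
```

What it prints depends on the machine it runs on.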

  2. kernel-devel is missing
ERROR: Unable to find the kernel source tree for the currently running kernel.  Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on
         Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.  If you know the correct kernel source files are installed, you may specify the
         kernel source path with the '--kernel-source-path' command line option.

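Run yum update && yum install kernel-devel -y to install it. If the installer still reports the error afterwards, check whether the kernel versions match:
compare the output of yum info kernel-devel kernel-headers with that of uname -r. After a kernel update, remember to reboot the server.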
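
The comparison can also be scripted. The sketch below assumes the default rpm -q output format, in which the package line is the package name followed by the same string that uname -r prints. kernel_devel_matches is a name invented for this example.

```shell
# Hypothetical helper: $1 is the running kernel release (from uname -r);
# stdin is the output of `rpm -q kernel-devel`, one package per line.
# Succeeds only if a kernel-devel package for exactly that release is listed.
kernel_devel_matches() {
    grep -qxF "kernel-devel-$1"
}

# Demo with the versions from the logs in this article. On a real host use:
#   rpm -q kernel-devel | kernel_devel_matches "$(uname -r)"
printf '%s\n' 'kernel-devel-3.10.0-1160.el7.x86_64' \
    | kernel_devel_matches '3.10.0-1160.el7.x86_64' \
    && echo 'kernel-devel matches the running kernel'
```

If the check fails, yum install "kernel-devel-$(uname -r)" asks for exactly the running kernel's version instead of the newest one. That only works while the configured repositories still carry that version.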

kernel-devel download: https://pkgs.org/download/kernel-devel
kernel-headers download: https://pkgs.org/download/kernel-headers

[root@localhost home]# ls
kernel-devel-3.10.0-1160.el7.x86_64.rpm  kernel-headers-3.10.0-1160.el7.x86_64.rpm  NVIDIA-Linux-x86_64-535.146.02.run
# Install the packages
[root@localhost home]# rpm -ivh *.rpm --nodeps --force
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-headers-3.10.0-1160.el7   ################################# [ 50%]
   2:kernel-devel-3.10.0-1160.el7     ################################# [100%]
[root@i-hekarfs5 packages]# yum info kernel-devel kernel-headers
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: mirrors.bfsu.edu.cn
 * extras: mirrors.ustc.edu.cn
 * updates: mirrors.aliyun.com
Installed Packages
Name        : kernel-devel
Arch        : x86_64
Version     : 3.10.0
Release     : 1160.el7
Size        : 38 M
Repo        : installed
From repo   : updates
Summary     : Development package for building kernel modules to match the kernel
URL         : http://www.kernel.org/
License     : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
            : against the kernel package.

Name        : kernel-headers
Arch        : x86_64
Version     : 3.10.0
Release     : 1160.el7
Size        : 3.8 M
Repo        : installed
From repo   : updates
Summary     : Header files for the Linux kernel for use by glibc
URL         : http://www.kernel.org/
License     : GPLv2
Description : Kernel-headers includes the C header files that specify the interface
            : between the Linux kernel and userspace libraries and programs.  The
            : header files define structures and constants that are needed for
            : building most standard programs and are also needed for rebuilding the
            : glibc package.

[root@i-hekarfs5 packages]# uname -r
3.10.0-1160.el7.x86_64

2. Install nvidia-container-runtime

1. Add the package repository as described in the official documentation
Migration Notice | nvidia-container-runtime
On CentOS 7, add it like this:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
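
The distribution variable is just the ID and VERSION_ID fields of /etc/os-release joined together, and the repository URL is built from it. The sketch below shows that expansion against a sample file rather than the real one. distribution_of is a name made up for this example, and the two sample fields are the values CentOS 7 ships.

```shell
# Hypothetical helper: print $ID$VERSION_ID from the os-release file in $1.
# The subshell keeps ID and VERSION_ID out of the caller's environment.
distribution_of() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Demo with a sample file holding the two fields as CentOS 7 defines them.
tmp=$(mktemp)
printf '%s\n' 'ID="centos"' 'VERSION_ID="7"' > "$tmp"
distribution_of "$tmp"    # prints: centos7
rm -f "$tmp"
```

Running distribution_of /etc/os-release on the host before the curl command shows which directory of the repository site will be requested.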

On Ubuntu, add it like this:

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
  2. Run the installation
yum install nvidia-container-runtime
  3. Run a Docker container to test
docker run -it --rm --gpus all centos nvidia-smi

Sure enough, just when nothing was supposed to go wrong, something did:

[root@centos79-temp install]# docker run -it --rm --gpus all centos nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

The cause is that Docker had not been restarted. After running systemctl restart docker, the same test succeeds:
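
In a provisioning script this particular failure can be recognised by its message, so the script can print a hint instead of a bare error. A minimal sketch, assuming the message text shown above; is_gpu_driver_error is a name invented for this example.

```shell
# Hypothetical helper: read docker's error output on stdin and succeed if it
# contains the "could not select device driver" message shown above.
is_gpu_driver_error() {
    grep -qF 'could not select device driver'
}

# Demo with the message copied from the failed run above.
msg='docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].'
if echo "$msg" | is_gpu_driver_error; then
    echo 'hint: run "systemctl restart docker" and retry'
fi
```

On a real host the input would be the captured stderr of the docker run command.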

[root@centos79-temp install]# docker run -it --rm --gpus all centos nvidia-smi
Tue Mar  7 07:19:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0A.0 Off |                  N/A |
| 35%   30C    P0    N/A / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
