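Linux Server Hardware Inspection Workflow

Performance tests here use sysbench, installed with apt install -y sysbench (tutorial).

Contents

  • Operating system
  • CPU
    • Checking clock speed, core count, and thread count
    • Preparation before performance testing
    • Performance testing
  • GPU
    • Checking memory size, model, and whether the driver is installed correctly
    • Performance testing
  • Memory
    • Checking size
    • Throughput
  • Disk
    • Checking size
    • I/O performance
  • Switch / cluster
    • Connectivity between all nodes
    • Network speed

Operating system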

$ lsb_release -a

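CPU

Checking clock speed, core count, and thread count

In the lscpu output, Socket(s) is the number of CPU sockets, Core(s) per socket is the number of cores on each chip, and Thread(s) per core is the number of hardware threads each core supports, which shows whether hyper-threading is in use.

The CPU(s) count is then Socket(s) × Core(s) per socket × Thread(s) per core, here 2 × 28 × 2 = 112.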

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   112
  On-line CPU(s) list:    0-111
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
    CPU family:           6
    Model:                106
    Thread(s) per core:   2
    Core(s) per socket:   28
    Socket(s):            2
    Stepping:             6
    CPU max MHz:          3100.0000
    CPU min MHz:          800.0000
    BogoMIPS:             4000.00
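
As a quick cross-check of the formula, the sketch below multiplies the three topology fields taken from lscpu-style text. The helper name logical_cpus is made up for this example, and it assumes the English field labels shown in the output above.

```shell
# Hypothetical helper: read lscpu-style text on stdin and print
# Socket(s) x Core(s) per socket x Thread(s) per core.
logical_cpus() {
    awk -F: '
        /Thread\(s\) per core/ { threads = $2 + 0 }
        /Core\(s\) per socket/ { cores   = $2 + 0 }
        /Socket\(s\)/          { sockets = $2 + 0 }
        END { print sockets * cores * threads }
    '
}

# Run against the live machine only when lscpu is available.
# LC_ALL=C keeps the field labels in English.
if command -v lscpu >/dev/null 2>&1; then
    LC_ALL=C lscpu | logical_cpus
fi
```

For the sample output above, the result should equal the CPU(s) line, 112.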

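Preparation before performance testing

The /sys/devices/system/cpu/cpu*/cpufreq directory holds the configuration and status of each CPU, where * is the CPU number (0 to N-1). Each file can be read with cat.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor holds the CPU frequency governor in use. The following command changes it: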

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

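The possible governors are listed below. The ones the current driver actually supports are listed in scaling_available_governors in the same directory.

  • performance: sets the CPU frequency to the highest value for the best performance. Suited to workloads that need fast response and high processing power, at the cost of higher power draw and heat.
  • powersave: sets the CPU frequency to the lowest value to save power. Suited to battery-powered devices or power-sensitive scenarios.
  • userspace: lets user-space programs set the CPU frequency by writing to the scaling_setspeed attribute.
  • ondemand: adjusts the CPU frequency dynamically according to the current system load. The frequency rises as load increases and drops under light load to save power.
  • conservative: similar to ondemand, but it changes frequency more gradually and does not jump straight to the maximum. Suited to scenarios that need a balance between performance and power.
  • schedutil: adjusts the frequency dynamically from the CPU scheduler's utilization data. It is a newer governor and is generally regarded as a replacement for ondemand and conservative, because it is more tightly integrated with the scheduler and has lower overhead.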
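
After switching governors, it is worth confirming that every core picked up the change. The sketch below counts how many CPUs use each governor. The function name governor_summary and its optional directory argument are assumptions made for this example, so that it can also be pointed at a test directory.

```shell
# Hypothetical helper: count how many CPUs use each governor.
# $1 is the directory that holds the cpuN entries (defaults to sysfs).
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Prints nothing on machines that expose no cpufreq directory (for example some VMs).
governor_summary
```

On a 112-thread machine where the change took effect everywhere, the output would be a single line for performance with a count of 112.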

/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq holds the upper limit of the CPU frequency scaling range.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq holds the lower limit of the CPU frequency scaling range.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq holds the current CPU frequency.

You can also keep a separate terminal running watch -n 1 "cat /proc/cpuinfo | grep 'MHz'" to monitor the current CPU frequency.

/sys/devices/system/cpu/cpu*/cpufreq/base_frequency holds a baseline frequency value.

When scaling_governor is performance and base_frequency is present, the CPU frequency does not rise to scaling_max_freq but stays at base_frequency. Likewise, when scaling_governor is powersave and base_frequency is present, the CPU frequency does not drop to scaling_min_freq but stays at base_frequency.

/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq holds the hardware frequency upper limit, which corresponds to CPU max MHz in lscpu.

/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq holds the hardware frequency lower limit, which corresponds to CPU min MHz in lscpu.
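
When comparing these files with lscpu, keep in mind that cpufreq reports kHz while lscpu prints MHz. The sketch below converts one file of one CPU to MHz. The helper name cpufreq_mhz and its optional third argument (an alternative root directory) are assumptions made for this example.

```shell
# Hypothetical helper: print one cpufreq file of one CPU in MHz.
# Usage: cpufreq_mhz CPU_NUMBER FILE_NAME [ROOT_DIR]
cpufreq_mhz() {
    file=${3:-/sys/devices/system/cpu}/cpu$1/cpufreq/$2
    [ -r "$file" ] || return 1
    awk '{ printf "%.1f\n", $1 / 1000 }' "$file"
}

# Show the limits and the current frequency of cpu0, skipping files that are absent.
for name in cpuinfo_min_freq cpuinfo_max_freq scaling_cur_freq; do
    if mhz=$(cpufreq_mhz 0 "$name"); then
        echo "cpu0 $name: $mhz MHz"
    fi
done
```

A cpuinfo_max_freq of 3100000, for instance, corresponds to the CPU max MHz value of 3100.0000 in the lscpu output above.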

Performance testing

Single-core performance test

$ sysbench cpu --cpu-max-prime=20000 --threads=1 --time=30 run

Multi-core performance test

$ sysbench cpu --cpu-max-prime=20000 --threads=112 --time=30 run

The output reports events per second, per-event latency, and thread fairness:

CPU speed:
    events per second: 50015.22

General statistics:
    total time:                          30.0023s
    total number of events:              1500650

Latency (ms):
         min:                                    0.98
         avg:                                    2.24
         max:                                   26.24
         95th percentile:                        2.26
         sum:                              3358727.27

Threads fairness:
    events (avg/stddev):           13398.6607/139.68
    execution time (avg/stddev):   29.9886/0.01
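
To compare runs at different thread counts, it helps to pull just the headline number out of each report. The sketch below is one way to do that. The function names events_per_second and sweep_threads are invented for this example, and the thread counts in the usage comment are only illustrative.

```shell
# Hypothetical helper: extract the "events per second" figure from
# sysbench cpu output read on stdin.
events_per_second() {
    awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: run the CPU test once per thread count given as an argument.
# Each run takes 30 seconds, so this is defined here but not called.
sweep_threads() {
    for t in "$@"; do
        eps=$(sysbench cpu --cpu-max-prime=20000 --threads="$t" --time=30 run | events_per_second)
        echo "threads=$t events/s=$eps"
    done
}

# Example: sweep_threads 1 56 112
```

In the sample report, the Threads fairness average of 13398.6607 events is simply the total of 1500650 events divided across the 112 threads.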

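GPU

Checking memory size, model, and whether the driver is installed correctly

NVIDIA cards can be inspected as follows. The Name column shows the model, the Memory-Usage column shows used and total GPU memory, and the header line shows the driver and CUDA versions.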

$ nvidia-smi
Tue Feb 18 06:41:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:52:00.0 Off |                    0 |
| N/A   44C    P0             62W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:56:00.0 Off |                    0 |
| N/A   45C    P0             68W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:D1:00.0 Off |                    0 |
| N/A   41C    P0             66W /  300W |       1MiB /  81920MiB |      2%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          Off |   00000000:D5:00.0 Off |                    0 |
| N/A   42C    P0             65W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
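
For a scripted version of the same check, nvidia-smi can print selected fields as CSV through its --query-gpu option. The sketch below tests whether every card reports the same model, memory size, and driver. The function name gpus_uniform is made up for this example, and treating identical rows as "correct" is an assumption that fits a homogeneous server like the one above.

```shell
# Hypothetical helper: read "name, memory.total, driver_version" CSV rows
# on stdin and succeed only if all non-empty rows are identical.
gpus_uniform() {
    kinds=$(sort -u | grep -c .)
    [ "$kinds" -eq 1 ]
}

# Query the live machine only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader | gpus_uniform; then
        echo "all GPUs match"
    else
        echo "GPU mismatch or query failed"
    fi
fi
```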

Single-GPU / multi-GPU performance

Testing uses gpu_burn (official repository).

However, the dash-prefixed options documented there did not seem to work.

$ cd /home/hx/gpu_burn
$ ./gpu_burn 100
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-751c75f1-b612-d705-c571-9173e4969f8b)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-569f62c0-3b97-4d25-a7fc-70b2a2724478)
GPU 2: NVIDIA A100 80GB PCIe (UUID: GPU-2be55775-2295-c096-411d-4f28a4b50ec4)
GPU 3: NVIDIA A100 80GB PCIe (UUID: GPU-4f6920dc-f153-925e-a803-181dd91a232f)
Initialized device 0 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 3 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 2 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 1 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
11.0%  proc'd: 9062 (17018 Gflop/s) - 9062 (16879 Gflop/s) - 9062 (17042 Gflop/s) - 4531 (14077 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 72 C - 73 C - 68 C - 81 C
        Summary at:   Wed Feb 19 04:23:49 AM UTC 2025

24.0%  proc'd: 22655 (16849 Gflop/s) - 18124 (16789 Gflop/s) - 18124 (16967 Gflop/s) - 18124 (15897 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 80 C - 79 C - 75 C - 85 C
        Summary at:   Wed Feb 19 04:24:02 AM UTC 2025

36.0%  proc'd: 31717 (16763 Gflop/s) - 31717 (16672 Gflop/s) - 31717 (16856 Gflop/s) - 27186 (14160 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 83 C - 82 C - 78 C - 84 C
        Summary at:   Wed Feb 19 04:24:14 AM UTC 2025

47.0%  proc'd: 40779 (15897 Gflop/s) - 40779 (16435 Gflop/s) - 45310 (16754 Gflop/s) - 36248 (12967 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 85 C - 84 C - 82 C - 85 C
        Summary at:   Wed Feb 19 04:24:25 AM UTC 2025

58.0%  proc'd: 49841 (14627 Gflop/s) - 54372 (14800 Gflop/s) - 54372 (16656 Gflop/s) - 45310 (11643 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 85 C - 85 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:36 AM UTC 2025

69.0%  proc'd: 58903 (13925 Gflop/s) - 63434 (14173 Gflop/s) - 63434 (15784 Gflop/s) - 49841 (10948 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:47 AM UTC 2025

80.0%  proc'd: 67965 (13543 Gflop/s) - 72496 (13905 Gflop/s) - 72496 (15110 Gflop/s) - 58903 (10935 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:58 AM UTC 2025

91.0%  proc'd: 77027 (13270 Gflop/s) - 77027 (13868 Gflop/s) - 81558 (14820 Gflop/s) - 63434 (10594 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:25:09 AM UTC 2025

100.0%  proc'd: 86089 (13270 Gflop/s) - 86089 (13676 Gflop/s) - 90620 (14577 Gflop/s) - 72496 (10184 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
Killing processes.. done

Tested 4 GPUs:
        GPU 0: OK
        GPU 1: OK
        GPU 2: OK
        GPU 3: OK
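
OK here only means that no computation errors were detected. In the run above, throughput also fell from roughly 17000 Gflop/s to between about 10000 and 14600 Gflop/s as temperatures settled at 84 C, which may point to thermal throttling. The sketch below pulls the per-GPU figures out of one progress line so that early and late lines can be compared. The function name gflops_per_gpu is made up for this example.

```shell
# Hypothetical helper: print the Gflop/s value of each GPU from one
# gpu_burn progress line read on stdin, separated by spaces.
gflops_per_gpu() {
    grep -o '[0-9][0-9]* Gflop/s' |
        awk '{ printf "%s%s", sep, $1; sep = " " } END { print "" }'
}

# The final progress line from the run above.
echo "100.0%  proc'd: 86089 (13270 Gflop/s) - 86089 (13676 Gflop/s) - 90620 (14577 Gflop/s) - 72496 (10184 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C" |
    gflops_per_gpu
```

This prints 13270 13676 14577 10184, one value per GPU in device order.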

Memory

Checking size

$ sudo lshw -C memory
  *-memory
       description: System Memory
       physical id: 4f
       slot: System board or motherboard
       size: 256GiB
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           251Gi       1.0Gi       247Gi       2.0Mi       2.9Gi       249Gi
Swap:          8.0Gi          0B       8.0Gi

Throughput

Multi-threaded random write throughput

$ sysbench memory --memory-block-size=1M --memory-total-size=200G --threads=50 --memory-access-mode=rnd run

Multi-threaded random read throughput

$ sysbench memory --memory-block-size=1M --memory-total-size=200G --memory-access-mode=rnd --threads=50 --memory-oper=read run

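Disk

Checking size and partition layout

lsblk, fdisk, df -h and similar commands report capacity using 1 GB = 1024 MB, whereas disk vendors generally use 1 GB = 1000 MB, so the reported capacity looks noticeably smaller than expected. Of the commands used here, parted reports decimal units by default, so its figure matches the labeled capacity.

(A ROTA value of 1 means HDD, 0 means SSD.)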

$ lsblk --output NAME,ROTA,SIZE,TYPE,RM,RO,MOUNTPOINTS
nvme0n1                      0  3.5T disk  0  0
├─nvme0n1p1                  0    1G part  0  0 /boot/efi
├─nvme0n1p2                  0    2G part  0  0 /boot
└─nvme0n1p3                  0  3.5T part  0  0
  └─ubuntu--vg-ubuntu--lv    0  100G lvm   0  0 /
$ sudo parted /dev/nvme0n1 print
[sudo] password for hx:
Model: SAMSUNG MZQL23T8HCLS-00A07 (nvme)
Disk /dev/nvme0n1: 3841GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1128MB  1127MB  fat32              boot, esp
 2      1128MB  3276MB  2147MB  ext4
 3      3276MB  3841GB  3837GB
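
The gap between the two tools is pure unit conversion. The sketch below turns a vendor-style decimal figure into the binary unit that lsblk prints. The function name gb_to_tib is made up for this example.

```shell
# Hypothetical helper: convert decimal gigabytes (10^9 bytes)
# to binary tebibytes (2^40 bytes), rounded to two decimals.
gb_to_tib() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1e9 / (1024 ^ 4) }'
}

# parted reports 3841GB for the disk above.
gb_to_tib 3841
```

This prints 3.49, which lsblk rounds to the 3.5T shown for nvme0n1.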

I/O performance

To benchmark disk reads and writes, sysbench first needs prepare to create the test data, then run to execute the test, and finally cleanup to delete the test data.

Multi-threaded random write test

$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 cleanup

Multi-threaded random read test

$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 cleanup

Multi-threaded mixed random read/write, with a read:write ratio of 6:4

$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 cleanup
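
Since the three stages always share the same options, they can be wrapped in one function. The sketch below is one possible wrapper, not part of sysbench: the name fileio_cycle and the RUNNER variable are invented for this example. Setting RUNNER=echo prints the commands instead of running them, and cleanup still runs when the run stage fails.

```shell
# Hypothetical wrapper: take one sysbench fileio test mode through
# prepare, run, and cleanup with the options used above.
# Usage: fileio_cycle MODE [extra sysbench options]
fileio_cycle() {
    mode=$1; shift
    set -- sysbench fileio --file-total-size=25G --file-num=100 \
        --threads=10 --file-test-mode="$mode" "$@"
    ${RUNNER:-} "$@" prepare || return 1
    ${RUNNER:-} "$@" --report-interval=1 run
    status=$?
    ${RUNNER:-} "$@" cleanup   # always remove the 25G of test files
    return "$status"
}

# Dry run: print the three commands for the 6:4 mixed test.
RUNNER=echo fileio_cycle rndrw --file-rw-ratio=1.5
```

Dropping RUNNER=echo runs the real benchmark in the current directory, so change to the disk under test first.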

Switch / cluster

Network connectivity

Network speed
