Using MLC (Intel Memory Latency Checker) to measure the effect of NUMA on memory latency

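It is well known that NUMA configuration can cause swapping in database environments, but NUMA also affects memory latency and disk throughput. I recently came across MLC (Intel Memory Latency Checker) and used it to measure how large the cross-NUMA memory latency actually is.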

# Preparation
# cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
 
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

# cat /proc/sys/vm/nr_hugepages
0

# echo 4000 > /proc/sys/vm/nr_hugepages


# Memory latency test
# ./mlc --latency_matrix
Intel(R) Memory Latency Checker - v3.11b
Command line parameters: --latency_matrix

Using buffer size of 1800.000MiB
Measuring idle latencies for random access (in ns)...
        Numa node
Numa node        0       1
       0     104.7   287.5
       1     292.6   104.6
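
To put a number on the gap in the matrix above, the remote value in each row can be divided by the local (diagonal) value. The following is a minimal sketch, not part of MLC: `numa_ratio` is a hypothetical helper name, and it assumes the rows are pasted in exactly the layout MLC printed here (node id first, then one value per node).

```shell
# Hypothetical helper: for each matrix row, print remote/local ratios.
# Expects rows of the form "<node id> <value for node 0> <value for node 1> ...".
numa_ratio() {
  awk '
    # Only data rows: first field is a numeric node id, followed by the values.
    $1 ~ /^[0-9]+$/ && NF >= 3 {
      local_val = $($1 + 2)   # diagonal entry: the column for node N is field N+2
      if (local_val <= 0) next
      for (i = 2; i <= NF; i++)
        if (i != $1 + 2)
          printf "node %d -> node %d: %.2fx\n", $1, i - 2, $i / local_val
    }'
}

# Rows copied from the latency matrix above.
printf '0 104.7 287.5\n1 292.6 104.6\n' | numa_ratio
```

With the values measured here, remote access comes out at roughly 2.75x to 2.80x the local latency.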

# Memory bandwidth test
[root@k8s-10-128-64-5 Linux]# ./mlc --bandwidth_matrix
Intel(R) Memory Latency Checker - v3.11b
Command line parameters: --bandwidth_matrix

Using buffer size of 100.000MiB/thread for reads and an additional 100.000MiB/thread for writes
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0           1
       0      143101.8     78546.8
       1       78872.6    143302.3

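The results show that cross-NUMA access costs roughly 2.8x the local latency (about 290 ns versus 105 ns) and delivers only about 55% of the local bandwidth (about 78.5 GB/s versus 143 GB/s). If you are chasing maximum performance, bind the process to a single NUMA node with numactl, using either --cpunodebind=0 --membind=0 or --cpunodebind=1 --membind=1.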
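
The binding flags above are numactl options. As a rough sketch of how they are put together, the helper below only prints the command line for a given node rather than running it. `numa_bind_cmd` and `./my_workload` are hypothetical names used for illustration, not something from the original test.

```shell
# Hypothetical helper: print a numactl command that pins both the CPUs and the
# memory allocations of a workload to the same NUMA node.
numa_bind_cmd() {
  node="$1"; shift
  printf 'numactl --cpunodebind=%s --membind=%s' "$node" "$node"
  printf ' %s' "$@"     # append the workload and its arguments
  printf '\n'
}

# ./my_workload is a placeholder for the real program.
numa_bind_cmd 0 ./my_workload
```

Running the printed command requires numactl to be installed. `numactl --hardware` lists the nodes available on the machine, which is worth checking before picking a node number.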
