為什么使用bpf分析耗時問題
網(wǎng)絡(luò)性能時分析抓包工具是一大利器,tcpdump/wireshark等抓包工具底層都是使用bpf提供的庫,抓包位置在軟中斷從網(wǎng)卡隊列(ring buffer)中讀取數(shù)據(jù)后發(fā)送給協(xié)議棧時同時發(fā)給抓包程序一份,也就是只能知道同一seq數(shù)據(jù)包在網(wǎng)卡接口處一來一回的時間,比較難分析的延時問題都是偶爾的一次抖動(一天一次,幾個小時一次這樣),所以抖動的原因是發(fā)生在內(nèi)核還是用戶太就說不清了(在tcp各監(jiān)控指標(biāo)都看不出異常的情況下)。使用bpf可通過kprobe或者tracepoint在協(xié)議棧各層的關(guān)鍵函數(shù)中添加hook點,當(dāng)數(shù)據(jù)包經(jīng)過該函數(shù)時,打印出seq、network namespace、時間戳等關(guān)鍵信息,幫助我們快速定位或者縮小問題范圍。
ebpf工作原理
-
這里簡單介紹下ebpf的原理,用戶寫好的ebpf程序,通過LLVM或者clang編譯好后,通過bpf校驗器校驗(如防止程序?qū)е聝?nèi)核崩潰),插入到內(nèi)核中的bpf虛擬機(jī)中,例如圖中的filter位置。通過網(wǎng)絡(luò)處理函的靜態(tài)探針(也就是tracepoint內(nèi)核代碼中預(yù)先定義好的hook點)或者動態(tài)探針(內(nèi)核函數(shù)中只有少數(shù)函數(shù)有tracepoint,沒有tracepoint的函數(shù)就需要靜態(tài)探針如:kprobe,kprobe通過注冊函數(shù)地址,然后在函數(shù)被調(diào)用時調(diào)用我們的bpf程序)調(diào)用bpf程序,過濾結(jié)果存到buffer,用戶層通過系統(tǒng)調(diào)用獲取過濾結(jié)果。這個圖來自tcpdump papers值給出了從網(wǎng)卡接口函數(shù)獲數(shù)據(jù)包,對于protocal stack或者內(nèi)核其他子系統(tǒng)的工作原理都類似。
來自tcpdump papers - 本文使用的bpftrace開發(fā)bpf程序,bpftrace(https://github.com/iovisor/bpftrace)實現(xiàn)了一套前端語言類似awk和c,不需要關(guān)心bpf程序的編譯過程。同時作者也將bcc的工具用bpftrace實現(xiàn)了一份,安裝即用。bpftrace開發(fā)的好處就是其語法簡單,缺點就是太簡單,不能自定義函數(shù),不能調(diào)用內(nèi)核函數(shù),所以只適合開發(fā)一些簡單的小工具。開發(fā)本文介紹的小工具時,已經(jīng)有些吃力。
場景
容器環(huán)境中,用戶反饋在流量沒有突增的情況下,請求redis p99時而發(fā)生抖動,頻率大概是一天一次。排除redis服務(wù)端,以及容器的cpu、內(nèi)存等問題。數(shù)據(jù)包在協(xié)議棧的耗時無法得知,所以決定完善tcp高級指標(biāo)的監(jiān)控。同時我們也需要一個工具,能夠知道從網(wǎng)卡隊列到tcp狀態(tài)機(jī)之前的耗時。
協(xié)議棧關(guān)鍵函數(shù)
- 發(fā)送:
ip_queue_xmit(tcp層處理后的段都要調(diào)用這個函數(shù)發(fā)送ip層的發(fā)送隊列)------> ip_finish_output2(進(jìn)入到鏈路層之前的處理函數(shù),也就是netfilter和路由之后) -----> __dev_queue_xmit(將數(shù)據(jù)發(fā)送到驅(qū)動層) - 接收:
netif_rx(軟中斷調(diào)用的處理函數(shù))-----> __netif_receive_skb(協(xié)議棧報文接收的入口)-----> ip_rcv(ip層入口)-----> tcp_v4_rcv(tcp層入口)-----> tcp_rcv_state_process(tcp狀態(tài)機(jī)) - 網(wǎng)橋(因為容器使用了docker):
br_handle_frame
大家可能看到發(fā)送過程沒有4層的關(guān)鍵函數(shù),因為調(diào)用ip_queue_xmit之前是沒有填充完整的tcp協(xié)議頭的,沒辦法在一個函數(shù)中獲取到我們需要的完整系信息。
-
下面這張圖描述了網(wǎng)絡(luò)報從宿主機(jī)到容器的一些關(guān)鍵函數(shù)
網(wǎng)絡(luò)報從宿主機(jī)到容器
使用bpftrace實現(xiàn)功能
- bpftrace的語法建議直接去github上看,文檔比較新。如果要在centos上使用bpftrace,建議使用ubuntu打鏡像直接使用ubuntu的安裝包,可以參考上篇文章http://www.itdecent.cn/p/54cb1f2b3614。
- 下邊直接給出__netif_receive_skb函數(shù)的kprobe打點代碼,其他函數(shù)的代碼基本一樣。上邊說了,由于bpftrace語法功能太簡單,因為不能定義函數(shù),所以代碼重復(fù)很多。另外提一下,bpftrace還是強類型的,所以過濾ip功能,要自己把需要過濾的ip轉(zhuǎn)化成網(wǎng)絡(luò)字節(jié)序。
- 一共使用了3個命令行參數(shù),參數(shù)一和二用于過濾IP,參數(shù)三是物理機(jī)network ns,這個ns的作用是在宿主機(jī)的二層數(shù)據(jù)處理函數(shù)__netif_receive_skb上打上時間戳,然后在容器的ns中的tcp_rcv_state_process函數(shù)處計算下時延。這樣方便快速的判斷是否有異常。我們程序中時延大于30ms就會打印一條日志。這是接收時,發(fā)送數(shù)據(jù)亦然。
kprobe:__netif_receive_skb
/
$1 == ((struct iphdr *)(((struct sk_buff *) arg0)->head + ((struct sk_buff *) arg0)->network_header))->saddr ||
$1 == ((struct iphdr *)(((struct sk_buff *) arg0)->head + ((struct sk_buff *) arg0)->network_header))->daddr ||
$2 == ((struct iphdr *)(((struct sk_buff *) arg0)->head + ((struct sk_buff *) arg0)->network_header))->saddr ||
$2 == ((struct iphdr *)(((struct sk_buff *) arg0)->head + ((struct sk_buff *) arg0)->network_header))->saddr
/
{
$skb = ((struct sk_buff *) arg0);
$net = $skb->dev->nd_net.net;
$netif = $skb->dev->name;
$nsid = $net->ns.inum;
$iphd = ((struct iphdr *)($skb->head + $skb->network_header));
$srcaddr = $iphd->saddr;
$dstaddr = $iphd->daddr;
$srcip = ntop($iphd->saddr);
$dstip = ntop($iphd->daddr);
if ($iphd->protocol == IPPROTO_TCP) {
$tcphd = ((struct tcphdr *)($skb->head + $skb->transport_header));
$sport = $tcphd->source;
$sport = ($sport >> 8) | (($sport << 8) & 0x00FF00);
$dport = $tcphd->dest;
$dport = ($dport >> 8) | (($dport << 8) & 0x00FF00);
// __u32 be32_to_cpu(const __be32);
$seq = $tcphd->seq;
$seq = ($seq >> 24) | (($seq & 0x00FF0000) >> 8) | (( $seq & 0x00FF00) << 8) | (($seq & 0xFF) << 24);
$ack = $tcphd->ack_seq;
$ack = ($ack >> 24) | (($ack & 0x00FF0000) >> 8) | (( $ack & 0x00FF00) << 8) | (($ack & 0xFF) << 24);
if($nsid == $3) {
@rcvpkg[$seq] = nsecs;
}
$win = $tcphd->window;
$win = ($win >> 8) | (($win << 8) & 0x00FF00);
$pkgflag = $tcphd->syn > 0 ? "syn" : ($tcphd->fin > 0 ? "fin" : ($tcphd->rst > 0 ? "rst" : ($tcphd->psh > 0 ? "psh" : ($tcphd->ack > 0 ? "ack" : ""))));
time("%H:%M:%S ");
printf("%-19u %-15s %d,%s,%s,%-20d ", $nsid, $netif, pid, comm, func, cpu);
printf("flags:%s, seq:%-u, ack:%u, win:%-25u ", $pkgflag, $seq, $ack, $win);
printf("%s:%-15d %s:%-15d %d ms\n", $srcip, $sport, $dstip, $dport, (nsecs / 1000000) % 1000);
}
}
測試驗證
- 用docker啟一個nginx容器
[root@ ~]# docker ps | grep 639267e5f60f
639267e5f60f nginx "/docker-entrypoint.…" 13 days ago
- 獲取docker pid
[root@ ~]# docker inspect --format {{.State.Pid}} 639267e5f60f
1946308
- 模擬容器網(wǎng)卡eth0發(fā)包延時100ms
nsenter -n -t 1946308 tc qdisc add dev eth0 root netem delay 100ms
- 模擬veth peer,就是宿主機(jī)上的veth,200ms的發(fā)包延時
tc qdisc add dev veth4458ea2 root netem delay 200ms
- 運行工具
root@:/# ./pkgtool.bt 1933764362 1966204682 4026531992
Attaching 11 probes...
Tracing tcp pkg. Hit Ctrl-C to end.
TIME NETWORK NS INTERFACE PID,COMM,FUNC,CPU PKGFLAGS SADDR:SPORT DADDR:DPORT TIMESTAMP
- 請求容器中的nginx,在除容器的宿主機(jī)之外的機(jī)器上請求。
[root@ ~]# curl 10.231.66.115(容器ip)
- 觀察輸出結(jié)果
12:11:24 4026531992 eth0 0,swapper/5,__netif_receive_skb,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 787 ms
12:11:24 4026531992 eth0 0,swapper/5,ip_rcv,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 787 ms
12:11:24 4026531992 docker0 0,swapper/5,ip_finish_output2,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 787 ms
12:11:24 4026531992 docker0 0,swapper/5,__dev_queue_xmit,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 787 ms
12:11:24 4026531992 veth4458ea2 0,swapper/5,__dev_queue_xmit,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 787 ms
12:11:24 4026538786 eth0 0,swapper/5,netif_rx,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 987 ms
12:11:24 4026538786 eth0 0,swapper/5,__netif_receive_skb,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 987 ms
12:11:24 4026538786 eth0 0,swapper/5,ip_rcv,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 988 ms
12:11:24 4026538786 eth0 0,swapper/5,tcp_v4_rcv,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 988 ms
WARN:RCV slow pkg: duration is 200 seq is 3511028477
12:11:24 4026538786 0,swapper/5,tcp_rcv_state_process,5 flags:syn, seq:3511028477, ack:0, win:0 10.231.66.115:49847 10.46.242.54:80 988 ms
12:11:24 4026538786 eth0 0,swapper/5,ip_finish_output2,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 988 ms
12:11:24 4026538786 eth0 0,swapper/5,__dev_queue_xmit,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 988 ms
12:11:24 4026531992 veth4458ea2 0,swapper/5,netif_rx,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 veth4458ea2 36,ksoftirqd/5,__netif_receive_skb,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 veth4458ea2 36,ksoftirqd/5,br_handle_frame_finish,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 docker0 36,ksoftirqd/5,__netif_receive_skb,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 docker0 36,ksoftirqd/5,ip_rcv,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 eth0 36,ksoftirqd/5,ip_finish_output2,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 eth0 36,ksoftirqd/5,__dev_queue_xmit,5 flags:syn, seq:278643593, ack:3511028478, win:0 10.46.242.54:80 10.231.66.115:49847 88 ms
12:11:24 4026531992 eth0 0,swapper/5,__netif_receive_skb,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 eth0 0,swapper/5,ip_rcv,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 docker0 0,swapper/5,ip_finish_output2,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 docker0 0,swapper/5,__dev_queue_xmit,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 veth4458ea2 0,swapper/5,__dev_queue_xmit,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 eth0 0,swapper/5,__netif_receive_skb,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 eth0 0,swapper/5,ip_rcv,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 docker0 0,swapper/5,ip_finish_output2,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 docker0 0,swapper/5,__dev_queue_xmit,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026531992 veth4458ea2 0,swapper/5,__dev_queue_xmit,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 95 ms
12:11:24 4026538786 eth0 0,swapper/5,netif_rx,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,__netif_receive_skb,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,ip_rcv,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,tcp_v4_rcv,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
WARN:RCV slow pkg: duration is 200 seq is 3511028478
12:11:24 4026538786 eth0 36,ksoftirqd/5,tcp_rcv_state_process,5 flags:ack, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,netif_rx,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,__netif_receive_skb,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,ip_rcv,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,tcp_v4_rcv,5 flags:psh, seq:3511028478, ack:278643594, win:0 10.231.66.115:49847 10.46.242.54:80 295 ms
12:11:24 4026538786 36,ksoftirqd/5,__ip_queue_xmit,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,ip_finish_output2,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 295 ms
12:11:24 4026538786 eth0 36,ksoftirqd/5,__dev_queue_xmit,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 295 ms
12:11:24 4026538786 1946355,nginx,__ip_queue_xmit,10 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:24 4026538786 eth0 1946355,nginx,ip_finish_output2,10 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:24 4026538786 eth0 1946355,nginx,__dev_queue_xmit,10 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:24 4026538786 1946355,nginx,__ip_queue_xmit,10 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:24 4026538786 eth0 1946355,nginx,ip_finish_output2,10 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:24 4026538786 eth0 1946355,nginx,__dev_queue_xmit,10 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 296 ms
12:11:25 4026531992 veth4458ea2 0,swapper/5,netif_rx,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 395 ms
12:11:25 4026531992 veth4458ea2 0,swapper/5,__netif_receive_skb,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 0,swapper/5,br_handle_frame_finish,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 0,swapper/5,__netif_receive_skb,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 0,swapper/5,ip_rcv,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 eth0 0,swapper/5,ip_finish_output2,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
WARN: SEND slow pkg: duration is 99 seq is 278643594
12:11:25 4026531992 eth0 0,swapper/5,__dev_queue_xmit,5 flags:ack, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,netif_rx,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,__netif_receive_skb,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,br_handle_frame_finish,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 36,ksoftirqd/5,__netif_receive_skb,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 36,ksoftirqd/5,ip_rcv,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 eth0 36,ksoftirqd/5,ip_finish_output2,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 eth0 36,ksoftirqd/5,__dev_queue_xmit,5 flags:psh, seq:278643594, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,netif_rx,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,__netif_receive_skb,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 veth4458ea2 36,ksoftirqd/5,br_handle_frame_finish,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 36,ksoftirqd/5,__netif_receive_skb,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 docker0 36,ksoftirqd/5,ip_rcv,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 eth0 36,ksoftirqd/5,ip_finish_output2,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
WARN: SEND slow pkg: duration is 100 seq is 278643832
12:11:25 4026531992 eth0 36,ksoftirqd/5,__dev_queue_xmit,5 flags:psh, seq:278643832, ack:3511028554, win:0 10.46.242.54:80 10.231.66.115:49847 396 ms
12:11:25 4026531992 eth0 0,swapper/5,__netif_receive_skb,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 408 ms
12:11:25 4026531992 eth0 0,swapper/5,ip_rcv,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 408 ms
12:11:25 4026531992 docker0 0,swapper/5,ip_finish_output2,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 408 ms
12:11:25 4026531992 docker0 0,swapper/5,__dev_queue_xmit,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 408 ms
12:11:25 4026531992 veth4458ea2 0,swapper/5,__dev_queue_xmit,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 408 ms
12:11:25 4026538786 eth0 0,swapper/5,netif_rx,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 608 ms
12:11:25 4026538786 eth0 0,swapper/5,__netif_receive_skb,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 608 ms
12:11:25 4026538786 eth0 0,swapper/5,ip_rcv,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 608 ms
12:11:25 4026538786 eth0 0,swapper/5,tcp_v4_rcv,5 flags:fin, seq:3511028554, ack:278644444, win:0 10.231.66.115:49847 10.46.242.54:80 608 ms
12:11:25 4026538786 1946355,nginx,__ip_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026538786 eth0 1946355,nginx,ip_finish_output2,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026538786 eth0 1946355,nginx,__dev_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,netif_rx,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,__netif_receive_skb,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,br_handle_frame_finish,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 docker0 0,swapper/10,__netif_receive_skb,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 docker0 0,swapper/10,ip_rcv,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 eth0 0,swapper/10,ip_finish_output2,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
WARN: SEND slow pkg: duration is 100 seq is 278644444
12:11:25 4026531992 eth0 0,swapper/10,__dev_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 eth0 0,swapper/5,__netif_receive_skb,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 720 ms
12:11:25 4026531992 eth0 0,swapper/5,ip_rcv,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 720 ms
12:11:25 4026531992 docker0 0,swapper/5,ip_finish_output2,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 720 ms
12:11:25 4026531992 docker0 0,swapper/5,__dev_queue_xmit,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 720 ms
12:11:25 4026531992 veth4458ea2 0,swapper/5,__dev_queue_xmit,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 720 ms
12:11:25 4026538786 eth0 0,swapper/5,netif_rx,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 920 ms
12:11:25 4026538786 eth0 0,swapper/5,__netif_receive_skb,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 921 ms
12:11:25 4026538786 eth0 0,swapper/5,ip_rcv,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 921 ms
12:11:25 4026538786 eth0 0,swapper/5,tcp_v4_rcv,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 921 ms
WARN:RCV slow pkg: duration is 200 seq is 3511028555
12:11:25 4026538786 0,swapper/5,tcp_rcv_state_process,5 flags:ack, seq:3511028555, ack:278644445, win:0 10.231.66.115:49847 10.46.242.54:80 921 ms
- 分析結(jié)果,發(fā)送包的100ms和接收包的200ms都已經(jīng)打印出來了。拿“WARN: SEND slow pkg: duration is 100 seq is 278644444”這條日志看看,序列號是“278644444”的包從容器到宿主機(jī)上延時超過了100ms(正常的情況下不會超過5ms,如果超過5ms物理機(jī)壓力可能比較大,長時間這樣的話需要額外關(guān)注了)。拿著序列號“seq:278644444”,可以過濾出下面的信息:
12:11:25 4026538786 1946355,nginx,__ip_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026538786 eth0 1946355,nginx,ip_finish_output2,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026538786 eth0 1946355,nginx,__dev_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 608 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,netif_rx,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,__netif_receive_skb,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 veth4458ea2 0,swapper/10,br_handle_frame_finish,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 docker0 0,swapper/10,__netif_receive_skb,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 docker0 0,swapper/10,ip_rcv,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
12:11:25 4026531992 eth0 0,swapper/10,ip_finish_output2,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
WARN: SEND slow pkg: duration is 100 seq is 278644444
12:11:25 4026531992 eth0 0,swapper/10,__dev_queue_xmit,10 flags:fin, seq:278644444, ack:3511028555, win:0 10.46.242.54:80 10.231.66.115:49847 708 ms
可以看到容器網(wǎng)卡發(fā)包后(__dev_queue_xmit)到veth收包(netif_rx),時間戳由608 ms 到了708 ms(為了直觀,我們把時間戳由ns轉(zhuǎn)化為了ms)。這樣很快就能把問題縮小到網(wǎng)卡接口層。
- 上邊的測試只是為了說明工具,定位問題便捷。我們目前還沒遇見在這一鏈路發(fā)生延時的問題,即使這樣我們能快速的排除這段鏈路的問題,縮小問題范圍。
最后
- bpftrace作者提供了很多實用工具,也可以根據(jù)其提供的編程語言實現(xiàn)自己的工具。在傳統(tǒng)工具得不到什么信息時,bpftrace可能為你開荒破土。
- 由于bpftrace語法功能的原因,這個工具還是打印太多的數(shù)據(jù),如果有需要像這樣稍微復(fù)雜一點的工具,應(yīng)該直接用c寫,只打印出時延超過預(yù)期的數(shù)據(jù)。
- 網(wǎng)絡(luò)性能分析時,一直想要確定物理機(jī)二層到容器tcp層之間是否有問題,也一直想有個這樣的工具。得益于“參考”的第一個連接ebay的一邊性能分析文章,否則不會這么順暢。
參考
https://mp.weixin.qq.com/s/ZUS94PMCKsqgZFHX9b99-g
https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
https://www.tcpdump.org/papers/bpf-usenix93.pdf

