Finding the Cause of Thread Deadlocks or Hangs

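These are notes I put together earlier on how to find the cause of a thread deadlock or hang.
Note: the server environment is Linux and the processes are written in C/C++; the principles are similar for Java.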

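Common symptoms of hung threads
Processing slows down, then requests start timing out badly, and finally every request times out. Restarting the program only repeats the cycle. As a rule of thumb, nine times out of ten this means threads are hung.
Common causes of thread hangs or deadlocks:
An infinite loop inside a locked section, so the lock is never released and other threads wait forever;
Locking inside a lock, i.e. double locking;
In multithreaded code, a shared resource is not protected by a thread lock, so several threads contend for it and hang.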
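The second cause, double locking, can be made visible while debugging. The following is a minimal sketch, not taken from the original program: a POSIX error-checking mutex reports EDEADLK when the thread that owns it tries to lock it again, whereas a default mutex would typically block forever. The helper name relock_same_mutex is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: locks the same mutex twice from one thread and
 * returns the result of the second pthread_mutex_lock() call. */
int relock_same_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int second;

    /* Ask for an error-checking mutex instead of the default type. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);          /* first lock succeeds */
    second = pthread_mutex_lock(&mutex); /* relock fails instead of hanging */

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

Calling relock_same_mutex() should return EDEADLK rather than block, which makes this kind of mistake easy to log in a debug build.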

Determining whether a process is hung
Use the pstree command to list the threads of a process: pstree -p | grep [process name].
For example:

yuejctest:[/yuejc]pstree -p |grep Ywdeal
        |-Ywdeal(8969)-+-{Ywdeal}(9013) [thread 1]
        |             `-{Ywdeal}(9016) [thread 2]

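If you run this command repeatedly and the number of threads keeps growing (or, when the program caps the thread count, it reaches the cap and then neither grows nor shrinks), the threads are not being released and may be hung.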
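The same check can be done from code. As a rough sketch, assuming a Linux system with /proc mounted, the number of threads of a process equals the number of numeric entries under /proc/<pid>/task. The helper names count_threads and count_own_threads are hypothetical.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of threads of process `pid`
 * by counting the numeric entries in /proc/<pid>/task, or -1 on error. */
int count_threads(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* Each thread (LWP) appears as a directory named by its numeric ID;
         * this also skips the "." and ".." entries. */
        if (isdigit((unsigned char)entry->d_name[0]))
            count++;
    }
    closedir(dir);
    return count;
}

/* Convenience wrapper for the calling process. */
int count_own_threads(void)
{
    return count_threads(getpid());
}
```

Sampling count_threads() for the suspect PID every few seconds and logging the result gives the same trend that repeated pstree runs show.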
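What is pstack
This command prints a stack trace for every thread of a process, so you can use pstack to find out where a process is hung. Its only argument is the PID of the process to inspect.
Running pstack pid produces a lot of information.
For example: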

yuejctest:[/yuejc]pstack 8969
Thread 3 (Thread 0x42e88940 (LWP 9013)):
#0  0x00000039e329a0b1 in nanosleep () from /lib64/libc.so.6
#1  0x00000039e3299f99 in sleep () from /lib64/libc.so.6
#2  0x0000000000406dc9 in pthread_mdb_keepconnect ()
#3  0x00000039e3e064a7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000039e32d3c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x44e89940 (LWP 9016)):
#0  0x00000039e329a0b1 in nanosleep () from /lib64/libc.so.6
#1  0x00000039e3299f99 in sleep () from /lib64/libc.so.6
#2  0x0000000000406b49 in pthread_db_keepconnect ()
#3  0x00000039e3e064a7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000039e32d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b2518adcdd0 (LWP 8969)):
#0  0x00000039e32d4f52 in msgrcv () from /lib64/libc.so.6
#1  0x000000000045eab4 in msgRcv ()
#2  0x00000000004056d0 in main ()

Here is a problem from a real system, showing how to find the cause of a hang in the pstack output:

yuejcapp2:[/yuejc]pstack 23677
Thread 12 (Thread 0x43d8b940 (LWP 23686)):
#0  0x00000032ec00d91b in read () from /lib64/libpthread.so.0
#1  0x00000000004a3735 in _NetReadSocket ()
#2  0x00000000004a3d1e in _dci_recv_msg ()
#3  0x0000000000493ca9 in _dci_query_buf ()
#4  0x0000000000494365 in _dci_send_query ()
#5  0x0000000000494daf in si_dci_query_p ()
#6  0x000000000049388a in dci_query_p ()
#7  0x000000000041df0d in mdb_stream::excuteSql() ()
#8  0x000000000041f283 in mdb_stream::open(mdb_connect&, char const*, int, int) ()
#9  0x0000000000406bfa in pthread_mdb_keepconnect(void*) ()
#10 0x00000032ec00673d in start_thread () from /lib64/libpthread.so.0
#11 0x00000032eb4d3d1d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x45d8c940 (LWP 23687)):
#0  0x00000032eb49a1a1 in nanosleep () from /lib64/libc.so.6
#1  0x00000032eb49a089 in sleep () from /lib64/libc.so.6
#2  0x0000000000406b3d in pthread_db_keepconnect(void*) ()
#3  0x00000032ec00673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000032eb4d3d1d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x47d8d940 (LWP 19145)):
#0  0x00000032ec00d91b in read () from /lib64/libpthread.so.0
#1  0x00000000004a3735 in _NetReadSocket ()
#2  0x00000000004a3d1e in _dci_recv_msg ()
#3  0x0000000000493ca9 in _dci_query_buf ()
#4  0x0000000000494365 in _dci_send_query ()
#5  0x0000000000494daf in si_dci_query_p ()
#6  0x000000000049388a in dci_query_p ()
#7  0x000000000041df0d in mdb_stream::excuteSql() ()
#8  0x000000000048ee6b in mdb_stream::operator<<(int const&) ()
#9  0x000000000046c345 in mdb_select_userinfo_W ()
#10 0x0000000000426d2c in GetUserInfo(char*, _USER_INFO*) ()
#11 0x0000000000443005 in UserAbilityDeal(int&) ()
#12 0x000000000044f48f in ServiceOPenNewAdd(_SERVICE_OPEN_REQ&) ()
#13 0x0000000000408160 in pthread_service_open(void*) ()
#14 0x00000032ec00673d in start_thread () from /lib64/libpthread.so.0
#15 0x00000032eb4d3d1d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x49d8e940 (LWP 19375)):
#0  0x00000032ec00d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000032ec008e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x00000032ec008cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004304e9 in GetOrderInfo(char*, int, _ORDER_QUERY_INFO*) ()
#4  0x0000000000441007 in GetServiceOpenMoreInfo(_SERVICE_OPEN_REQ&) ()
#5  0x0000000000407ce6 in pthread_service_open(void*) ()
#6  0x00000032ec00673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000032eb4d3d1d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x4dd90940 (LWP 19717)):
#0  0x00000032ec00d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000032ec008e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x00000032ec008cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004463ed in ServiceOpenChangePayType(_SERVICE_OPEN_REQ&) ()
#4  0x0000000000408444 in pthread_service_open(void*) ()
#5  0x00000032ec00673d in start_thread () from /lib64/libpthread.so.0
#6  0x00000032eb4d3d1d in clone () from /lib64/libc.so.6
yuejcapp2:[/yuejc/log]

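Analysis of the thread information above (frame #0 is the innermost function, the one the thread is currently executing):
<1>. Thread 12 is in read(), reading from a socket (_NetReadSocket()),
Thread 11 is in nanosleep(), sleeping,
Thread 10 is in read(), reading from a socket (_NetReadSocket()),
Thread 9 is in __lll_lock_wait(), waiting to acquire a lock,
Thread 8 is in __lll_lock_wait(), waiting to acquire a lock.
<2>. Going through these results: Thread 11 is a keep-alive daemon thread that sleeps on purpose, which is normal, and its locking is fine.
Thread 9 and Thread 8 are waiting to lock a resource, which is normal waiting in itself. So what keeps these two threads waiting forever?
Looking again, Thread 12 and Thread 10 are both in read() at the same time, on the same call path down from mdb_stream::excuteSql(). They are contending for the same resource and have hung. If one of them holds the lock that Thread 9 and Thread 8 need (the trace alone cannot confirm this), that lock is never released.
<3>. Based on this analysis, inspect the functions on the stacks of Thread 12 and Thread 10: mdb_stream::open(), called from pthread_mdb_keepconnect(), and mdb_select_userinfo_W().
It turned out that the mdb_stream::open() call reported for Thread 12 was not protected by a thread lock in the code. After adding the lock, the program ran normally and the hang was resolved.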
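The kind of fix described here, serializing access to a shared resource with a thread lock, might look like the following minimal C sketch. The original mdb_stream code is not shown in this article, so struct shared_conn, conn_query, worker, and run_workers are hypothetical stand-ins, and a counter increment takes the place of the real socket exchange.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16
#define QUERIES_PER_WORKER 100000

/* Hypothetical stand-in for a database connection shared by threads. */
struct shared_conn {
    pthread_mutex_t lock;
    long queries; /* number of queries issued through this connection */
};

/* Hypothetical query routine: the whole exchange runs under the lock,
 * so two threads never use the connection at the same time. */
static void conn_query(struct shared_conn *conn)
{
    pthread_mutex_lock(&conn->lock);
    conn->queries++; /* placeholder for sending a query and reading the reply */
    pthread_mutex_unlock(&conn->lock);
}

static void *worker(void *arg)
{
    struct shared_conn *conn = arg;

    for (int i = 0; i < QUERIES_PER_WORKER; i++)
        conn_query(conn);
    return NULL;
}

/* Runs `nthreads` workers against one connection. Returns the number of
 * queries recorded, or -1 on a bad argument or thread creation failure. */
long run_workers(int nthreads)
{
    struct shared_conn conn;
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    long total;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    pthread_mutex_init(&conn.lock, NULL);
    conn.queries = 0;

    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &conn) == 0)
        started++;

    /* Join every thread that was actually started before reading the total. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    total = (started == nthreads) ? conn.queries : -1;
    pthread_mutex_destroy(&conn.lock);
    return total;
}
```

With the lock in place, run_workers(4) records exactly 4 * QUERIES_PER_WORKER queries. Keeping the locked section as short as the protocol allows also limits how long other threads sit in __lll_lock_wait().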
