MySQL 記一次 Bug發(fā)現(xiàn)過程


水平有限有誤請諒解
這個(gè)問題是一位朋友@DBA-老莊的,他們使用的是PXC環(huán)境如下:
MySQL:5.7.18-15
wsrep:29.20
os:Red Hat Enterprise Linux Server release 6.5
實(shí)際上我對PXC并不是很熟,通過分析pstack還是找到了問題。并且提交Bug,percona確認(rèn)了。雖然我不是第一個(gè)發(fā)現(xiàn)這個(gè)問題的人。


一、問題描述

數(shù)據(jù)庫處于完全hang住的狀態(tài),不能連接,不能kill連接,不能show engine innodb 等等。已有的連接通過show processlist看到大量如下的連接:


image.png

操作系統(tǒng)層面基本看不出來任何負(fù)載:

image.png

image.png

對于這種問題只能用pstack進(jìn)行問題確認(rèn)了。

二、為什么這么多opening tables的會話

通過pstack我發(fā)現(xiàn)很多會話堵塞在trx_allocate_for_mysql()如下:

Thread 54 (Thread 0x7f9085cf6700 (LWP 17448)):
#0  0x0000003715e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000011059cb in os_event::wait_low(long) ()
#2  0x00000000011b0449 in sync_array_wait_event(sync_array_t*, sync_cell_t*&) ()
#3  0x000000000108a8c4 in TTASEventMutex<GenericPolicy>::wait(char const*, unsigned int, unsigned int) ()
#4  0x000000000108aa3b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) ()
#5  0x00000000011e5974 in trx_allocate_for_mysql() () #wait trx 
#6  0x000000000106fa9f in innobase_trx_allocate(THD*) ()
#7  0x0000000001076d28 in ha_innobase::extra(ha_extra_function) ()
#8  0x0000000000ce4229 in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
#9  0x0000000000ce5912 in open_tables_for_query(THD*, TABLE_LIST*, unsigned int) ()

簡單的說對于innodb表進(jìn)行任何操作即便是select也需要分配事物,如果在事物池中沒有可用的事物就行要調(diào)用這個(gè)函數(shù)進(jìn)行分配,以下是棧幀:

#0  trx_allocate_for_mysql () at /mysql/mysql-5.7.17/storage/innobase/trx/trx0trx.cc:538
#1  0x0000000001913d62 in innobase_trx_allocate (thd=0x7fffc8000d30) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2580
#2  0x0000000001913e04 in check_trx_exists (thd=0x7fffc8000d30) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2605
#3  0x0000000001914482 in ha_innobase::update_thd (this=0x7fffc8009990, thd=0x7fffc8000d30) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2825
#4  0x00000000019296b4 in ha_innobase::info_low (this=0x7fffc8009990, flag=26, is_analyze=false) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:13805
#5  0x000000000192a385 in ha_innobase::info (this=0x7fffc8009990, flag=26) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:14211
#6  0x000000000191ad83 in ha_innobase::open (this=0x7fffc8009990, name=0x7fffcc1b4540 "./test/test1", mode=2, test_if_locked=2)
    at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:6130
#7  0x0000000000f48d09 in handler::ha_open (this=0x7fffc8009990, table_arg=0x7fffc8008fe0, name=0x7fffcc1b4540 "./test/test1", mode=2, test_if_locked=2)
    at /mysql/mysql-5.7.17/sql/handler.cc:2759
#8  0x0000000001674fd1 in open_table_from_share (thd=0x7fffc8000d30, share=0x7fffcc1b4170, alias=0x7fffc80051d8 "test1", db_stat=39, prgflag=8, ha_open_flags=0, 
    outparam=0x7fffc8008fe0, is_create_table=false) at /mysql/mysql-5.7.17/sql/table.cc:3336
#9  0x00000000014f9577 in open_table (thd=0x7fffc8000d30, table_list=0x7fffc80051e0, ot_ctx=0x7ffff149fb80) at /mysql/mysql-5.7.17/sql/sql_base.cc:3522
#10 0x00000000014fbf7f in open_and_process_table (thd=0x7fffc8000d30, lex=0x7fffc8003028, tables=0x7fffc80051e0, counter=0x7fffc80030e8, flags=0, 
    prelocking_strategy=0x7ffff149fcb0, has_prelocking_list=false, ot_ctx=0x7ffff149fb80) at /mysql/mysql-5.7.17/sql/sql_base.cc:5108
#11 0x00000000014fd06a in open_tables (thd=0x7fffc8000d30, start=0x7ffff149fc70, counter=0x7fffc80030e8, flags=0, prelocking_strategy=0x7ffff149fcb0)
    at /mysql/mysql-5.7.17/sql/sql_base.cc:5719

而這個(gè)函數(shù)里面包含如下代碼:

       trx_sys_mutex_enter(); ##獲取trx_sys->mutex鎖
    ut_d(trx->in_mysql_trx_list = TRUE);
    UT_LIST_ADD_FIRST(trx_sys->mysql_trx_list, trx); ##將事物加入trx_sys全局結(jié)構(gòu)中的鏈表中
    trx_sys_mutex_exit();

trx_sys是一個(gè)全局的數(shù)據(jù)結(jié)構(gòu),各個(gè)事物都以鏈表的形式掛載到它下面,那么修改這些鏈表需要通過一個(gè)mutex來保護(hù)這個(gè)全局?jǐn)?shù)據(jù)結(jié)構(gòu)避免多線程并發(fā)的修改。比如這里就是更新鏈表操作。但是我們從棧幀來看他處于open_table函數(shù),本函數(shù)主要建立table cache同時(shí)做好表的實(shí)例化,也就是建立好mysql層和innodb層文件的對應(yīng)關(guān)系,此外還會獲取相應(yīng)的MDL LOCK和打開frm文件。
為了測試我簡單的在代碼中加入了sleep(10),停頓10秒,可以看到如下。證明這里的opening tables確實(shí)是在trx_allocate_for_mysql 發(fā)生了等待出現(xiàn)的狀態(tài):


image.png

所以show processlist的state只是一個(gè)狀態(tài)值,它代表是代碼某一段到某一段的執(zhí)行階段,下面是一個(gè)典型的select的狀態(tài)切換流程。但是要確認(rèn)問題,有時(shí)候光靠這個(gè)是不夠的。

T@2: | THD::enter_stage: 'starting' /root/mysql5.7.14/percona-server-5.7.14-7/sql/conn_handler/socket_connection.cc:100
T@2: | | | | | THD::enter_stage: 'checking permissions' /root/mysql5.7.14/percona-server-5.7.14-7/sql/auth/sql_authorization.cc:843
T@2: | | | | | | THD::enter_stage: 'Opening tables' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_base.cc:5719
T@2: | | | | | THD::enter_stage: 'init' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:121
T@2: | | | | | | | THD::enter_stage: 'System lock' /root/mysql5.7.14/percona-server-5.7.14-7/sql/lock.cc:321
T@2: | | | | | | | THD::enter_stage: 'optimizing' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:151
T@2: | | | | | | | THD::enter_stage: 'statistics' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:386
T@2: | | | | | | | THD::enter_stage: 'preparing' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:494
T@2: | | | | | | THD::enter_stage: 'executing' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_executor.cc:119
T@2: | | | | | | THD::enter_stage: 'Sending data' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_executor.cc:195
T@2: | | | | | THD::enter_stage: 'end' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:199
T@2: | | | | THD::enter_stage: 'query end' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5174
T@2: | | | | THD::enter_stage: 'closing tables' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5252
T@2: | | | THD::enter_stage: 'freeing items' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5855
T@2: | | THD::enter_stage: 'cleaning up' /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1884

三、詳細(xì)的分析pstack

因?yàn)閜stack日志太長了。我就不貼了。詳細(xì)的分析pstack日志在開頭給出的bug連接。其實(shí)要在冗長的pstack中找到有用的信息和合理的解釋是一個(gè)困難的過程,因?yàn)樵创a能力非常有限,某些時(shí)候只能通過搜索臨界區(qū)來確認(rèn)問題。下面是我分析的結(jié)果,也是提交bug給出了的:

I use pstack to review stack discover Dead lock 
Analyze pstack i find some problem:
Thread 56:
lock:trx_sys (when parameter wsrep_log_conflicts=ON lock0lock.cc 2281 line) 
requisite:LOCK_wsrep_thd

Thread 9:
lock: LOCK_thd_list (mysql_thread_manager.cc 339 line)
requisite:LOCK_thd_data (sql_parse.h 175 line)

Thread 26:
lock: LOCK_thd_data (in PFS_status_variable_cache::do_materialize_all after PFS_status_variable_cache::manifest release LOCK_thd_data ,but hang)
requisite:trx_sys->mutex (srv0srv.cc 1703 line)

a lot of Thread wait when call function trx_allocate_for_mysql at mutex trx_sys
a lot of Thread wait when call function THD::release_resources at mutex LOCK_thd_data
a lot of Thread wait when call function Global_THD_manager::add_thd at mutex LOCK_thd_list
and any other mutex wait!!

but I not find which thread hold LOCK_wsrep_thd mutex.

Now we do follow things hope to resolve this problem:
1、wsrep_log_conflicts=off
2、SET global optimizer_switch = 'materialization=off';
3、at high load time not execute sql 'show [global] status/select * from performance_schema.global_status'

簡單的說我發(fā)現(xiàn)有多個(gè)線程獲取mutex近乎出現(xiàn)環(huán)狀,但是其中一環(huán)沒有找到。最終percona恢復(fù)如下:

Your problem sounds quite similar to one mentioned here: https://jira.percona.com/browse/PXC-877
Said release fix the issue https://www.percona.com/blog/2018/01/26/percona-xtradb-cluster-5-7-20-29-24-is-now-available/
You may want to consider an upgrade to latest one though which has more fixes 5.7.21.

雖然我不是第一個(gè)發(fā)現(xiàn)的人,但是起碼確認(rèn)我的分析基本確認(rèn)的問題。蛋疼又說升級升級。

作者微信:


微信.jpg
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容