Slow queries and table locking in the CDH 5.13 Hive Metastore caused by Sentry synchronization

Workaround reference:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html


I. Symptoms

The Hive Metastore server performs multi-threaded operations against its metadata database; deadlocks on the database tables bring the service to a halt.

II. Troubleshooting

1. Daily activity monitoring of the Hive Metastore service showed that, during certain windows, errors appeared (Caused by: org.apache.thrift.transport.TTransportException). These were produced while Sentry synchronized Hive Metastore metadata, and the condition recovered automatically soon afterwards.

2. During metadata operations against the Hive Metastore, heavy concurrency caused lock contention on the MySQL tables (Lock wait timeout exceeded; try restarting transaction); the table locks were not released in time.

3. The resulting Hive Metastore locking caused slow or stalled queries under highly concurrent write workloads when using Sentry together with Hive.

Metastore log analysis and Airflow scheduled-task concurrency analysis:

a) Extract the logs from the failing node and pull out the key error messages

b) Check task concurrency and the number of TCP connections established while concurrent tasks run

c) Read the relevant Hive Metastore source code
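For step b), the established connections to the metastore can be counted on the metastore host. A minimal sketch, assuming a Linux host with `ss` available and the default metastore thrift port 9083 (adjust the port to your deployment):

```shell
# Count established TCP connections to the Hive Metastore thrift port.
# 9083 is the Hive Metastore default; change HMS_PORT if yours differs.
HMS_PORT=9083
ss -tan 2>/dev/null | awk -v port=":${HMS_PORT}" \
  '$1 == "ESTAB" && $4 ~ port"$" {count++} END {print count+0}'
```

Sampling this while the Airflow DAGs run gives a rough picture of how connection counts track task concurrency.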

Key errors from the failing node: the failure occurs on the update of NOTIFICATION_SEQUENCE in the HMS database.

2019-05-05 01:18:11,363 ERROR DataNucleus.Datastore.Persist: [pool-5-thread-178]: Update of object "org.apache.hadoop.hive.metastore.model.MNotificationNextId@58252dbf" using statement "UPDATE `NOTIFICATION_SEQUENCE` SET `NEXT_EVENT_ID`=? WHERE `NNI_ID`=?" failed : com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
    at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998)
    at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
    at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
    at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
    at org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:374)
    at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateTable(RDBMSPersistenceHandler.java:417)
    at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObject(RDBMSPersistenceHandler.java:390)
    at org.datanucleus.state.JDOStateManager.flush(JDOStateManager.java:5027)
    at org.datanucleus.flush.FlushOrdered.execute(FlushOrdered.java:106)
    at org.datanucleus.ExecutionContextImpl.flushInternal(ExecutionContextImpl.java:4119)
    at org.datanucleus.ExecutionContextThreadedImpl.flushInternal(ExecutionContextThreadedImpl.java:450)
    at org.datanucleus.store.query.Query.prepareDatastore(Query.java:1575)
    at org.datanucleus.store.query.Query.executeQuery(Query.java:1760)
    at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
    at org.datanucleus.store.query.Query.execute(Query.java:1654)
    at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
    at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:7754)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
    at com.sun.proxy.$Proxy7.addNotificationEvent(Unknown Source)
    at org.apache.hive.hcatalog.listener.DbNotificationListener.enqueue(DbNotificationListener.java:335)
    at org.apache.hive.hcatalog.listener.DbNotificationListener.onDropPartition(DbNotificationListener.java:183)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$9.notify(MetaStoreListenerNotifier.java:93)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:167)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:197)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3134)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
 
2019-05-05 01:18:11,368 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-178]: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
 
2019-05-05 01:18:11,368 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-178]: </PERFLOG method=drop_partitions_req start=1556990240187 end=1556990291368 duration=51181 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=168 retryCount=-1 error=true>
2019-05-05 01:18:11,368 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-5-thread-178]: Error occurred during processing of message.
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
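The failing statement updates the single row of NOTIFICATION_SEQUENCE, so every event-producing DDL serializes on that one row lock. While a stall is in progress, the waiting and blocking transactions can be inspected from a MySQL client. A diagnostic sketch, assuming MySQL 5.x (these `information_schema` tables moved to `performance_schema` in MySQL 8.0):

```sql
-- Show transactions currently waiting on InnoDB row locks,
-- together with the transactions blocking them (MySQL 5.x).
SELECT r.trx_id    AS waiting_trx,
       r.trx_query AS waiting_query,
       b.trx_id    AS blocking_trx,
       b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;
```

If the blocking queries are repeatedly the `UPDATE NOTIFICATION_SEQUENCE ...` statement from the log above, that confirms the contention point.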

TCP connection and slow-query log analysis on the Alibaba Cloud RDS database service

a) TCP connection metrics from the RDS console (screenshot not preserved)

b) A review of the RDS slow-query log turned up no entries corresponding to the table-lock events.

III. Solutions

a) Increase the InnoDB lock-wait timeout in MySQL:

set global innodb_lock_wait_timeout = 180;
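Note that `SET GLOBAL` affects only sessions opened after the change and does not survive a MySQL restart. A sketch of applying and verifying the setting (to persist it, the same value would also go under `[mysqld]` in my.cnf):

```sql
-- Raise the InnoDB lock-wait timeout to 180 seconds (default is 50).
SET GLOBAL innodb_lock_wait_timeout = 180;

-- Verify the new global value; existing sessions keep the old one.
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
```

This only gives blocked transactions more time to wait; it does not remove the contention itself.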

b) The affected releases are CDH 5.13.0, 5.13.1, 5.13.2, 5.14.0, and 5.14.1 when Hive and Sentry run together on the cluster. The following mitigation can be applied:

In Cloudera Manager, go to Clusters > Hive service > Configuration, search for the Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml, click the plus sign (+) twice, and add the following values:
Set hive.metastore.transactional.event.listeners to an empty value
Set hive.metastore.event.listeners to org.apache.hive.hcatalog.listener.DbNotificationListener
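Expressed as the raw hive-site.xml properties that the safety valve ultimately emits, the two entries look like:

```xml
<!-- Disable the transactional event listeners. -->
<property>
  <name>hive.metastore.transactional.event.listeners</name>
  <value></value>
</property>
<!-- Keep notification events via the non-transactional listener. -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
```

After saving the configuration, the Hive Metastore Server must be restarted for the change to take effect.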

Applying this workaround may cause some DDL queries (for example, dropping a table and then creating a new table with the same name) to fail unexpectedly with a "No valid privileges" error. Re-running the failed query resolves the error.

c) Both of the above are temporary mitigations. A complete fix requires upgrading CDH to a release that includes the corresponding patch.

Reference: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html#tsb_2018_294
