解決方法:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html

一、現(xiàn)象說(shuō)明
Hive metastore server多線程操作元數(shù)據(jù)庫(kù),導(dǎo)致數(shù)據(jù)表死鎖而引起的服務(wù)暫停
二、問(wèn)題排查
1、hive metastore service在每日活動(dòng)監(jiān)控中,發(fā)現(xiàn)有部分時(shí)間內(nèi),會(huì)出現(xiàn)一些錯(cuò)誤信息 (Caused by: org.apache.thrift.transport.TTransportException),sentry同步hive metastore信息時(shí)所產(chǎn)生,但該狀況很快會(huì)被自動(dòng)修復(fù)
2、在hive metastore service 做數(shù)據(jù)交互時(shí),由于并發(fā)量較大,mysql數(shù)據(jù)表產(chǎn)生死鎖(Lock wait timeout exceeded; try restarting transaction),數(shù)據(jù)庫(kù)表鎖沒(méi)有及時(shí)釋放
3、由于Hive Metastore死鎖導(dǎo)致使用Sentry + Hive時(shí)高度并發(fā)寫入工作負(fù)載下的查詢緩慢或停滯
metastore日志分析以及airflow調(diào)度任務(wù)并發(fā)量分析
a) 提取錯(cuò)誤節(jié)點(diǎn)日志,獲取關(guān)鍵錯(cuò)誤信息
b) 檢查任務(wù)并發(fā)量,并發(fā)任務(wù)執(zhí)行時(shí),所建立TCP連接數(shù)量
c) hive metastore源碼解讀
主要錯(cuò)誤節(jié)點(diǎn)信息:可以看到是對(duì)HMS中notification_sequence的更新操作報(bào)錯(cuò)。
2019-05-05 01:18:11,363 ERROR DataNucleus.Datastore.Persist: [pool-5-thread-178]: Update of object "org.apache.hadoop.hive.metastore.model.MNotificationNextId@58252dbf" using statement "UPDATE `NOTIFICATION_SEQUENCE` SET `NEXT_EVENT_ID`=? WHERE `NNI_ID`=?" failed : com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998)
at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
at org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:374)
at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateTable(RDBMSPersistenceHandler.java:417)
at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObject(RDBMSPersistenceHandler.java:390)
at org.datanucleus.state.JDOStateManager.flush(JDOStateManager.java:5027)
at org.datanucleus.flush.FlushOrdered.execute(FlushOrdered.java:106)
at org.datanucleus.ExecutionContextImpl.flushInternal(ExecutionContextImpl.java:4119)
at org.datanucleus.ExecutionContextThreadedImpl.flushInternal(ExecutionContextThreadedImpl.java:450)
at org.datanucleus.store.query.Query.prepareDatastore(Query.java:1575)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1760)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
at org.datanucleus.store.query.Query.execute(Query.java:1654)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:7754)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
at com.sun.proxy.$Proxy7.addNotificationEvent(Unknown Source)
at org.apache.hive.hcatalog.listener.DbNotificationListener.enqueue(DbNotificationListener.java:335)
at org.apache.hive.hcatalog.listener.DbNotificationListener.onDropPartition(DbNotificationListener.java:183)
at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$9.notify(MetaStoreListenerNotifier.java:93)
at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:167)
at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:197)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3134)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2019-05-05 01:18:11,368 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-178]: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2019-05-05 01:18:11,368 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-178]: </PERFLOG method=drop_partitions_req start=1556990240187 end=1556990291368 duration=51181 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=168 retryCount=-1 error=true>
2019-05-05 01:18:11,368 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-5-thread-178]: Error occurred during processing of message.
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
阿里云RDS數(shù)據(jù)庫(kù)服務(wù)TCP連接以及慢日志信息分析
a)
b)通過(guò)對(duì)RDS慢日志信息回查,對(duì)于數(shù)據(jù)庫(kù)表鎖信息沒(méi)有找到相對(duì)應(yīng)的結(jié)果
三、問(wèn)題解決方案
a)修改msyql在innodb引擎下的數(shù)據(jù)庫(kù)鎖等待時(shí)間:
set global innodb_lock_wait_timeout=180
b)目前受影響的版本有CDH 5.13.0,5.13.1,5.13.2,5.14.0和5.14.1,并且在平臺(tái)中Hive和Sentry一起運(yùn)行,可以采取緩解措施:
在Cloudera Manager中,轉(zhuǎn)至群集> Hive服務(wù)> 配置,然后搜索hive-site.xml的Hive Metastore Server高級(jí)配置代碼段(安全閥),單擊加號(hào)(+)兩次,然后添加以下值:
設(shè)置hive.metastore.transactional.event.listeners為null
設(shè)置hive.metastore.event.listeners為org.apache.hive.hcatalog.listener.DbNotificationListener
應(yīng)用此變通辦法可能會(huì)導(dǎo)致某些DDL查詢(例如,刪除表的查詢以及使用相同名稱創(chuàng)建的新表)導(dǎo)致意外失敗,可能會(huì)報(bào)出<samp class="ph codeph">沒(méi)有有效的權(quán)限</samp>錯(cuò)誤。重新運(yùn)行此類查詢以解決此錯(cuò)誤。
c)上述兩種方案都是臨時(shí)緩解而做的,可以通過(guò)升級(jí)CDH版本,添加修復(fù)補(bǔ)丁,進(jìn)行完整修復(fù)
解決問(wèn)題參考:https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html#tsb_2018_294