Hive cannot read data in subdirectories by default
After migrating our data from HDP to CDH, we noticed that some Hive tables partitioned by day had extra subdirectories such as 1/ or 2/ under each day partition directory, and queries filtered on the partition column returned no rows. This can be fixed by setting
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
so that Hive queries can read the data in the subdirectories under the partition directories.
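A minimal sketch of the workaround in a Hive session; the table and partition value below are placeholders:

```sql
-- allow the input format to descend into subdirectories (session-level)
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;

-- the partition-filtered query now also picks up files under
-- .../day=20171124/1/ and .../day=20171124/2/
SELECT COUNT(*) FROM user_log WHERE day = '20171124';  -- hypothetical table
```

Both settings can also be made permanent in hive-site.xml instead of per session.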
Sentry-related permissions
An article on installing Sentry:
http://blog.xiaoxiaomo.com/2016/10/19/Sentry-%E9%80%9A%E8%BF%87Cloudera-Manager%E9%85%8D%E7%BD%AESentry/
We also manage permissions with Hadoop Sentry. The Sentry integration has to be enabled for HDFS, Hive, and Hue; after that, Hive table and HDFS permissions can be set from Hue. However, if no privilege is granted on the local Linux file system, loading a local file fails with an error like
The required privileges: Server=server1->URI=file:///opt/logfiles/userlogsup/muserbehaviorlog/1020210137/20171124/201711241200.txt->action=*;
This can be fixed in Hue by choosing the URI scope, entering file:///opt/logfiles/ manually, and granting ALL on it. So besides Hive tables and HDFS paths, Sentry can also control access to directories on the local Linux machine.
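The same grant can be issued from Beeline as a Sentry admin instead of through Hue; the role and group names below are placeholders:

```sql
-- hypothetical role for processes that run LOAD DATA LOCAL INPATH
CREATE ROLE log_loader;
GRANT ALL ON URI 'file:///opt/logfiles/' TO ROLE log_loader;
GRANT ROLE log_loader TO GROUP etl;
```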
Spark2 jar not found
After installing Spark2 on CDH, running hive prints the error
ls: cannot access /opt/cloudera/parcels/SPARK2/lib/spark2/lib/spark-assembly-*.jar: No such file or directory
This is because Spark2 moved its libraries from lib/ into jars/ and no longer ships a single assembly jar. Edit Hive's startup script and replace lib/spark-assembly-*.jar with jars/*; hive then starts without the error.
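A sketch of the edit, demonstrated on a throwaway copy of the offending line; the real launcher script lives somewhere like /opt/cloudera/parcels/CDH/lib/hive/bin/hive (that path, and the exact shape of the line, are assumptions to check against your installation):

```shell
# reproduce the offending line from the hive launcher in a temp file
tmp=$(mktemp)
echo 'sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`' > "$tmp"

# point it at Spark2's jars/ directory instead of the removed assembly jar
sed -i 's|lib/spark-assembly-\*\.jar|jars/*|' "$tmp"

cat "$tmp"   # -> sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*`
```

Back up the real script before applying the same sed to it.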
HiveServer2 deadlock
Today, after a bad import, the loading process was killed mid-import. Subsequent attempts to drop or query the affected table simply hung, and the logs showed a deadlock on the table locks. This can be worked around by editing hive-site.xml:
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
Change the value to false to disable Hive's concurrency support (and with it the lock manager), which clears the hang. Note that this also turns off locking for concurrent readers and writers.
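Before disabling concurrency cluster-wide, it may be worth inspecting and releasing the stale lock directly from a Hive session; the table name is a placeholder, and UNLOCK TABLE applies to the ZooKeeper-based lock manager, not to ACID transactional tables:

```sql
-- list locks left behind by the killed load
SHOW LOCKS user_log;

-- release them explicitly
UNLOCK TABLE user_log;
```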
ArrayIndexOutOfBoundsException when querying Hive
A Hive query failed with the following error:
http://cdh-master-244:8088/taskdetails.jsp?jobid=job_1511834313005_0092&tipid=task_1511834313005_0092_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluateLong(ConstantVectorExpression.java:99)
at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:147)
at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFCount.aggregateInputSelection(VectorUDAFCount.java:96)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:148)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:322)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:866)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:111)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:98)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
... 9 more
This can be worked around by disabling vectorized execution:
set hive.vectorized.execution.enabled=false;
The related upstream issue is
https://issues.apache.org/jira/browse/HIVE-11933