Tez Memory Tuning Reference

  1. AM and Container Size Settings
tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to the same value as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

hive.tez.container.size

Description: Set hive.tez.container.size to the same value as, or a small multiple (1 or 2 times) of, the YARN minimum container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
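
The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name container_size_mb and its arguments are hypothetical, and the two limits are passed in by hand rather than read from a live YARN configuration.

```python
def container_size_mb(min_alloc_mb: int, max_alloc_mb: int, multiple: int = 1) -> int:
    """Pick hive.tez.container.size as a small multiple of the YARN minimum
    allocation, never exceeding the YARN maximum allocation.

    Hypothetical helper for illustration; all values are in MB.
    """
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    return min(min_alloc_mb * multiple, max_alloc_mb)


# Example: a 2048 MB minimum, an 8192 MB maximum, and a multiple of 2
print(container_size_mb(2048, 8192, multiple=2))  # 4096
```

The cap at the maximum allocation matters because YARN rejects container requests larger than yarn.scheduler.maximum-allocation-mb.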

  2. AM and Container JVM Option Settings
tez.am.launch.cmd-opts

Default: 80% * tez.am.resource.memory.mb
Description: Usually does not need to be adjusted.

hive.tez.java.opts

Default: 80% * hive.tez.container.size
Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC"

tez.container.max.java.heap.fraction

Default: 0.8
Description: The fraction of container memory that the task/AM JVM takes as its Xmx. Adjusting this parameter is recommended; choose the value according to the specific workload.
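
As a rough illustration of what this fraction means, the sketch below derives an approximate heap size from a container size. The helper name heap_mb is hypothetical, and rounding down to whole megabytes is an assumption rather than documented Tez behavior.

```python
def heap_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Approximate JVM heap (-Xmx) in MB for a container of the given size.

    Hypothetical helper: truncating to a whole number of MB is an assumption.
    """
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap_fraction must be in (0, 1]")
    return int(container_mb * heap_fraction)


# A 4096 MB container with the default fraction of 0.8
print(heap_mb(4096))  # 3276
```

The memory left outside the heap covers JVM overhead such as thread stacks and direct buffers, which is why the fraction stays below 1.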

  3. Hive In-Memory Map Join Settings
tez.runtime.io.sort.mb

Default: 100
Description: The amount of memory needed to sort output. Recommended: 40% * hive.tez.container.size, generally no more than 2 GB.

hive.auto.convert.join.noconditionaltask

Default: true
Description: Whether to merge multiple map joins into one. Use the default.

hive.auto.convert.join.noconditionaltask.size

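Default:
Description: When multiple map joins are converted into one, this is the maximum total file size of all the small tables. The value only limits the size of the input table files; it does not represent the actual size of the hash table during the map join. Recommended: 1/3 * hive.tez.container.size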

tez.runtime.unordered.output.buffer.size-mb

Default: 100
Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% * hive.tez.container.size
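
The three recommendations in this section all derive from hive.tez.container.size, so they can be computed together. The sketch below is a minimal illustration with a hypothetical helper name. It assumes that the container size and both buffer sizes are in MB and that hive.auto.convert.join.noconditionaltask.size is expressed in bytes; integer division is used to round down.

```python
def suggest_memory_settings(container_mb: int) -> dict:
    """Derive the suggested values from hive.tez.container.size (in MB).

    Hypothetical helper; the ratios are the recommendations from the text.
    """
    return {
        # 40% of the container, capped at 2 GB (value in MB)
        "tez.runtime.io.sort.mb": min(container_mb * 40 // 100, 2048),
        # one third of the container, converted to bytes
        "hive.auto.convert.join.noconditionaltask.size": container_mb * 1024 * 1024 // 3,
        # 10% of the container (value in MB)
        "tez.runtime.unordered.output.buffer.size-mb": container_mb * 10 // 100,
    }


for name, value in suggest_memory_settings(4096).items():
    print(f"set {name}={value};")
```

For a 4096 MB container this prints a sort buffer of 1638 MB, a map join threshold of 1431655765 bytes, and an unordered output buffer of 409 MB.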

  4. Container Reuse Settings
tez.am.container.reuse.enabled

Default: true
Description: Switch that enables container reuse.


Mapper/Reducer Tuning

  1. Mapper Count Settings
tez.grouping.min-size

Default: 50*1024*1024 (50 MB)
Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024 (1 GB)
Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.

  2. Reducer Count Settings
hive.tez.auto.reducer.parallelism

Default: false
Description: Turn on Tez's auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Recommended setting: true.

hive.tez.min.partition.factor

Default: 0.25
Description: When auto reducer parallelism is enabled, this factor is used to put a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0
Description: When auto reducer parallelism is enabled, this factor is used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000 (256 MB)
Description: Size per reducer. The default before Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

The number of reducers is determined by the following formula:
Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate/hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor
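
The formula can be worked through with a short sketch. The function name estimate_reducers is hypothetical, and two details are assumptions because the formula does not specify them: the division is rounded up, and the final product is truncated to an integer.

```python
import math


def estimate_reducers(stage_bytes: int,
                      bytes_per_reducer: int = 256_000_000,
                      max_reducers: int = 1009,
                      max_partition_factor: float = 2.0) -> int:
    """Evaluate Max(1, Min(max, estimate / bytes_per_reducer)) x partition factor.

    Hypothetical helper; rounding up the division is an assumption.
    """
    by_size = math.ceil(stage_bytes / bytes_per_reducer)
    base = max(1, min(max_reducers, by_size))
    return int(base * max_partition_factor)


# A reducer stage estimated at 10 GB with the defaults listed above
print(estimate_reducers(10_000_000_000))  # 80
```

With the defaults, 10,000,000,000 bytes divided by 256,000,000 gives 39.06, which rounds up to 40 and is then doubled by hive.tez.max.partition.factor.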

  3. Shuffle Parameter Settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25
Description: The fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75
Description: Once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. The number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

Example:

hive.exec.reducers.bytes.per.reducer=1073741824; // 1GB
tez.shuffle-vertex-manager.min-src-fraction=0.25;
tez.shuffle-vertex-manager.max-src-fraction=0.75;

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there is at least 1 GB of data being output (i.e. if 25% of mappers don't send 1 GB of data, we will wait until at least 1 GB is sent out).
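
The linear scaling between the two fractions can be modeled with a few lines of code. This is a simplified sketch of the behavior described above, not the actual ShuffleVertexManager logic: the function name is hypothetical, and it ignores the output-size condition from the example.

```python
def schedulable_fraction(completed: float,
                         min_frac: float = 0.25,
                         max_frac: float = 0.75) -> float:
    """Fraction of the current vertex's tasks that may be scheduled, given the
    fraction of source tasks that have completed.

    Simplified linear model (an assumption), not the Tez implementation.
    """
    if completed >= max_frac:
        return 1.0  # all tasks can be scheduled
    if completed < min_frac:
        return 0.0  # nothing is scheduled yet
    # linear ramp between min_frac and max_frac
    return (completed - min_frac) / (max_frac - min_frac)


for done in (0.1, 0.25, 0.5, 0.75):
    print(done, schedulable_fraction(done))
```

Under this model, half of the reducers become schedulable once half of the mappers have finished, and all of them once 75% have finished.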
