九九五区八区99,国产主播在线观看网址,超碰在线c

Spark-Sql的參數(shù)調(diào)優(yōu)：

官網(wǎng)： http://spark.apache.org/docs/latest/sql-programming-guide.html

緩存表
參考：https://blog.csdn.net/lsshlsw/article/details/48662669

spark.catalog.cacheTable("tableName") 
sqlContext.cacheTable("tableName")
spark.catalog.uncacheTable("tableName") 
sqlContext.uncacheTable("tableName")

或者緩存dataFrame

dataFrame.cache()   
dataFrame.unpersist()

或者CACHE TABLE

//緩存全表
sqlContext.sql("CACHE TABLE activity")
//緩存過(guò)濾結(jié)果
sqlContext.sql("CACHE TABLE activity_cached as select * from activity where ...")
// lazy cache
sqlContext.sql("CACHE LAZY TABLE ...")
// 取消緩存
sqlContext.sql("UNCACHE TABLE activity")

可通過(guò)兩種配置方式開啟緩存數(shù)據(jù)功能：

使用SQLContext的setConf方法
執(zhí)行SQL命令 SET key=value

參數(shù)配置

Property Name	Default	Meaning
spark.sql.files.maxPartitionBytes	134217728 (128 MB)	The maximum number of bytes to pack into a single partition when reading files.
spark.sql.files.openCostInBytes	4194304 (4 MB)	The estimated cost to open a file, measured by the number of bytes could be scanned in the same time. This is used when putting multiple files into a partition. It is better to over estimated, then the partitions with small files will be faster than partitions with bigger files (which is scheduled first).
spark.sql.broadcastTimeout	300	Timeout in seconds for the broadcast wait time in broadcast joins
spark.sql.autoBroadcastJoinThreshold	10485760 (10 MB)	Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1 broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan has been run.
spark.sql.shuffle.partitions	200	Configures the number of partitions to use when shuffling data for joins or aggregations.

用到的配置
-- spark.sql.autoBroadcastJoinThreshold， broadcast表的最大值10M，當(dāng)這是為-1時(shí)， broadcasting不可用，內(nèi)存允許的情況下加大這個(gè)值
-- spark.sql.shuffle.partitions 當(dāng)join或者聚合產(chǎn)生shuffle操作時(shí)， partitions的數(shù)量，這個(gè)值可以調(diào)大點(diǎn)，我一般配置500，切分更多的task，有助于數(shù)據(jù)傾斜的減緩，但是如果task越多， shuffle數(shù)據(jù)量也會(huì)增多

Broadcast Hint for SQL Queries
參考： https://blog.csdn.net/dabokele/article/details/65963401

import org.apache.spark.sql.functions.broadcast
broadcast(spark.table("src")).join(spark.table("records"), "key").show()

對(duì)于broadcast join模式，會(huì)將小于spark.sql.autoBroadcastJoinThreshold值（默認(rèn)為10M）的表廣播到其他計(jì)算節(jié)點(diǎn)，不走shuffle過(guò)程，所以會(huì)更加高效。
官網(wǎng)的原話是這個(gè)樣子：
The BROADCAST hint guides Spark to broadcast each specified table when joining them with another table or view. When Spark deciding the join methods, the broadcast hash join (i.e., BHJ) is preferred, even if the statistics is above the configuration spark.sql.autoBroadcastJoinThreshold. When both sides of a join are specified, Spark broadcasts the one having the lower statistics. Note Spark does not guarantee BHJ is always chosen, since not all cases (e.g. full outer join) support BHJ. When the broadcast nested loop join is selected, we still respect the hint.
注意：確定broadcast hash join的決定性因素是hive的表統(tǒng)計(jì)信息一定要準(zhǔn)確。并且，由于視圖是沒有表統(tǒng)計(jì)信息的，所以所有的視圖在join時(shí)都不會(huì)被廣播。所以至少要有一張hive表。

------------------------待完善------------------------

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

spark -SQL 配置參數(shù)

spark -SQL 配置參數(shù)

Spark-Sql的參數(shù)調(diào)優(yōu)：

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

spark -SQL 配置參數(shù)

Spark-Sql的參數(shù)調(diào)優(yōu)：

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av