https://blog.csdn.net/weixin_40829577/article/details/109001268
To improve performance, Spark caches table metadata. If an external system updates a table's metadata, Spark must refresh its cached copy before using that table again.
/**
* Invalidates and refreshes all the cached data and metadata of the given table. For performance
* reasons, Spark SQL or the external data source library it uses might cache certain metadata
* about a table, such as the location of blocks. When those change outside of Spark SQL, users
* should call this function to invalidate the cache.
*
* If this table is cached as an InMemoryRelation, drop the original cached version and make the
* new version cached lazily.
*
* @param tableName is either a qualified or unqualified name that designates a table/view.
 *                  If no database identifier is provided, it refers to a temporary view or
 *                  a table/view in the current database.
* @since 2.0.0
*/
def refreshTable(tableName: String): Unit
1. Launch the spark-shell client
1) Allocate enough executor-memory/driver-memory, otherwise the job may run out of memory;
2) Do not set parallelism too high, otherwise it may exceed the allowed number of concurrent accesses;
spark-shell \
--name ShyTestError \
--master yarn \
--deploy-mode client \
--num-executors 3 \
--executor-memory 24G \
--executor-cores 2 \
--driver-memory 8G \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.memoryOverhead=4G \
--conf spark.default.parallelism=12 \
--conf spark.sql.shuffle.partitions=12
2. Refresh the table's metadata
spark.catalog.refreshTable("table_name")
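To show where the refresh fits, here is a minimal spark-shell sketch. It assumes a running `SparkSession` named `spark` (as in spark-shell) and a hypothetical table `table_name` whose underlying files are rewritten by an external job; these names are for illustration only.

```scala
// First read populates Spark's cached file/partition metadata for the table.
spark.table("table_name").count()

// ... the table's files are now replaced outside Spark (Hive, a raw HDFS job,
// etc.); Spark's cached file listing is stale, so a re-read may fail with a
// FileNotFoundException or silently return old data ...

// Invalidate the cached metadata; if the table was cached as an
// InMemoryRelation, the new version is re-cached lazily.
spark.catalog.refreshTable("table_name")

// Subsequent reads re-list the files and see the new data.
spark.table("table_name").count()
```

The same effect is available from SQL as `REFRESH TABLE table_name`, which can be convenient when the refresh is part of a scripted pipeline rather than an interactive session.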