Spark SQL's predecessor was Shark. Like Hive, it lets users analyze data with SQL statements on top of the Spark engine instead of writing program code.

[Figure: Spark SQL architecture (screenshot 2018-08-12)]
The figure above shows Spark SQL's role well: it offers three ways to operate on RDDs (Resilient Distributed Datasets), namely JDBC, a console, and a programming interface. Users only need to write SQL, not program code.
The execution flow of Spark SQL is as follows:

[Figure: Spark SQL execution flow (screenshot 2018-08-12)]
There are roughly six steps:
- SqlParser parses the SQL statement into an Unresolved Logical Plan;
- the Analyzer binds the plan against the Catalog, producing a resolved Logical Plan;
- the Optimizer optimizes the Logical Plan into an Optimized Logical Plan;
- SparkPlanner converts the Optimized Logical Plan into a Physical Plan;
- prepareForExecution() turns the Physical Plan into an executable Physical Plan;
- execute() runs the executable physical plan and yields an RDD.
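The chain of steps above can be sketched as a series of lazily cached stages, each derived from the previous one. This is an illustrative Python mock, not Spark's actual implementation; the class and stage names are invented for the sketch:

```python
from functools import cached_property

class MockQueryExecution:
    """Illustrative mock of Spark SQL's planning pipeline (not real Spark code)."""

    def __init__(self, sql: str):
        self.sql = sql

    @cached_property
    def unresolved_plan(self) -> str:    # step 1: SqlParser parses the SQL text
        return f"Unresolved({self.sql})"

    @cached_property
    def analyzed_plan(self) -> str:      # step 2: Analyzer binds against the Catalog
        return f"Analyzed({self.unresolved_plan})"

    @cached_property
    def optimized_plan(self) -> str:     # step 3: Optimizer rewrites the logical plan
        return f"Optimized({self.analyzed_plan})"

    @cached_property
    def physical_plan(self) -> str:      # step 4: planner picks a physical plan
        return f"Physical({self.optimized_plan})"

    @cached_property
    def executed_plan(self) -> str:      # step 5: prepare the plan for execution
        return f"Executable({self.physical_plan})"

    def execute(self) -> str:            # step 6: run it, producing the "RDD"
        return f"RDD({self.executed_plan})"

qe = MockQueryExecution("SELECT 1")
print(qe.execute())
# → RDD(Executable(Physical(Optimized(Analyzed(Unresolved(SELECT 1))))))
```

Each stage only wraps the one before it here, but the shape of the dependency chain mirrors the real pipeline: calling `execute()` pulls every earlier stage in order.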
The corresponding source in Spark:
```scala
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
  // TODO: Move the planner and optimizer into here from SessionState.
  protected def planner = sparkSession.sessionState.planner

  def assertAnalyzed(): Unit = analyzed

  def assertSupported(): Unit = {
    if (sparkSession.sessionState.conf.isUnsupportedOperationCheckEnabled) {
      UnsupportedOperationChecker.checkForBatch(analyzed)
    }
  }

  lazy val analyzed: LogicalPlan = {
    SparkSession.setActiveSession(sparkSession)
    sparkSession.sessionState.analyzer.executeAndCheck(logical)
  }

  lazy val withCachedData: LogicalPlan = {
    assertAnalyzed()
    assertSupported()
    sparkSession.sharedState.cacheManager.useCachedData(analyzed)
  }

  lazy val optimizedPlan: LogicalPlan = {
    sparkSession.sessionState.optimizer.execute(withCachedData)
  }

  lazy val sparkPlan: SparkPlan = {
    SparkSession.setActiveSession(sparkSession)
    // TODO: We use next(), i.e. take the first plan returned by the planner, here for now,
    // but we will implement to choose the best plan.
    planner.plan(ReturnAnswer(optimizedPlan)).next()
  }

  // executedPlan should not be used to initialize any SparkPlan. It should be
  // only used for execution.
  lazy val executedPlan: SparkPlan = prepareForExecution(sparkPlan)

  /** Internal version of the RDD. Avoids copies and has no schema */
  lazy val toRdd: RDD[InternalRow] = executedPlan.execute()
}
```
The logic is laid out very clearly: start from the last line, `lazy val toRdd: RDD[InternalRow] = executedPlan.execute()`, and trace backwards to see the whole flow. Attentive readers will also notice that every step is a `lazy val`, so nothing runs until `execute()` is called. Lazy evaluation is one of Spark's core design ideas.
Later posts will walk through each of these stages in detail.