Spark 2.3.1 Test Notes 2: SortExec Performance Test 1

Preface

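This post benchmarks a hypothesis raised in three earlier notes, namely that the SortExec physical operator in Spark 2.3.1 may have a performance regression compared with 2.1.2:

  1. Spark 2.3.0 Test Notes 1: Shuffle Until It Hurts
  2. Spark 2.3.0 Test Notes 2: Is It Still Usable?
  3. Spark 2.3.1 Test Notes 1: Is the Problem Still There?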

Test Code

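// Assumed to live in the same package as Spark's test-only BenchmarkBase
// (which provides `sparkSession`), since that base is package-private there.
package org.apache.spark.sql.execution.benchmark

import org.apache.spark.sql.execution.SortExec
import org.apache.spark.sql.execution.joins.SortMergeJoinExec
import org.apache.spark.sql.functions.col
import org.apache.spark.util.Benchmark

// Micro-benchmarks for SortExec (one, two and three sort keys) and sort-merge join.
class SortExecBenchmark extends BenchmarkBase {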

  test("sort with one") {
    val N = 2 << 23
    runBenchmark("sort with one", N) {
      val df = sparkSession.range(N).selectExpr(s"-id * 2 as k1").sort("k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("sort with two") {
    val N = 2 << 23
    runBenchmark("sort with two", N) {
      val df = sparkSession.range(N)
        .selectExpr(s"-id * 2 as k1", "-id % 10000 as k2")
        .sort("k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("sort with three") {
    val N = 2 << 23
    runBenchmark("sort with three", N) {
      val df = sparkSession.range(N)
        .selectExpr(s"-id * 2 as k1", " -id % 100000 as k2", "-id % 10000 as k3")
        .sort("k3", "k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("merge join reversed") {
    val N = 2 << 21
    runBenchmark("merge join at the worst", N) {
      val df1 = sparkSession.range(N).selectExpr(s"-id * 2 as k1")
      val df2 = sparkSession.range(N).selectExpr(s"-id * 3 as k2")
      val df = df1.join(df2, col("k1") === col("k2"))
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
      df.count()
    }
  }

  test("merge join with duplicates reversed") {
    val N = 2 << 21
    runBenchmark("sort merge join", N) {
      val df1 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485863) % ${N*10} as k1")
      val df2 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485867) % ${N*10} as k2")
      df1.join(df2, col("k1") === col("k2")).count()
    }
  }

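  // Overrides BenchmarkBase.runBenchmark: times `f` with whole-stage codegen
  // off (2 iterations) and on (3 iterations) and prints the comparison table.
  override def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {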
    val benchmark = new Benchmark(name, cardinality)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 3) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }
}
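Spark's `Benchmark` class is internal test code, so its internals are not reproduced here. As a rough illustration of what each `addCase` in `runBenchmark` measures, the sketch below is a hypothetical, dependency-free stand-in (the `TimingSketch` object and its `measure` method are made up for this example): it runs a workload a fixed number of times, times every iteration, and reports the total, best and average. This matches how the logs below read; for example, 2 iterations totalling 14683 ms give an average of about 7342 ms.

```scala
// Hypothetical stand-in for one benchmark case: run `f` a fixed number of
// times, time each run, and report total, best and average in milliseconds.
object TimingSketch {
  final case class CaseResult(iters: Int, totalMs: Double, bestMs: Double, avgMs: Double)

  def measure(numIters: Int)(f: => Unit): CaseResult = {
    require(numIters > 0, "numIters must be positive")
    // Wall-clock time of each iteration, in nanoseconds
    val runTimesNs = (1 to numIters).map { _ =>
      val start = System.nanoTime()
      f
      System.nanoTime() - start
    }
    val totalMs = runTimesNs.sum / 1e6
    CaseResult(numIters, totalMs, runTimesNs.min / 1e6, totalMs / numIters)
  }

  def main(args: Array[String]): Unit = {
    // Toy workload standing in for df.count(): sort a descending array of longs
    val data = Array.tabulate(1 << 20)(i => -i.toLong * 2)
    val r = measure(numIters = 3) { java.util.Arrays.sort(data.clone()) }
    println(f"Stopped after ${r.iters} iterations, ${r.totalMs}%.0f ms " +
      f"(best ${r.bestMs}%.0f / avg ${r.avgMs}%.0f)")
  }
}
```

Because every iteration is timed, one-off costs such as JIT compilation in the first iteration tend to raise the average more than the best time. The Rate and Per Row columns in the tables correspond to the best time (6175 ms over 16,777,216 rows is 368.1 ns/row), which limits the effect of such outliers.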

2.1.2 Benchmark records

[info] SortExecBenchmark:
Running benchmark: sort with one
  Running case: sort with one wholestage off
  Stopped after 2 iterations, 14683 ms
  Running case: sort with one wholestage on
  Stopped after 3 iterations, 18842 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off                  6538 / 7342          2.6         389.7       1.0X
sort with one wholestage on                   6175 / 6281          2.7         368.1       1.1X

[info] - sort with one (54 seconds, 387 milliseconds)
Running benchmark: sort with two
  Running case: sort with two wholestage off
  Stopped after 2 iterations, 18571 ms
  Running case: sort with two wholestage on
  Stopped after 3 iterations, 26397 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off                  9196 / 9286          1.8         548.1       1.0X
sort with two wholestage on                   8139 / 8799          2.1         485.1       1.1X

[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
  Running case: sort with three wholestage off
  Stopped after 2 iterations, 28709 ms
  Running case: sort with three wholestage on
  Stopped after 3 iterations, 40878 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off              14038 / 14355          1.2         836.7       1.0X
sort with three wholestage on               13018 / 13626          1.3         775.9       1.1X

[info] - sort with three (1 minute, 37 seconds)
Running benchmark: merge join at the worst
  Running case: merge join at the worst wholestage off
  Stopped after 2 iterations, 7851 ms
  Running case: merge join at the worst wholestage on
  Stopped after 3 iterations, 11256 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off        3870 / 3926          1.1         922.6       1.0X
merge join at the worst wholestage on         3698 / 3752          1.1         881.7       1.0X

[info] - merge join reverted (27 seconds, 471 milliseconds)
Running benchmark: sort merge join
  Running case: sort merge join wholestage off
  Stopped after 2 iterations, 9358 ms
  Running case: sort merge join wholestage on
  Stopped after 3 iterations, 13661 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off                4617 / 4679          0.9        1100.7       1.0X
sort merge join wholestage on                 4306 / 4554          1.0        1026.7       1.1X

[info] - merge join with duplicates reverted (32 seconds, 826 milliseconds)

2.3.1 Benchmark records

[info] SortExecBenchmark:
Running benchmark: sort with one
  Running case: sort with one wholestage off
  Stopped after 2 iterations, 14670 ms
  Running case: sort with one wholestage on
  Stopped after 3 iterations, 18269 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off                  6936 / 7335          2.4         413.4       1.0X
sort with one wholestage on                   6040 / 6090          2.8         360.0       1.1X

[info] - sort with one (54 seconds, 443 milliseconds)
Running benchmark: sort with two
  Running case: sort with two wholestage off
  Stopped after 2 iterations, 18748 ms
  Running case: sort with two wholestage on
  Stopped after 3 iterations, 25809 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off                  9195 / 9374          1.8         548.0       1.0X
sort with two wholestage on                   8459 / 8603          2.0         504.2       1.1X

[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
  Running case: sort with three wholestage off
  Stopped after 2 iterations, 28472 ms
  Running case: sort with three wholestage on
  Stopped after 3 iterations, 40225 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off              13708 / 14236          1.2         817.1       1.0X
sort with three wholestage on               13291 / 13408          1.3         792.2       1.0X

[info] - sort with three (1 minute, 36 seconds)
Running benchmark: merge join at the worst
  Running case: merge join at the worst wholestage off
  Stopped after 2 iterations, 7856 ms
  Running case: merge join at the worst wholestage on
  Stopped after 3 iterations, 10573 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off        3810 / 3928          1.1         908.4       1.0X
merge join at the worst wholestage on         3487 / 3525          1.2         831.4       1.1X

[info] - merge join reverted (26 seconds, 664 milliseconds)
Running benchmark: sort merge join
  Running case: sort merge join wholestage off
  Stopped after 2 iterations, 9118 ms
  Running case: sort merge join wholestage on
  Stopped after 3 iterations, 13825 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off                4450 / 4559          0.9        1061.0       1.0X
sort merge join wholestage on                 4395 / 4608          1.0        1047.9       1.0X

2.1.2 vs 2.3.1

version   case                               Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------
2.1.2     sort with one wholestage on            6175 / 6281          2.7         368.1       1.1X
2.3.1     sort with one wholestage on            6040 / 6090          2.8         360.0       1.1X
2.1.2     sort with two wholestage on            8139 / 8799          2.1         485.1       1.1X
2.3.1     sort with two wholestage on            8459 / 8603          2.0         504.2       1.1X
2.1.2     sort with three wholestage on         13018 / 13626         1.3         775.9       1.1X
2.3.1     sort with three wholestage on         13291 / 13408         1.3         792.2       1.0X

(Relative is measured against the wholestage-off case of the same version, not across versions.)
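The Rate(M/s), Per Row(ns) and Relative columns can be recomputed from the row count and the best time, which makes the version gap easy to quantify. The sketch below uses hypothetical helpers (`BenchmarkColumns` and its methods are made up; the formulas are inferred from, and checked against, the printed numbers) to recompute the per-row cost of the whole-stage-on sort cases, with N = 2 << 23 = 16,777,216 rows, and the percentage change from 2.1.2 to 2.3.1.

```scala
// Hypothetical helpers: recompute the derived benchmark columns from the
// row count N and the best wall-clock time (ms) of a case.
object BenchmarkColumns {
  // Millions of rows per second: N / (bestMs / 1000) / 1e6
  def rateMPerSec(n: Long, bestMs: Double): Double = n / (bestMs * 1000.0)

  // Nanoseconds per row: bestMs * 1e6 / N
  def perRowNs(n: Long, bestMs: Double): Double = bestMs * 1e6 / n

  // Speedup relative to the first (baseline) case of the same table
  def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

  // Signed change of the new best time vs the old one, in percent (positive = slower)
  def changePct(oldBestMs: Double, newBestMs: Double): Double =
    (newBestMs - oldBestMs) / oldBestMs * 100.0

  def main(args: Array[String]): Unit = {
    val n = 2L << 23 // 16,777,216 rows, as in the sort tests
    // (case, 2.1.2 best ms, 2.3.1 best ms), whole-stage codegen on
    val cases = Seq(
      ("sort with one", 6175.0, 6040.0),
      ("sort with two", 8139.0, 8459.0),
      ("sort with three", 13018.0, 13291.0)
    )
    for ((name, v212, v231) <- cases) {
      println(f"$name%-16s 2.1.2 ${perRowNs(n, v212)}%6.1f ns/row  " +
        f"2.3.1 ${perRowNs(n, v231)}%6.1f ns/row  change ${changePct(v212, v231)}%+5.1f%%")
    }
  }
}
```

For these three cases the best-time change works out to roughly -2.2%, +3.9% and +2.1%, i.e. within a few percent in both directions.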

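Caveats

  1. Benchmark results fluctuate to some degree and may differ on machines with different performance.
  2. The figures above come from the third test run. The first run is skewed by the memory that sbt compilation occupies, so "killall java" was run afterwards to kill all Java processes; the second run then warmed up the JVM, and the results of the third run were recorded.
  3. The cases are fairly simple and the difference between the two versions is not very pronounced. Perhaps Spark's AlphaSort-like sorting approach is not much affected for primitive types.
  4. In these all-integer cases, 2.1.2 is slightly ahead of 2.3.1 in the two-key and three-key sorts (485.1 vs 504.2 ns/row and 775.9 vs 792.2 ns/row), while 2.3.1 is slightly ahead in the single-key sort (360.0 vs 368.1 ns/row); either way the gap is negligible.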
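The AlphaSort-style sorting mentioned above works on compact (key prefix, record pointer) entries: most comparisons look only at a fixed-width prefix of the first sort key, and the full record is consulted only when prefixes tie. As far as we understand Spark's sorter, for a long key the prefix is the key value itself, which fits the observation that primitive-key cases change little between versions. The toy sketch below is not Spark's `UnsafeExternalSorter`; it is a hypothetical, single-threaded illustration (`PrefixSortSketch`, `Row` and `sortWithPrefix` are made-up names) that mirrors the three-key case `sort("k3", "k2", "k1")` and counts how often the full comparator is needed.

```scala
// Toy AlphaSort-style sort: compare a cheap Long prefix first and consult
// the full (k3, k2, k1) comparator only when two prefixes are equal.
object PrefixSortSketch {
  // Hypothetical row type mirroring the "sort with three" columns
  final case class Row(k1: Long, k2: Long, k3: Long)

  // Returns the sorted rows and the number of comparisons that fell back
  // to the full comparator because the prefixes tied.
  def sortWithPrefix(rows: IndexedSeq[Row], full: Ordering[Row]): (IndexedSeq[Row], Int) = {
    var fallbacks = 0
    // Sort compact (prefix, index) entries instead of the rows themselves;
    // the prefix is the first sort key, k3.
    val entries: Array[(Long, Int)] = rows.indices.map(i => (rows(i).k3, i)).toArray
    val byPrefix = new Ordering[(Long, Int)] {
      def compare(a: (Long, Int), b: (Long, Int)): Int = {
        val c = java.lang.Long.compare(a._1, b._1)
        if (c != 0) c
        else {
          fallbacks += 1 // tie on the prefix: dereference and compare full rows
          full.compare(rows(a._2), rows(b._2))
        }
      }
    }
    val sorted = entries.sorted(byPrefix).map { case (_, i) => rows(i) }.toIndexedSeq
    (sorted, fallbacks)
  }

  def main(args: Array[String]): Unit = {
    val rows = Vector(Row(1, 2, 7), Row(0, 2, 7), Row(9, 0, 3), Row(4, 1, 7))
    val full = Ordering.by((r: Row) => (r.k3, r.k2, r.k1))
    val (sorted, fallbacks) = sortWithPrefix(rows, full)
    println(sorted.mkString(", "))
    println(s"full-row comparisons: $fallbacks")
  }
}
```

When the prefix column has few distinct values (as with k3 = -id % 10000 over 16 million rows), many comparisons tie on the prefix and fall back to the full comparator. This is consistent with, though not proof of, the per-row cost growing from one sort key to three in both versions.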

Conclusion

  1. No conclusion can be drawn yet; the next step is to add richer test cases and keep trying to reproduce the regression.