Submitting Spark jobs

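I hadn't used Spark for a long time. Recently I needed it for work, and when submitting Spark jobs I found that the jar I built was always very large. I mainly write C++, and whenever I built a jar before, I followed the advice online and used File -> Project Structure -> Artifacts in IDEA. That sometimes failed with errors such as META-INF problems, and even once those were fixed, the resulting jar was large, anywhere from 90 MB to 160 MB.
Today, to run a job, I built the jar directly with Maven's package. (As a C++ person I'm a bit out of touch; I couldn't even build a package properly, so I'm writing this down.)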


The build also produced a tar.gz. Unpacking it gives a lib directory full of jars, so I pass all of those jars along when submitting the job by adding a --jars argument:

--jars ./lib/fastjson-1.2.39.jar,./lib/kafka-clients-0.10.0.1.jar,./lib/profiler-4.0.5.jar,./lib/sdk-2.3.jar,./lib/spark-core_2.11-2.1.0.jar,./lib/spark-hive_2.11-2.1.0.jar,./lib/spark-streaming_2.11-2.1.0.jar,./lib/spark-streaming-kafka-0-10_2.11-2.1.0.jar

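The total then came to only about 20 MB, so this is how I'll do it from now on. Separate the jars with commas and leave no spaces around the commas. With this many jars you obviously can't type each name by hand; a shell script takes care of it.
In the end there are two things on the driver side: lib (containing all the dependency jars) and the jar I packaged myself (the one Maven builds, a few tens of KB). Then submit with spark-submit.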
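
The shell script mentioned above is not shown in the original, so here is a minimal sketch of one way to build the comma-separated list. The function name build_jars_list, the class name com.example.Main and the file my-app.jar are placeholders for illustration, not names from the original setup.

```shell
#!/usr/bin/env bash
# Join every jar in a directory into one comma-separated string,
# with no spaces around the commas, as --jars expects.
build_jars_list() {
  local lib_dir="$1"
  local jars=""
  local jar
  for jar in "$lib_dir"/*.jar; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$jar" ] || continue
    # Prepend a comma only when the list already has an entry.
    jars="${jars:+$jars,}$jar"
  done
  printf '%s' "$jars"
}

# Example usage (class name and application jar are placeholders):
# spark-submit --class com.example.Main \
#   --jars "$(build_jars_list ./lib)" ./my-app.jar
```

Building the list from the directory contents means that a jar added to lib later is picked up automatically the next time the job is submitted.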

One problem came up:

18/09/03 15:18:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on BJHTYD-Hope-27-34.hadoop..local:7949 (size: 2.2 KB, free: 2.8 GB)
18/09/03 15:18:03 WARN TaskSetManager: Lost task 7.0 in stage 0.0 (TID 1, BJHTYD-Hope-26-3.hadoop.local, executor 2): java.lang.NoClassDefFoundError: Could not initialize class.HbaseConnectionPool
    at .JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
    at .JavaReceiver$1$1.call(JavaReceiver.java:75)
    at .JavaReceiver$1$1.call(JavaReceiver.java:69)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

18/09/03 15:18:03 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 4) on BJHTYD-Hope-53-71.hadoop..local, executor 1: java.lang.NoClassDefFoundError (Could not initialize class Service.HbaseConnectionPool) [duplicate 1]
18/09/03 15:18:03 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 9, BJHTYD-Hope-53-71.hadoop.local, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/CellScannable
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at Service.HbaseConnectionPool.<clinit>(HbaseConnectionPool.java:27)
    at JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
    at JavaReceiver$1$1.call(JavaReceiver.java:75)
    at Receiver$1$1.call(JavaReceiver.java:69)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.CellScannable

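Looking at the submit command, I found that it included no HBase jar. The second trace shows how the two errors are related: the static initializer of HbaseConnectionPool (<clinit>) fails because org.apache.hadoop.hbase.CellScannable cannot be found, and any further use of that class in the same JVM is then reported as "Could not initialize class". After adding the corresponding HBase jar to lib and to --jars, I resubmitted and the job succeeded.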
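
To check ahead of time whether a class from a stack trace is covered by the jars in lib, something like the sketch below can help. It assumes the unzip tool is installed, and the function names class_to_entry and find_jars_with_class are made up for this example.

```shell
#!/usr/bin/env bash
# Convert a fully qualified class name into its entry path inside a jar,
# e.g. a.b.C becomes a/b/C.class.
class_to_entry() {
  printf '%s.class' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in lib_dir whose file listing contains the class entry.
# Prints nothing when no jar provides the class.
find_jars_with_class() {
  local lib_dir="$1"
  local entry
  entry="$(class_to_entry "$2")"
  local jar
  for jar in "$lib_dir"/*.jar; do
    [ -e "$jar" ] || continue
    # Fixed-string match against the jar listing; a relocated (shaded)
    # copy of the class under a longer path would match as well.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Example usage:
# find_jars_with_class ./lib org.apache.hadoop.hbase.CellScannable
```

Empty output for the class named in the ClassNotFoundException suggests that the jar providing it still has to be added to lib and to --jars.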
