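It had been a long time since I last used Spark. I recently needed it for work, and when submitting Spark jobs I found that the jar I built was always very large. I am a C++ developer, and in the past I built jars the way online guides suggested: in IDEA, File -> Project Structure -> Artifacts. That sometimes produced errors, such as META-INF problems, and even after fixing them the resulting jar was large, anywhere from 90 MB to 160 MB.
Today, to run a job, I built the jar directly with Maven's package phase (mvn package). (As a C++ person I am a bit out of touch and could not even build a package properly, so I am writing this down.)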

The build also produced a tar.gz. Unpacking it gives a lib directory that contains nothing but jars. So when submitting the job I simply pass all of those jars along by adding a --jars argument:
--jars ./lib/fastjson-1.2.39.jar,./lib/kafka-clients-0.10.0.1.jar,./lib/profiler-4.0.5.jar,./lib/sdk-2.3.jar,./lib/spark-core_2.11-2.1.0.jar,./lib/spark-hive_2.11-2.1.0.jar,./lib/spark-streaming_2.11-2.1.0.jar,./lib/spark-streaming-kafka-0-10_2.11-2.1.0.jar
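Everything together comes to only about 20 MB, so this is how I will do it from now on. The jars are joined with commas, with no spaces on either side of a comma. With this many jars you do not want to type every name by hand; a small shell script takes care of it.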
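The shell script mentioned above could look roughly like the sketch below. It is not the script I actually used, only a minimal version of the idea: collect every *.jar under the lib directory and join the paths with commas, no spaces. The function name build_jars_arg is made up for illustration, and in the usage comment the class name and application jar are placeholders.

```shell
#!/bin/sh
# Print a comma-separated list of every jar in a directory (default: ./lib),
# in the form expected by spark-submit's --jars option.
# The body runs in a subshell, so its variables do not leak to the caller.
build_jars_arg() (
  lib_dir="${1:-./lib}"
  jars=""
  for jar in "$lib_dir"/*.jar; do
    # If the directory has no jars, the pattern stays unexpanded; skip it.
    [ -e "$jar" ] || continue
    # Prepend a comma only when the list is already non-empty.
    jars="${jars:+$jars,}$jar"
  done
  printf '%s\n' "$jars"
)

# Usage (placeholders: com.example.MyApp and my-app.jar are hypothetical):
#   spark-submit --class com.example.MyApp --jars "$(build_jars_arg ./lib)" my-app.jar
```

Quoting the command substitution keeps the list as a single argument even if a path contains spaces.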
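In the end there are two items on the driver side: the lib directory (holding all the dependency jars) and the jar I packaged myself (built with Maven, only a few dozen KB). Then submit with spark-submit.
A problem came up: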
18/09/03 15:18:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on BJHTYD-Hope-27-34.hadoop..local:7949 (size: 2.2 KB, free: 2.8 GB)
18/09/03 15:18:03 WARN TaskSetManager: Lost task 7.0 in stage 0.0 (TID 1, BJHTYD-Hope-26-3.hadoop.local, executor 2): java.lang.NoClassDefFoundError: Could not initialize class.HbaseConnectionPool
at .JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
at .JavaReceiver$1$1.call(JavaReceiver.java:75)
at .JavaReceiver$1$1.call(JavaReceiver.java:69)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/09/03 15:18:03 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 4) on BJHTYD-Hope-53-71.hadoop..local, executor 1: java.lang.NoClassDefFoundError (Could not initialize class Service.HbaseConnectionPool) [duplicate 1]
18/09/03 15:18:03 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 9, BJHTYD-Hope-53-71.hadoop.local, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/CellScannable
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at Service.HbaseConnectionPool.<clinit>(HbaseConnectionPool.java:27)
at JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
at JavaReceiver$1$1.call(JavaReceiver.java:75)
at Receiver$1$1.call(JavaReceiver.java:69)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.CellScannable
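The executors could not load org.apache.hadoop.hbase.CellScannable. I checked the submit command and found that no HBase jar was included. After adding the corresponding jar to lib and to --jars and resubmitting, the job succeeded.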
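To catch this kind of omission before submitting, one option is a quick check that the lib directory holds a jar for each dependency the job needs. The sketch below is only an illustration, not something from the original setup: the function name check_jars is made up, and the prefixes in the usage comment are example names that you would replace with your own dependencies.

```shell
#!/bin/sh
# Report every name prefix that has no matching jar in a directory.
# Usage: check_jars LIB_DIR PREFIX...
# Returns 0 if every prefix matched at least one jar, 1 otherwise.
check_jars() (
  lib_dir="$1"
  shift
  status=0
  for prefix in "$@"; do
    found=no
    for jar in "$lib_dir/$prefix"*.jar; do
      # An unexpanded pattern means nothing matched this prefix.
      if [ -e "$jar" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "missing: $prefix"
      status=1
    fi
  done
  return "$status"
)

# Usage (the prefixes are illustrative):
#   check_jars ./lib fastjson kafka-clients hbase-client || echo "fix lib before submitting"
```

This only checks file names, so it will not notice a jar that is present but lacks the needed class; the stack trace remains the definitive signal.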