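Simplified source build
Many people have complained that StreamingPro is too painful to build, to the point that even reading the source is hard. So I spent a day on a fairly large refactoring, and StreamingPro now depends only on the ServiceFramework project. Build it as follows: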
git clone https://github.com/allwefantasy/ServiceFramework.git
cd ServiceFramework
mvn install -Pscala-2.11 -Pjetty-9 -Pweb-include-jetty-9
mvn install -Pscala-2.10 -Pjetty-9 -Pweb-include-jetty-9
// If you need to switch the Scala version, remember to run the following command before building
./dev/change-version-to-2.10.sh
After that, you can build StreamingPro:
git clone https://github.com/allwefantasy/streamingpro.git
cd streamingpro
// for spark 1.6.*
mvn -DskipTests clean package -pl streamingpro-spark -am -Ponline -Pscala-2.10 -Pcarbondata -Phive-thrift-server -Pspark-1.6.1 -Pshade
// for spark 2.*
mvn -DskipTests clean package -pl streamingpro-spark-2.0 -am -Ponline -Pscala-2.11 -Phive-thrift-server -Pspark-2.1.0 -Pshade
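StreamingPro on Spark 2.1.1 supports both Spark Streaming and Structured Streaming
For Structured Streaming support, see the article:
StreamingPro Supports Structured Streaming Again
A Spark Streaming job takes exactly the same form as a Structured Streaming one.
Let's look at a concrete configuration file: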
{
  "scalamaptojson": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "stream.sources",
        "params": [
          {
            "format": "socket",
            "outputTable": "test",
            "port": "9999",
            "host": "localhost",
            "path": "-"
          },
          {
            "format": "com.databricks.spark.csv",
            "outputTable": "sample",
            "header": "true",
            "path": "/Users/allwefantasy/streamingpro/sample.csv"
          }
        ]
      },
      {
        "name": "stream.sql",
        "params": [
          {
            "sql": "select city from test left join sample on test.content == sample.name",
            "outputTableName": "test3"
          }
        ]
      },
      {
        "name": "stream.outputs",
        "params": [
          {
            "mode": "Overwrite",
            "format": "console",
            "inputTableName": "test3",
            "path": "-"
          }
        ]
      }
    ],
    "configParams": {}
  }
}
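To make the data flow in this job file explicit, here is a minimal Python sketch that walks the compositor list and checks that every table an output reads was registered by an earlier step. The key names (outputTable, outputTableName, inputTableName) come from the config above. The helper trace_tables and the idea that steps connect through these names in list order are my own assumptions for illustration, not StreamingPro's actual loader.

```python
import json

# Trimmed copy of the job file above; only the fields needed to trace tables.
JOB = json.loads("""
{
  "scalamaptojson": {
    "strategy": "spark",
    "compositor": [
      {"name": "stream.sources", "params": [
        {"format": "socket", "outputTable": "test"},
        {"format": "com.databricks.spark.csv", "outputTable": "sample"}
      ]},
      {"name": "stream.sql", "params": [
        {"sql": "select city from test left join sample on test.content == sample.name",
         "outputTableName": "test3"}
      ]},
      {"name": "stream.outputs", "params": [
        {"format": "console", "inputTableName": "test3"}
      ]}
    ]
  }
}
""")


def trace_tables(job):
    """Hypothetical helper: return (defined, missing).

    defined: table names registered by steps, in order of appearance.
    missing: inputTableName values that no earlier step registered.
    """
    defined, missing = [], []
    for step in job["compositor"]:
        for params in step["params"]:
            # A step that reads a table needs it to be registered already.
            source = params.get("inputTableName")
            if source is not None and source not in defined:
                missing.append(source)
            # Sources use "outputTable"; sql steps use "outputTableName".
            produced = params.get("outputTable") or params.get("outputTableName")
            if produced:
                defined.append(produced)
    return defined, missing


for job_name, job in JOB.items():
    defined, missing = trace_tables(job)
    print(job_name, "defines", defined, "missing", missing)
```

Reading the result top to bottom gives the pipeline: the socket and CSV sources register test and sample, the SQL step joins them into test3, and the console output reads test3.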
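The only difference from the Structured Streaming configuration is that the ss prefix is replaced with stream. Launch the job as follows: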
SHome=/Users/allwefantasy/streamingpro
./bin/spark-submit --class streaming.core.StreamingApp \
--master local[2] \
--name test \
$SHome/streamingpro-spark-2.0-0.4.15-SNAPSHOT.jar \
-streaming.name test \
-streaming.platform spark_streaming \
-streaming.job.file.path file://$SHome/spark-streaming.json
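Since the two kinds of job file differ only in the compositor name prefix, converting one into the other can be sketched in a few lines of Python. This is an illustration under assumptions: the function to_spark_streaming and the sample job named demo are hypothetical, and the full names ss.sources, ss.sql and ss.outputs are inferred from the "ss prefix" remark rather than taken from this article.

```python
import copy


def to_spark_streaming(config, old="ss.", new="stream."):
    """Return a copy of config with compositor names re-prefixed.

    Hypothetical helper: only the "name" field of each compositor is
    touched; params are copied unchanged.
    """
    converted = copy.deepcopy(config)
    for job in converted.values():
        for step in job.get("compositor", []):
            name = step.get("name", "")
            if name.startswith(old):
                step["name"] = new + name[len(old):]
    return converted


# Hypothetical Structured Streaming job with assumed "ss." compositor names.
ss_job = {
    "demo": {
        "compositor": [
            {"name": "ss.sources", "params": []},
            {"name": "ss.sql", "params": []},
            {"name": "ss.outputs", "params": []},
        ]
    }
}

stream_job = to_spark_streaming(ss_job)
print([step["name"] for step in stream_job["demo"]["compositor"]])
```

The sketch copies the input rather than editing it in place, so the original Structured Streaming job file stays usable alongside the converted one.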