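Writing batch and stream word count in Scala
In this part, we use Maven in IDEA to write Scala programs that implement word count, first as a batch job and then as a streaming job.
pom file dependency configuration
The pom file needs the following dependencies. Note that the Scala version in the artifact suffix (here 2.11) must match the Scala version that the installed Flink distribution was built for: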
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_2.11</artifactId>
        <version>1.10.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.10.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.4.0</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.3.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
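The version-matching note above can be checked with a small helper. This is only a sketch in plain Scala with no Flink dependency. The object name ScalaVersionCheck and its methods are hypothetical, and it assumes Scala 2.x, where the binary version is the first two components of the full version.

```scala
object ScalaVersionCheck {
  // Extract the Scala binary suffix from an artifactId such as "flink-scala_2.11".
  def binarySuffix(artifactId: String): Option[String] = {
    val idx = artifactId.lastIndexOf('_')
    if (idx >= 0 && idx < artifactId.length - 1) Some(artifactId.substring(idx + 1))
    else None
  }

  // Reduce a full Scala 2.x version such as "2.11.12" to its binary version "2.11".
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // True when the artifact suffix agrees with the given full Scala version.
  def matches(artifactId: String, fullVersion: String): Boolean =
    binarySuffix(artifactId).contains(binaryVersion(fullVersion))

  def main(args: Array[String]): Unit = {
    // Scala version of the runtime that executes this check.
    val running = scala.util.Properties.versionNumberString
    Seq("flink-scala_2.11", "flink-streaming-scala_2.11").foreach { artifact =>
      println(s"$artifact matches Scala $running: ${matches(artifact, running)}")
    }
  }
}
```

Running main inside the project reports whether the two artifact suffixes from the pom agree with the Scala library on the classpath.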
Batch word count
Next, prepare the input text and write the corresponding Scala code.

The hello.txt file contains:
hello world
hello flink
hello scala
how are you
fine thank you
and you
The Scala file WordCount contains the following code:
package com.example.wc

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._

// Batch word count
object WordCount {
  def main(args: Array[String]): Unit = {
    // Create a batch execution environment
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    // Read the data from a file
    val inputPath: String = "/Users/wenhuan/IdeaProjects/FlinkTutorial/src/main/resources/hello.txt"
    val inputDataSet: DataSet[String] = env.readTextFile(inputPath)
    // Transform and count: split each line into words, pair each word with 1,
    // group by the word (field 0) and sum the counts (field 1)
    val resultDataSet: DataSet[(String, Int)] = inputDataSet
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)
    // Print the result
    resultDataSet.print()
  }
}
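To see what counts the batch job should produce for hello.txt, the same steps can be sketched with plain Scala collections. This is not the Flink API. The object LocalWordCount is a hypothetical stand-in that mirrors the flatMap, map, group, and sum steps so the logic can be checked without a cluster.

```scala
object LocalWordCount {
  // Same steps as the Flink job: split on spaces, pair each word with 1,
  // group by the word, and sum the counts within each group.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // The contents of hello.txt, one element per line.
    val lines = Seq(
      "hello world",
      "hello flink",
      "hello scala",
      "how are you",
      "fine thank you",
      "and you"
    )
    // Sort by word only to make the printed output stable.
    count(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For this input, hello and you each occur three times and every other word occurs once. The Flink job should report the same counts, although the order in which it prints them may differ.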
Stream word count
The Scala file StreamWordCount contains the following code:
package com.example.wc

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.scala._

// Streaming word count
object StreamWordCount {
  def main(args: Array[String]): Unit = {
    // Create the stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    //env.setParallelism(2)
    // Read the socket host name and port number from the command-line arguments
    val paramTool: ParameterTool = ParameterTool.fromArgs(args)
    val host: String = paramTool.get("host")
    val port: Int = paramTool.getInt("port")
    // Receive a text stream from the socket
    val inputDataStream: DataStream[String] = env.socketTextStream(host, port)
    // Transform and count: split into words, drop empty strings,
    // key by the word (field 0) and keep a running sum of the counts (field 1)
    val resultDataStream: DataStream[(String, Int)] = inputDataStream
      .flatMap(x => x.split(" "))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    resultDataStream.print()
    // Start the job
    env.execute("stream word count")
  }
}
When running in IDEA, configure Program arguments as follows:
--host localhost --port 7777
Start a local nc -lk listener:
nc -lk 7777
Then run the code above. The listener must already be running, because the job connects to the socket as a client.
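Unlike the batch job, the streaming job emits an updated count each time a word arrives instead of one final total. The following plain-Scala sketch simulates that running-sum behaviour. It is not the Flink API, and the object RunningWordCount is hypothetical.

```scala
object RunningWordCount {
  // Simulates the keyed running sum: each incoming word emits its updated total.
  def emit(lines: Seq[String]): Seq[(String, Int)] = {
    val words = lines.flatMap(_.split(" ")).filter(_.nonEmpty)
    // The accumulator holds the per-word state and the records emitted so far.
    val initial = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = words.foldLeft(initial) {
      case ((state, records), word) =>
        val updated = state.getOrElse(word, 0) + 1
        (state.updated(word, updated), records :+ (word -> updated))
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    // Two lines as they might be typed into the nc session.
    emit(Seq("hello world", "hello flink")).foreach(println)
  }
}
```

Typing "hello world" and then "hello flink" yields (hello,1), (world,1), (hello,2), (flink,1) in this simulation. In the real job, with parallelism greater than 1, each printed record is typically prefixed with a subtask index, and records for different words may interleave differently.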
Submitting and running the job
There are two ways to submit a Flink job: directly through the web UI, or from the command line.
Command line
The command to submit a job is:
./bin/flink run -c com.example.wc.StreamWordCount -p 1 ./FlinkTutorial-1.0-SNAPSHOT-jar-with-dependencies.jar --host localhost --port 7777
The command to list the currently running jobs is:
./bin/flink list
The command to cancel a running job is:
./bin/flink cancel <jobid>