1. State overview:
State in Flink:
- Operator State
- Keyed State
- State Backends

All the data that a task maintains and uses to compute a result belongs to that task's state.
You can think of task state as a local variable that the task's business logic can access.
Flink manages state for you, including state consistency, failure handling, and efficient storage and access, so that developers can focus on application logic.
In Flink, state is always associated with a specific operator.
For the Flink runtime to know about an operator's state, the operator must register that state in advance.
Broadly, there are two types of state:
- Operator State
1) The scope of operator state is limited to the operator task (it cannot be accessed across tasks).
- Keyed State
1) Maintained and accessed by the key defined on the input stream.
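2. Operator State
2.1 Overview

The scope of operator state is limited to the operator task: all records processed by the same parallel task can access the same state.
The state is shared within a single task (it does not cross slots).
Operator state cannot be accessed by another task, whether that task belongs to the same operator or to a different one.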
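To make the scoping rule concrete, here is a minimal plain-Java model (it does not use any Flink classes). Each parallel subtask owns one counter, and a record only ever changes the counter of the subtask that processes it. The class name `OperatorStateScopeSketch` and the round-robin routing are assumptions made for illustration only.

```java
import java.util.Arrays;

// Plain-Java model of operator state scope; this does not use Flink classes.
class OperatorStateScopeSketch {

    // Counts records per subtask; counts[i] stands for the state of subtask i.
    static int[] countPerSubtask(int parallelism, int totalRecords) {
        int[] counts = new int[parallelism];
        for (int record = 0; record < totalRecords; record++) {
            int subtask = record % parallelism; // assumed round-robin routing
            counts[subtask]++;                  // only this subtask's state changes
        }
        return counts;
    }

    public static void main(String[] args) {
        // 7 records over 3 subtasks: every subtask only knows its own count.
        System.out.println(Arrays.toString(countPerSubtask(3, 7))); // [3, 2, 2]
    }
}
```

No subtask in this model can see the total of 7; getting a global count would require an extra aggregation step downstream.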
2.2 Operator state data structures
- List state
1) Represents state as a list of entries.
- Union list state
1) Also represents state as a list of entries. It differs from regular list state in how the state is restored after a failure or when the application is started from a savepoint.
- Broadcast state
1) Intended for the special case where an operator has multiple tasks and the state of every task is identical.
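The difference between list state and union list state on restore can be sketched in plain Java. In this model, the entries of all old subtasks are concatenated into one list; list state divides that list among the new subtasks, while union list state hands every new subtask the complete list. The class and method names are hypothetical, and the round-robin assignment in `evenSplit` is an assumption (the text above does not say which entries go to which subtask).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of how operator list state is redistributed on restore.
class ListStateRedistributionSketch {

    // List state: the concatenated list is divided among the new subtasks.
    // Round-robin assignment is an assumption made for illustration.
    static List<List<Integer>> evenSplit(List<Integer> all, int newParallelism) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < all.size(); i++) {
            result.get(i % newParallelism).add(all.get(i));
        }
        return result;
    }

    // Union list state: every new subtask receives the complete list.
    static List<List<Integer>> union(List<Integer> all, int newParallelism) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> snapshot = Arrays.asList(1, 2, 3, 4); // entries of all old subtasks
        System.out.println(evenSplit(snapshot, 2)); // [[1, 3], [2, 4]]
        System.out.println(union(snapshot, 2));     // [[1, 2, 3, 4], [1, 2, 3, 4]]
    }
}
```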
2.3 Code test
In practice, operator state is used less often than keyed state.
Code:
package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Collections;
import java.util.List;

/**
 * @author 只是甲
 * @date 2021-09-17
 * @remark Operator state test
 */
public class StateTest1_OperatorState {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line to a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        });

        // Stateful map that counts the records seen by the current partition
        SingleOutputStreamOperator<Integer> resultStream = dataStream.map(new MyCountMapper());

        resultStream.print();
        env.execute();
    }

    // Custom MapFunction
    public static class MyCountMapper implements MapFunction<SensorReading, Integer>, ListCheckpointed<Integer> {
        // Local variable that serves as the operator state
        private Integer count = 0;

        @Override
        public Integer map(SensorReading value) throws Exception {
            count++;
            return count;
        }

        // Called on checkpoint: expose the current count as a one-element list
        @Override
        public List<Integer> snapshotState(long checkpointId, long timestamp) throws Exception {
            return Collections.singletonList(count);
        }

        // Called on recovery: add up every restored element
        @Override
        public void restoreState(List<Integer> state) throws Exception {
            for (Integer num : state) {
                count += num;
            }
        }
    }
}
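The snapshot and restore logic of the counting mapper can be tried out without a cluster. The sketch below is a plain-Java stand-in (no Flink classes) that mirrors the two methods; the class name `CountSnapshotSketch` and the scale-in scenario in `main` are assumptions for illustration. It shows why the restore method sums the list: a single task may be handed several elements.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the counting mapper; no Flink classes involved.
class CountSnapshotSketch {
    private int count = 0;

    // Counts one record and returns the running total.
    int map() {
        return ++count;
    }

    // Mirrors snapshotState: the state is exposed as a one-element list.
    List<Integer> snapshotState() {
        return Collections.singletonList(count);
    }

    // Mirrors restoreState: every restored element is added to the counter.
    void restoreState(List<Integer> state) {
        for (int num : state) {
            count += num;
        }
    }

    public static void main(String[] args) {
        CountSnapshotSketch a = new CountSnapshotSketch();
        CountSnapshotSketch b = new CountSnapshotSketch();
        a.map(); a.map(); a.map(); // a has counted 3 records
        b.map(); b.map();          // b has counted 2 records

        // Assumed scale-in from two subtasks to one: the survivor gets both elements.
        CountSnapshotSketch merged = new CountSnapshotSketch();
        merged.restoreState(Arrays.asList(
                a.snapshotState().get(0), b.snapshotState().get(0)));
        System.out.println(merged.map()); // 6
    }
}
```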

3. Keyed State
3.1 Overview

Keyed state is maintained and accessed by the key defined on the input stream.
Flink maintains one state instance per key and partitions all records with the same key to the same operator task, which maintains and processes the state for that key.
When a task processes a record, it automatically scopes state access to the key of that record.
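The two ideas above (same key, same task; state access scoped to the current key) can be modeled in a few lines of plain Java. This is only a sketch: `KeyedStateSketch` is a hypothetical name, a `HashMap` stands in for the per-key state, and the hash-modulo routing in `subtaskFor` is a simplification of whatever partitioning Flink actually applies.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of keyed state; the routing rule is a simplification.
class KeyedStateSketch {
    // One state instance (here: a counter) per key.
    private final Map<String, Integer> countPerKey = new HashMap<>();

    // Simplified routing: the same key always maps to the same subtask index.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Processing a record reads and updates only the state of that record's key.
    int process(String key) {
        return countPerKey.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        KeyedStateSketch task = new KeyedStateSketch();
        System.out.println(task.process("sensor_1")); // 1
        System.out.println(task.process("sensor_7")); // 1
        System.out.println(task.process("sensor_1")); // 2
        // Routing is deterministic, so repeated lookups agree.
        System.out.println(subtaskFor("sensor_1", 4) == subtaskFor("sensor_1", 4)); // true
    }
}
```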
3.2 Keyed state data structures
- Value state
  Represents state as a single value.
- List state
  Represents state as a list of entries.
- Map state
  Represents state as a set of key-value pairs.
- Reducing state & Aggregating state
  Represents state as a list of values used for an aggregation; only the aggregated result of all added values is kept.
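Reducing state is the least obvious of these, so here is a small plain-Java model of its behavior: every added value is folded into one stored aggregate per key. This is not the Flink API; `ReducingStateSketch` and its methods are hypothetical names, and the per-key maximum is just an example reducer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of a reducing state: one aggregated value is kept per key.
class ReducingStateSketch {
    private final Map<String, Double> aggregatePerKey = new HashMap<>();
    private final BinaryOperator<Double> reducer;

    ReducingStateSketch(BinaryOperator<Double> reducer) {
        this.reducer = reducer;
    }

    // Folds the new value into the aggregate stored for this key.
    void add(String key, double value) {
        aggregatePerKey.merge(key, value, reducer);
    }

    // Returns the aggregate, or null if nothing was added for this key.
    Double get(String key) {
        return aggregatePerKey.get(key);
    }

    public static void main(String[] args) {
        // Example reducer: keep the maximum temperature per sensor.
        ReducingStateSketch maxTemp = new ReducingStateSketch((a, b) -> Math.max(a, b));
        maxTemp.add("sensor_1", 35.8);
        maxTemp.add("sensor_1", 32.4);
        maxTemp.add("sensor_1", 42.4);
        System.out.println(maxTemp.get("sensor_1")); // 42.4
    }
}
```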
3.3 Test code

Code:
package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.*;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @author 只是甲
 * @date 2021-09-17
 * @remark Keyed state test
 */
public class StateTest2_KeyedState {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line to a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        });

        // Stateful map that counts the records seen for the current sensor
        SingleOutputStreamOperator<Integer> resultStream = dataStream
                .keyBy("id")
                .map(new MyKeyCountMapper());

        resultStream.print();
        env.execute();
    }

    // Custom RichMapFunction
    public static class MyKeyCountMapper extends RichMapFunction<SensorReading, Integer> {
        private ValueState<Integer> keyCountState;

        // Declarations of the other state types
        private ListState<String> myListState;
        private MapState<String, Double> myMapState;
        private ReducingState<SensorReading> myReducingState;

        @Override
        public void open(Configuration parameters) throws Exception {
            // State handles are obtained in open(), where the runtime context is available
            keyCountState = getRuntimeContext().getState(new ValueStateDescriptor<Integer>("key-count", Integer.class, 0));
            myListState = getRuntimeContext().getListState(new ListStateDescriptor<String>("my-list", String.class));
            myMapState = getRuntimeContext().getMapState(new MapStateDescriptor<String, Double>("my-map", String.class, Double.class));
            // myReducingState = getRuntimeContext().getReducingState(new ReducingStateDescriptor<SensorReading>())
        }

        @Override
        public Integer map(SensorReading value) throws Exception {
            // Calls on the other state APIs
            // list state
            for (String str : myListState.get()) {
                System.out.println(str);
            }
            myListState.add("hello");

            // map state
            myMapState.get("1");
            myMapState.put("2", 12.3);
            myMapState.remove("2");

            // reducing state
            // myReducingState.add(value);

            myMapState.clear();

            // Read, increment, and write back the counter of the current key
            Integer count = keyCountState.value();
            count++;
            keyCountState.update(count);
            return count;
        }
    }
}

3.4 Scenario test
Suppose we want a temperature alarm that fires when two consecutive readings from the same sensor differ by 10 degrees or more. This is implemented with keyed state plus flatMap.
Code:
package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @author 只是甲
 * @date 2021-09-17
 * @remark Keyed state: temperature alert
 */
public class StateTest3_KeyedStateApplicationCase {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line to a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        });

        // flatMap that detects temperature jumps and emits an alert
        SingleOutputStreamOperator<Tuple3<String, Double, Double>> resultStream = dataStream.keyBy("id")
                .flatMap(new TempChangeWarning(10.0));

        resultStream.print();
        env.execute();
    }

    // Custom function class
    public static class TempChangeWarning extends RichFlatMapFunction<SensorReading, Tuple3<String, Double, Double>> {
        // Temperature jump threshold
        private Double threshold;

        public TempChangeWarning(Double threshold) {
            this.threshold = threshold;
        }

        // State that holds the previous temperature of the current key
        private ValueState<Double> lastTempState;

        @Override
        public void open(Configuration parameters) throws Exception {
            lastTempState = getRuntimeContext().getState(new ValueStateDescriptor<Double>("last-temp", Double.class));
        }

        @Override
        public void flatMap(SensorReading value, Collector<Tuple3<String, Double, Double>> out) throws Exception {
            // Read the state
            Double lastTemp = lastTempState.value();

            // If there is a previous reading, compare the two temperatures
            if (lastTemp != null) {
                Double diff = Math.abs(value.getTemperature() - lastTemp);
                if (diff >= threshold) {
                    out.collect(new Tuple3<>(value.getId(), lastTemp, value.getTemperature()));
                }
            }

            // Update the state
            lastTempState.update(value.getTemperature());
        }

        // The original close() called lastTempState.clear(); it is dropped here because
        // keyed state can only be accessed in a keyed context, such as while a record is processed.
    }
}
Input:
sensor_1,1547718199,35.8
sensor_1,1547718199,32.4
sensor_1,1547718199,42.4
sensor_10,1547718205,52.6
sensor_10,1547718205,22.5
sensor_7,1547718202,6.7
sensor_7,1547718202,9.9
sensor_1,1547718207,36.3
sensor_7,1547718202,19.9
sensor_7,1547718202,30

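Output:
There is no alert for (sensor_7,9.9,19.9) in the output. This is a double-precision floating-point effect: 19.9 - 9.9 evaluates to slightly less than 10.0, so the diff >= threshold check fails. It can be ignored here.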
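The missing (sensor_7,9.9,19.9) alert can be reproduced outside Flink. The sketch below repeats the same comparison with plain doubles and, for contrast, with `BigDecimal` built from the original text of the readings. `TemperatureDiffSketch` and its method names are hypothetical, and switching to `BigDecimal` is only one possible remedy, not something the job above does.

```java
import java.math.BigDecimal;

// Reproduces the threshold comparison from the alert function in plain Java.
class TemperatureDiffSketch {

    // Same check as in the job: binary floating-point subtraction.
    static boolean exceedsWithDouble(double last, double current, double threshold) {
        return Math.abs(current - last) >= threshold;
    }

    // Decimal arithmetic on the textual form of the readings.
    static boolean exceedsWithDecimal(String last, String current, String threshold) {
        BigDecimal diff = new BigDecimal(current).subtract(new BigDecimal(last)).abs();
        return diff.compareTo(new BigDecimal(threshold)) >= 0;
    }

    public static void main(String[] args) {
        // The double difference is slightly below 10.0, so no alert is raised.
        System.out.println(19.9 - 9.9 < 10.0);                         // true
        System.out.println(exceedsWithDouble(9.9, 19.9, 10.0));        // false
        // With exact decimal arithmetic the difference is exactly 10.0.
        System.out.println(exceedsWithDecimal("9.9", "19.9", "10.0")); // true
    }
}
```

Another common option is to compare against the threshold with a small tolerance; which one fits depends on how exact the alert boundary needs to be.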

4. State Backends
4.1 Overview
For every incoming record, a stateful operator task reads and updates state.
Because efficient state access is critical for low-latency processing, each parallel task maintains its state locally to keep access fast.
How state is stored, accessed, and maintained is determined by a pluggable component called the state backend.
A state backend has two main responsibilities: managing local state, and writing checkpoint state to remote storage.
4.2 Choosing a state backend
- MemoryStateBackend (used when nothing else is configured)
  An in-memory state backend. It manages keyed state as objects in memory, stored on the TaskManager's JVM heap, and stores checkpoints in the JobManager's memory.
  Characteristics: fast and low latency, but not robust.
- FsStateBackend
  Stores checkpoints on a remote, persistent file system. Local state is kept on the TaskManager's JVM heap, as with MemoryStateBackend.
  It offers in-memory local access speed together with better fault tolerance.
- RocksDBStateBackend
  Serializes all state and stores it in a local RocksDB instance.
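The practical difference between the heap-based backends and RocksDB is where a value lives between accesses. The plain-Java sketch below models that: one backend keeps objects in a map, the other serializes to bytes on every write and deserializes on every read. It does not use Flink's interfaces; `StateBackendSketch`, `Backend`, `HeapBackend`, and `SerializingBackend` are hypothetical names, and an in-memory map stands in for the local RocksDB store.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a pluggable state backend; not Flink's actual interfaces.
class StateBackendSketch {

    // The operator code only sees this interface, so backends are interchangeable.
    interface Backend {
        void put(String key, double value);
        Double get(String key);
    }

    // Heap-style backend: values live as objects on the JVM heap.
    static class HeapBackend implements Backend {
        private final Map<String, Double> objects = new HashMap<>();

        public void put(String key, double value) {
            objects.put(key, value);
        }

        public Double get(String key) {
            return objects.get(key);
        }
    }

    // RocksDB-style backend: values are stored as bytes, so every access
    // pays for serialization (a map stands in for the local store).
    static class SerializingBackend implements Backend {
        private final Map<String, byte[]> bytes = new HashMap<>();

        public void put(String key, double value) {
            bytes.put(key, ByteBuffer.allocate(Double.BYTES).putDouble(value).array());
        }

        public Double get(String key) {
            byte[] raw = bytes.get(key);
            if (raw == null) {
                return null;
            }
            return ByteBuffer.wrap(raw).getDouble();
        }
    }

    public static void main(String[] args) {
        Backend[] backends = { new HeapBackend(), new SerializingBackend() };
        for (Backend backend : backends) {
            backend.put("sensor_1", 35.8);
            System.out.println(backend.get("sensor_1")); // 35.8 for both backends
        }
    }
}
```

The calling code is identical for both backends, which is the point of making the component pluggable; the serializing variant trades access speed for not having to hold every value as a live object.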
4.3 Configuration file
flink-conf.yaml
#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================
# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
#
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# <class-name-of-factory>.
#
# state.backend: filesystem
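The state.backend entry above selects the state backend; filesystem is the example value shown. Because the line is commented out, it has no effect until you uncomment it, and Flink falls back to its built-in default, which in this Flink version is the JobManager-memory backend (MemoryStateBackend).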
# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints
# Default target directory for savepoints, optional.
#
# state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints
# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
#
# state.backend.incremental: false
# The failover strategy, i.e., how the job computation recovers from task failures.
# Only restart tasks that may have been affected by the task failure, which typically includes
# downstream tasks and potentially upstream tasks if their produced data is no longer available for consumption.
jobmanager.execution.failover-strategy: region
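The region setting above means that when one task of a job with several parallel tasks fails, only the region that task belongs to (which may contain several subtasks) is restarted, rather than the whole Flink job.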
4.4 Sample code
Using RocksDBStateBackend requires an additional pom dependency:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>1.9.0</version>
</dependency>
Code:
package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @author 只是甲
 * @date 2021-09-17
 * @remark State backend test
 */
public class StateTest4_FaultTolerance {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. State backend configuration
        // Each call replaces the previous one, so keep only the backend you need.
        // The empty URI strings are placeholders; a real checkpoint path is required to run.
        env.setStateBackend(new MemoryStateBackend());
        env.setStateBackend(new FsStateBackend(""));
        env.setStateBackend(new RocksDBStateBackend(""));

        // 2. Checkpoint configuration
        env.enableCheckpointing(300); // checkpoint interval in milliseconds

        // Advanced options
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setCheckpointTimeout(60000L);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(2);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(100L);
        env.getCheckpointConfig().setPreferCheckpointForRecovery(true);
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(0);

        // 3. Restart strategy configuration (the later call overrides the earlier one)
        // Fixed-delay restart: at most 3 attempts, 10 seconds apart
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L));
        // Failure-rate restart: at most 3 failures per 10 minutes, 1 minute between attempts
        env.setRestartStrategy(RestartStrategies.failureRateRestart(3, Time.minutes(10), Time.minutes(1)));

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line to a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        });

        dataStream.print();
        env.execute();
    }
}
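The failure-rate restart strategy configured in the listing is easier to reason about with a small model. The plain-Java sketch below keeps the timestamps of recent restarts and refuses a further restart once the allowed number within the interval is used up. It is a simplified illustration, not Flink's implementation: `FailureRateSketch` and `onFailure` are hypothetical names, time is passed in explicitly, and the delay between attempts is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java model of a failure-rate restart policy; a simplification, not Flink's code.
class FailureRateSketch {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    private final Deque<Long> recentRestarts = new ArrayDeque<>();

    FailureRateSketch(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    // Called on each failure; returns true if the job may restart.
    boolean onFailure(long nowMillis) {
        // Forget restarts that fell out of the measuring interval.
        while (!recentRestarts.isEmpty()
                && nowMillis - recentRestarts.peekFirst() > intervalMillis) {
            recentRestarts.pollFirst();
        }
        if (recentRestarts.size() >= maxFailuresPerInterval) {
            return false; // failure rate exceeded: give up
        }
        recentRestarts.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        // 3 failures allowed per 10 minutes, as in the configuration above.
        FailureRateSketch policy = new FailureRateSketch(3, 10 * 60_000L);
        System.out.println(policy.onFailure(0L));       // true
        System.out.println(policy.onFailure(60_000L));  // true
        System.out.println(policy.onFailure(120_000L)); // true
        System.out.println(policy.onFailure(180_000L)); // false, fourth failure within 10 minutes
    }
}
```

A fixed-delay strategy is the simpler special case: it only counts attempts and never forgets old ones.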