Flink Basics Series 26 - Flink State Management

1. Overview of State

State in Flink covers:

  1. Operator State
  2. Keyed State
  3. State Backends

All the data that a task maintains and uses to compute some result belongs to that task's state.

You can think of task state as a local variable that the task's business logic can access.

Flink manages state for you, including state consistency, failure handling, and efficient storage and access, so that developers can focus on application logic.

In Flink, state is always associated with a specific operator.
For the Flink runtime to be aware of an operator's state, the operator must register its state first.

Broadly, there are two types of state:

  1. Operator State
    1) The scope of operator state is limited to a single operator task (that is, it cannot be accessed across tasks)
  2. Keyed State
    1) Maintained and accessed according to the keys defined in the input data stream

2. Operator State

2.1 Overview


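The scope of operator state is limited to an operator task: all records processed by the same parallel task have access to the same state.

State is shared within a single task (it is not shared across slots).

Operator state cannot be accessed by another task, whether that task belongs to the same operator or to a different one.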
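To make this scoping concrete, here is a minimal plain-Java model (not the Flink API). Each parallel subtask owns its own counter, much like the `count` field in `MyCountMapper` below, and records are handed to subtasks round-robin, which is an assumption made only for this illustration. No subtask can see another subtask's counter, and the count is per subtask regardless of the record's key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OperatorStateModel {

    /** One parallel subtask; its private counter plays the role of operator state. */
    static class CountingSubtask {
        private int count = 0; // visible only to this subtask

        int process(String record) {
            count++;
            return count;
        }

        int getCount() {
            return count;
        }
    }

    /** Distributes records round-robin and returns the final count held by each subtask. */
    public static int[] countsPerSubtask(List<String> records, int parallelism) {
        List<CountingSubtask> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new CountingSubtask());
        }
        for (int i = 0; i < records.size(); i++) {
            subtasks.get(i % parallelism).process(records.get(i));
        }
        int[] counts = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            counts[i] = subtasks.get(i).getCount();
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("sensor_1", "sensor_1", "sensor_7", "sensor_10", "sensor_1");
        // Subtask 0 receives records 0, 2, 4 and subtask 1 receives records 1, 3
        System.out.println(Arrays.toString(countsPerSubtask(records, 2))); // [3, 2]
    }
}
```

Note that sensor_1 is counted on both subtasks: operator state knows nothing about keys, which is exactly what keyed state adds.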

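2.2 Operator State Data Structures

  1. List state
    1) Represents the state as a list of elements

  2. Union list state
    1) Also represents the state as a list of elements. It differs from regular list state in how the state is restored after a failure or when the application is started from a savepoint: with list state the elements are split evenly across the parallel instances, while with union list state every instance receives the complete list

  3. Broadcast state
    1) If an operator has multiple tasks and the state of every task is identical, broadcast state is the best fit for this special case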
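The difference between list state and union list state only shows up on restore, for example when a job restarts from a savepoint with a different parallelism. Per the Flink documentation, list state is concatenated and evenly divided into as many sublists as there are parallel instances, while union list state gives every instance the complete list. The plain-Java sketch below models those two policies; handing out contiguous chunks is an assumption for illustration, not necessarily Flink's exact assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RedistributionModel {

    /** List state: concatenate all old lists, then split into newParallelism nearly equal parts. */
    public static List<List<Integer>> evenSplit(List<List<Integer>> oldState, int newParallelism) {
        List<Integer> all = new ArrayList<>();
        oldState.forEach(all::addAll);
        List<List<Integer>> result = new ArrayList<>();
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        int from = 0;
        for (int i = 0; i < newParallelism; i++) {
            // The first `remainder` instances receive one extra element
            int size = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(from, from + size)));
            from += size;
        }
        return result;
    }

    /** Union list state: every new instance receives the complete concatenated list. */
    public static List<List<Integer>> union(List<List<Integer>> oldState, int newParallelism) {
        List<Integer> all = new ArrayList<>();
        oldState.forEach(all::addAll);
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two old subtasks each checkpointed one count; the job is rescaled to 3
        List<List<Integer>> old = Arrays.asList(Arrays.asList(5), Arrays.asList(7));
        System.out.println(evenSplit(old, 3)); // [[5], [7], []]
        System.out.println(union(old, 3));     // [[5, 7], [5, 7], [5, 7]]
    }
}
```

With a restore method that sums its elements, as `MyCountMapper` below does, the even split leaves one new instance starting at 0, whereas the union would make every instance start at 12 and count the same records three times. Union list state therefore fits cases where each instance picks out only the entries it needs.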

2.3 Code Example

In practice operator state is used relatively rarely; keyed state is far more common.

Code:

package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Collections;
import java.util.List;

/**
 * @author      只是甲
 * @date        2021-09-17
 * @remark      Operator state test
 */
public class StateTest1_OperatorState {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        // A stateful map that counts the records seen by the current parallel partition
        SingleOutputStreamOperator<Integer> resultStream = dataStream.map(new MyCountMapper());

        resultStream.print();

        env.execute();
    }

    // Custom MapFunction that checkpoints its counter through the ListCheckpointed interface
    public static class MyCountMapper implements MapFunction<SensorReading, Integer>, ListCheckpointed<Integer>{
        // A plain field that serves as the operator state
        private Integer count = 0;

        @Override
        public Integer map(SensorReading value) throws Exception {
            count++;
            return count;
        }

        // Called when a checkpoint is taken: return the state as a list
        @Override
        public List<Integer> snapshotState(long checkpointId, long timestamp) throws Exception {
            return Collections.singletonList(count);
        }

        // Called on recovery: after rescaling, one subtask may receive several elements, so add them all up
        @Override
        public void restoreState(List<Integer> state) throws Exception {
            for (Integer num : state) {
                count += num;
            }
        }
    }
}

Input:

[Screenshot: input]

Output:

[Screenshot: output]

3. Keyed State

3.1 Overview


Keyed state is maintained and accessed according to the keys defined in the input data stream.

Flink maintains one state instance per key and partitions all records with the same key to the same operator task, which maintains and processes the state for that key.

When a task processes a record, it automatically scopes state access to the current record's key.
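The plain-Java model below (not the Flink API) mimics what `keyBy` plus a `ValueState<Integer>` counter does: each record is routed to a parallel task by hashing its key, and each task keeps one count per key, so the state a record sees is always the state of its own key. The hash-modulo routing is a simplification; Flink actually maps keys to key groups first and then assigns key groups to tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedStateModel {

    /** One parallel task holding a separate state value for every key it has seen. */
    static class KeyedCountTask {
        private final Map<String, Integer> countPerKey = new HashMap<>();

        /** Like reading and updating a ValueState scoped to the current record's key. */
        int process(String key) {
            int updated = countPerKey.getOrDefault(key, 0) + 1;
            countPerKey.put(key, updated);
            return updated;
        }
    }

    private final List<KeyedCountTask> tasks = new ArrayList<>();

    public KeyedStateModel(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new KeyedCountTask());
        }
    }

    /** The same key always maps to the same task (simplified hash partitioning). */
    public int taskIndexFor(String key) {
        return Math.floorMod(key.hashCode(), tasks.size());
    }

    public int process(String key) {
        return tasks.get(taskIndexFor(key)).process(key);
    }

    public static void main(String[] args) {
        KeyedStateModel model = new KeyedStateModel(2);
        // Prints one line per record: sensor_1 -> 1, sensor_7 -> 1, sensor_1 -> 2, sensor_1 -> 3, sensor_7 -> 2
        for (String id : new String[] {"sensor_1", "sensor_7", "sensor_1", "sensor_1", "sensor_7"}) {
            System.out.println(id + " -> " + model.process(id));
        }
    }
}
```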

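3.2 Keyed State Data Structures

  1. Value state
    Represents the state as a single value

  2. List state
    Represents the state as a list of elements

  3. Map state
    Represents the state as a set of key-value pairs

  4. Aggregating state (Reducing state & Aggregating State)
    Represents the state as a single value that aggregates all elements added so far; each add() applies a ReduceFunction or AggregateFunction instead of appending to a list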
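A tiny plain-Java stand-in for reducing state illustrates the difference from list state. The class `SimpleReducingState` is hypothetical, not a Flink class: it stores one value and folds each `add` into it with a `BinaryOperator`, which plays the role of Flink's `ReduceFunction`. A list state, by contrast, would keep every element.

```java
import java.util.function.BinaryOperator;

public class SimpleReducingState<T> {
    private final BinaryOperator<T> reduceFunction;
    private T value; // null until the first add(), like an empty state

    public SimpleReducingState(BinaryOperator<T> reduceFunction) {
        this.reduceFunction = reduceFunction;
    }

    /** Folds the new element into the stored aggregate instead of appending it. */
    public void add(T element) {
        value = (value == null) ? element : reduceFunction.apply(value, element);
    }

    public T get() {
        return value;
    }

    public void clear() {
        value = null;
    }

    public static void main(String[] args) {
        // Keep only the maximum temperature seen so far
        SimpleReducingState<Double> maxTemp = new SimpleReducingState<>(Double::max);
        maxTemp.add(35.8);
        maxTemp.add(32.4);
        maxTemp.add(42.4);
        System.out.println(maxTemp.get()); // 42.4
    }
}
```

Storage stays constant no matter how many elements are added, which is the main reason to prefer reducing or aggregating state over list state when only the aggregate is needed.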

3.3 Code Example


Code:

package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.*;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @author      只是甲
 * @date        2021-09-17
 * @remark      Keyed state test
 */
public class StateTest2_KeyedState {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        // A stateful map that counts the records of each sensor (keyed by the "id" field)
        SingleOutputStreamOperator<Integer> resultStream = dataStream
                .keyBy("id")
                .map( new MyKeyCountMapper() );

        resultStream.print();

        env.execute();
    }

    // Custom RichMapFunction: keyed state is obtained from the runtime context, which rich functions provide
    public static class MyKeyCountMapper extends RichMapFunction<SensorReading, Integer>{
        private ValueState<Integer> keyCountState;

        // Declarations of the other state types, used here only to demonstrate their APIs
        private ListState<String> myListState;
        private MapState<String, Double> myMapState;
        private ReducingState<SensorReading> myReducingState;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Register each state through a descriptor; the value state defaults to 0 for keys not seen yet
            // (this default-value constructor is deprecated in newer Flink versions)
            keyCountState = getRuntimeContext().getState(new ValueStateDescriptor<Integer>("key-count", Integer.class, 0));

            myListState = getRuntimeContext().getListState(new ListStateDescriptor<String>("my-list", String.class));
            myMapState = getRuntimeContext().getMapState(new MapStateDescriptor<String, Double>("my-map", String.class, Double.class));
            // Reducing state keeps a single SensorReading: the one with the highest temperature so far
            myReducingState = getRuntimeContext().getReducingState(new ReducingStateDescriptor<SensorReading>(
                    "my-reducing",
                    (a, b) -> a.getTemperature() >= b.getTemperature() ? a : b,
                    SensorReading.class));
        }

        @Override
        public Integer map(SensorReading value) throws Exception {
            // Calls on the other state types (API demonstration only)
            // list state
            for(String str: myListState.get()){
                System.out.println(str);
            }
            myListState.add("hello");
            // map state
            myMapState.get("1");
            myMapState.put("2", 12.3);
            myMapState.remove("2");
            // reducing state
            myReducingState.add(value);

            myMapState.clear();

            // Read, increment, and write back the count for the current key
            Integer count = keyCountState.value();
            count++;
            keyCountState.update(count);
            return count;
        }
    }
}

Input:

[Screenshot: input]

Output:

[Screenshot: output]

3.4 Application Scenario

Suppose we want a temperature alert: if a sensor's temperature changes by 10 degrees or more between two consecutive readings, raise an alert. Here it is implemented with keyed state plus flatMap.
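The core rule can be tried out without a cluster. The plain-Java sketch below replays readings through the same logic as `TempChangeWarning` in the Flink code that follows: a `HashMap` stands in for the per-key `ValueState<Double>`, and an alert `id,lastTemp,currentTemp` is produced when the absolute difference is at least the threshold. The `Reading` class and `detect` method are hypothetical helpers for this sketch; the readings in `main` are the ones used as test input later in this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TempChangeModel {

    /** Hypothetical stand-in for SensorReading (id and temperature only). */
    public static class Reading {
        final String id;
        final double temperature;

        public Reading(String id, double temperature) {
            this.id = id;
            this.temperature = temperature;
        }
    }

    /** Returns one "id,lastTemp,currentTemp" string per alert, in input order. */
    public static List<String> detect(List<Reading> readings, double threshold) {
        // One entry per sensor id: plays the role of the keyed ValueState<Double>
        Map<String, Double> lastTempByKey = new HashMap<>();
        List<String> alerts = new ArrayList<>();
        for (Reading r : readings) {
            Double lastTemp = lastTempByKey.get(r.id); // null for the first reading of a sensor
            if (lastTemp != null && Math.abs(r.temperature - lastTemp) >= threshold) {
                alerts.add(r.id + "," + lastTemp + "," + r.temperature);
            }
            lastTempByKey.put(r.id, r.temperature); // remember the latest temperature for this key
        }
        return alerts;
    }

    public static void main(String[] args) {
        List<Reading> input = Arrays.asList(
                new Reading("sensor_1", 35.8), new Reading("sensor_1", 32.4), new Reading("sensor_1", 42.4),
                new Reading("sensor_10", 52.6), new Reading("sensor_10", 22.5),
                new Reading("sensor_7", 6.7), new Reading("sensor_7", 9.9),
                new Reading("sensor_1", 36.3), new Reading("sensor_7", 19.9), new Reading("sensor_7", 30));
        // Prints sensor_1,32.4,42.4 then sensor_10,52.6,22.5 then sensor_7,19.9,30.0
        detect(input, 10.0).forEach(System.out::println);
    }
}
```

Like the Flink job, this model does not report the 9.9 to 19.9 jump of sensor_7, because the double subtraction lands just below 10.0.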

Code:

package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @author      只是甲
 * @date        2021-09-17
 * @remark      Keyed state: temperature alert
 */
public class StateTest3_KeyedStateApplicationCase {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        // A flatMap that detects temperature jumps and emits alerts
        SingleOutputStreamOperator<Tuple3<String, Double, Double>> resultStream = dataStream.keyBy("id")
                .flatMap(new TempChangeWarning(10.0));

        resultStream.print();

        env.execute();
    }

    // Custom function class
    public static class TempChangeWarning extends RichFlatMapFunction<SensorReading, Tuple3<String, Double, Double>>{
        // Temperature-jump threshold
        private Double threshold;

        public TempChangeWarning(Double threshold) {
            this.threshold = threshold;
        }

        // State holding the previous temperature of the current key
        private ValueState<Double> lastTempState;

        @Override
        public void open(Configuration parameters) throws Exception {
            lastTempState = getRuntimeContext().getState(new ValueStateDescriptor<Double>("last-temp", Double.class));
        }

        @Override
        public void flatMap(SensorReading value, Collector<Tuple3<String, Double, Double>> out) throws Exception {
            // Read the state (null for the first reading of a sensor)
            Double lastTemp = lastTempState.value();

            // If there is a previous temperature, compare the two readings
            if( lastTemp != null ){
                double diff = Math.abs( value.getTemperature() - lastTemp );
                if( diff >= threshold )
                    out.collect(new Tuple3<>(value.getId(), lastTemp, value.getTemperature()));
            }

            // Update the state with the current temperature
            lastTempState.update(value.getTemperature());
        }

        // Note: keyed state is scoped to the current key, so this clear() only affects the key that
        // happens to be current (typically the last sensor processed), not the state of all sensors
        @Override
        public void close() throws Exception {
            lastTempState.clear();
        }
    }
}

Input:

sensor_1,1547718199,35.8
sensor_1,1547718199,32.4
sensor_1,1547718199,42.4
sensor_10,1547718205,52.6
sensor_10,1547718205,22.5
sensor_7,1547718202,6.7
sensor_7,1547718202,9.9
sensor_1,1547718207,36.3
sensor_7,1547718202,19.9
sensor_7,1547718202,30

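Output:

[Screenshot: output]

No alert was emitted for (sensor_7, 9.9, 19.9). This is a double floating-point precision issue: neither 9.9 nor 19.9 is exactly representable as a double, and 19.9 - 9.9 evaluates to slightly less than 10.0, so diff >= threshold is false.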
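The arithmetic behind the missing alert can be checked directly in plain Java. One possible workaround, shown here as an option rather than what the original code does, is to do the subtraction with `BigDecimal` built from each value's shortest decimal string; comparing against `threshold` minus a small tolerance would be a cheaper alternative. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class TempDiffPrecision {

    /** Same comparison as TempChangeWarning: plain double subtraction. */
    public static boolean exceedsWithDouble(double last, double current, double threshold) {
        return Math.abs(current - last) >= threshold;
    }

    /**
     * Alternative: subtract in decimal. BigDecimal.valueOf(double) goes through Double.toString,
     * so 19.9 becomes exactly 19.9 and 9.9 exactly 9.9.
     */
    public static boolean exceedsWithBigDecimal(double last, double current, double threshold) {
        BigDecimal diff = BigDecimal.valueOf(current).subtract(BigDecimal.valueOf(last)).abs();
        return diff.compareTo(BigDecimal.valueOf(threshold)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(19.9 - 9.9 < 10.0);                      // true: the double result falls just short of 10
        System.out.println(exceedsWithDouble(9.9, 19.9, 10.0));     // false, so no alert
        System.out.println(exceedsWithBigDecimal(9.9, 19.9, 10.0)); // true, so an alert would be emitted
    }
}
```

`BigDecimal` allocates objects for every record, so in a high-throughput job a tolerance-based comparison on plain doubles may be the better trade-off.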

4. State Backends

4.1 Overview

For every incoming record, a stateful operator task reads and updates its state.

Because efficient state access is critical for low-latency processing, each parallel task maintains its state locally to ensure fast access.

How state is stored, accessed, and maintained is determined by a pluggable component called the state backend.

The state backend is responsible for two things: managing local state, and writing checkpointed state to remote storage.
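The two responsibilities can be pictured with a small plain-Java model. `ToyStateBackend` is hypothetical and is not Flink's `StateBackend` interface: reads and writes go to an in-memory map (fast, task-local), `snapshot` copies the current state into a separate map that stands in for durable remote storage, and `restore` rebuilds the local state from a chosen checkpoint after a simulated failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a state backend's two jobs: local state access and checkpoint storage. */
public class ToyStateBackend {
    // Fast, task-local working state
    private final Map<String, Integer> localState = new HashMap<>();
    // Stands in for durable remote storage, keyed by checkpoint id
    private final Map<Long, Map<String, Integer>> remoteCheckpoints = new HashMap<>();

    public Integer get(String key) {
        return localState.get(key);
    }

    public void put(String key, int value) {
        localState.put(key, value);
    }

    /** Writes a copy of the local state to "remote" storage under the given checkpoint id. */
    public void snapshot(long checkpointId) {
        remoteCheckpoints.put(checkpointId, new HashMap<>(localState));
    }

    /** Simulates recovery: local state is discarded and rebuilt from the given checkpoint. */
    public void restore(long checkpointId) {
        Map<String, Integer> checkpoint = remoteCheckpoints.get(checkpointId);
        if (checkpoint == null) {
            throw new IllegalArgumentException("No checkpoint " + checkpointId);
        }
        localState.clear();
        localState.putAll(checkpoint);
    }

    public static void main(String[] args) {
        ToyStateBackend backend = new ToyStateBackend();
        backend.put("sensor_1", 3);
        backend.snapshot(1L);
        backend.put("sensor_1", 5); // update made after checkpoint 1
        backend.restore(1L);        // roll back to checkpoint 1
        System.out.println(backend.get("sensor_1")); // 3
    }
}
```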

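4.2 Choosing a State Backend

  1. MemoryStateBackend
    An in-memory state backend: it manages keyed state as objects in memory on the TaskManager's JVM heap, and stores checkpoints in the JobManager's memory
    Characteristics: fast and low latency, but not durable (checkpoints live only in JobManager memory, and state size is limited by available memory). If no state backend is configured at all, Flink uses MemoryStateBackend by default

  2. FsStateBackend
    Stores checkpoints on a remote, persistent file system; local state, as with MemoryStateBackend, lives on the TaskManager's JVM heap
    It offers both memory-speed local access and better fault-tolerance guarantees

  3. RocksDBStateBackend
    Serializes all state and stores it in a local RocksDB instance; checkpoints are still written to a remote file system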

4.3 Configuration File

flink-conf.yaml

#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================

# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
#
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# <class-name-of-factory>.
#
# state.backend: filesystem
The (commented-out) line above selects the filesystem state backend, which stores checkpoints on a file system.

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints

# Default target directory for savepoints, optional.
#
# state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints

# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend). 
#
# state.backend.incremental: false

# The failover strategy, i.e., how the job computation recovers from task failures.
# Only restart tasks that may have been affected by the task failure, which typically includes
# downstream tasks and potentially upstream tasks if their produced data is no longer available for consumption.

jobmanager.execution.failover-strategy: region

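Here region means that if one task of a job with several parallel tasks fails, only the failover region that task belongs to (which may contain several subtasks) is restarted, rather than the whole Flink job.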
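Which tasks form a region follows from the job graph. Per the Flink documentation, a region is a set of tasks that communicate via pipelined data exchanges, and all exchanges in a DataStream job are pipelined, so in a streaming job a region is a connected component of the task graph. The plain-Java sketch below computes those components with union-find and returns what would be restarted for a given failure; the class name and task names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FailoverRegions {
    private final Map<String, String> parent = new HashMap<>();

    private String find(String task) {
        parent.putIfAbsent(task, task);
        String root = parent.get(task);
        if (!root.equals(task)) {
            root = find(root);
            parent.put(task, root); // path compression
        }
        return root;
    }

    /** Records a pipelined exchange between two subtasks; they now share a region. */
    public void connect(String upstream, String downstream) {
        parent.put(find(upstream), find(downstream));
    }

    /** Returns every subtask in the same region as the failed one, i.e. what gets restarted. */
    public Set<String> tasksToRestart(String failedTask) {
        String root = find(failedTask);
        Set<String> region = new TreeSet<>();
        for (String task : new ArrayList<>(parent.keySet())) {
            if (find(task).equals(root)) {
                region.add(task);
            }
        }
        return region;
    }

    public static void main(String[] args) {
        FailoverRegions graph = new FailoverRegions();
        // Two independent forward pipelines: source-0 -> map-0 and source-1 -> map-1
        graph.connect("source-0", "map-0");
        graph.connect("source-1", "map-1");
        System.out.println(graph.tasksToRestart("map-1")); // [map-1, source-1]

        // An all-to-all exchange (e.g. after keyBy) merges everything into one region
        graph.connect("map-0", "sink-0");
        graph.connect("map-1", "sink-0");
        System.out.println(graph.tasksToRestart("map-1")); // [map-0, map-1, sink-0, source-0, source-1]
    }
}
```

The second call shows the practical limit: once a keyBy connects all parallel tasks, the whole job is one region, so region failover saves restarts mainly in jobs that split into independent pipelines.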

4.4 Sample Code

Using RocksDBStateBackend requires an additional pom dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>1.9.0</version>
</dependency>

Code:

package org.flink.state;

import org.flink.beans.SensorReading;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @author      只是甲
 * @date        2021-09-17
 * @remark      State backend test
 */
public class StateTest4_FaultTolerance {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

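        // 1. State backend configuration: each call replaces the previous one, so keep only one active
        env.setStateBackend( new MemoryStateBackend());
        // FsStateBackend and RocksDBStateBackend need a checkpoint directory URI (left empty here), e.g. an HDFS path
        // env.setStateBackend( new FsStateBackend(""));
        // env.setStateBackend( new RocksDBStateBackend(""));

        // 2. Checkpoint configuration: trigger a checkpoint every 300 ms
        env.enableCheckpointing(300);

        // Advanced options
        // Consistency guarantee of the checkpoints
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // Discard a checkpoint that has not completed within 60 s
        env.getCheckpointConfig().setCheckpointTimeout(60000L);
        // At most 2 checkpoints in flight (per the Flink docs, not usable together with a minimum pause)
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(2);
        // At least 100 ms between the end of one checkpoint and the start of the next
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(100L);
        // Recover from the latest checkpoint even if a more recent savepoint exists
        env.getCheckpointConfig().setPreferCheckpointForRecovery(true);
        // Tolerate no checkpoint failures: the first failed checkpoint fails the job
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(0);

        // 3. Restart strategy configuration (the second call overrides the first; keep only one)
        // Fixed delay: up to 3 restart attempts, 10 s apart
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L));
        // Failure rate: at most 3 failures per 10-minute interval, 1 minute delay between restarts
        env.setRestartStrategy(RestartStrategies.failureRateRestart(3, Time.minutes(10), Time.minutes(1)));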

        // Socket text stream
        DataStream<String> inputStream = env.socketTextStream("10.31.1.122", 7777);

        // Convert each line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        dataStream.print();
        env.execute();
    }
}

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.