Flink官方翻譯-03Table API & SQL

Apache Flink features two relational APIs - the Table API and SQL - for unified stream and batch processing.

Flink’s SQL support is based on Apache Calcite which implements the SQL standard.

注意事項(xiàng):table api和sql還處于活躍開發(fā)狀態(tài),不是所有功能都已經(jīng)實(shí)現(xiàn)

Please note that the Table API and SQL are not yet feature complete and are being actively developed. Not all operations are supported by every combination of [Table API, SQL] and [stream, batch] input.

Concepts & Common API

Concepts & Common API

Streaming Table API & SQL

關(guān)系查詢在Data Streams

區(qū)別

<colgroup><col style="width: 424px;"><col style="width: 424px;"></colgroup>
|

Relational Algebra / SQL

|

Stream Processing

|
|

Relations (or tables) are bounded (multi-)sets of tuples.

|

A stream is an infinite sequences of tuples.

|
|

A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.

|

A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.

|
|

A batch query terminates after it produced a fixed sized result.

|

A streaming query continuously updates its result based on the received records and never completes.

|

聯(lián)系

  • A database table is the result of a stream of INSERT, UPDATE, and DELETE DML statements, often called changelog stream.
  • A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view’s base relations.
  • The materialized view is the result of the streaming SQL query.

Dynamic Tables & Continuous Queries

Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time.

[圖片上傳失敗...(image-da00e1-1524133656710)]

  1. A stream is converted into a dynamic table.
  2. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
  3. The resulting dynamic table is converted back into a stream.

Defining a Table on a Stream

Essentially, we are building a table from an INSERT-only changelog stream.

The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.

[圖片上傳失敗...(image-b30af4-1524133656710)]

注意: A table which is defined on a stream is internally not materialized

Continuous Queries

A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables.At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.本質(zhì)上與在batch table中使用sql查詢,效果是一致的

[圖片上傳失敗...(image-1345f7-1524133656710)]

When the query is started, the clicks table (left-hand side) is empty. The query starts to compute the result table, when the first row is inserted into the clicks table. After the first row [Mary, ./home] was inserted, the result table (right-hand side, top) consists of a single row [Mary, 1]. When the second row [Bob, ./cart] is inserted into the clicks table, the query updates the result table and inserts a new row [Bob, 1]. The third row [Mary, ./prod?id=1] yields an update of an already computed result row such that [Mary, 1] is updated to [Mary, 2]. Finally, the query inserts a third row [Liz, 1] into the result table, when the fourth row is appended to the clicks table.

[圖片上傳失敗...(image-6a8473-1524133656710)]

As before, the input table clicks is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (cTime) between 12:00:00 and 12:59:59. The query computes two results rows from this input (one for each user) and appends them to the result table. For the next window between 13:00:00 and 13:59:59, the clicks table contains three rows, which results in another two rows being appended to the result table. The result table is updated, as more rows are appended to clicks over time.

Query Restrictions

State Size: 有些程序會運(yùn)行經(jīng)年累月的,中間的state都會做保存,如果數(shù)據(jù)量大了會造成查詢失敗。

Computing Updates:

Table to Stream Conversion

A dynamic table can be continuously modified by INSERT, UPDATE, and DELETE changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without UPDATE and DELETE modifications, or anything in between.

  • Append-only stream: A dynamic table that is only modified by INSERT changes can be converted into a stream by emitting the inserted rows.
  • Retract stream: A retract stream is a stream with two types of messages, add messages and retract messages. A dynamic table is converted into an retract stream by encoding an INSERT change as add message, a DELETE change as retract message, and an UPDATE change as a retract message for the updated (previous) row and an add message for the updating (new) row. The following figure visualizes the conversion of a dynamic table into a retract stream.
  • Upsert stream: An upsert stream is a stream with two types of messages, upsert messages and delete message. A dynamic table that is converted into an upsert stream requires a (possibly composite) unique key. A dynamic table with unique key is converted into a dynamic table by encoding INSERT and UPDATE changes as upsert message and DELETE changes as delete message. The stream consuming operator needs to be aware of the unique key attribute in order to apply messages correctly. The main difference to a retract stream is that UPDATE changes are encoded with a single message and hence more efficient. The following figure visualizes the conversion of a dynamic table into an upsert stream.

Time Attributes

Flink主要提供三種時間粒度:

1.Processing time:使用機(jī)器的的時間,通常被稱為:wall-clock time

2.Event time :事件時間,通常都是基于每行數(shù)據(jù)本身的timestamp

3.Ingestion time:通常是事件進(jìn)入到flink的,通常與事件時間相同。

Processing Time

During DataStream-to-Table Conversion

Using a TableSource

Event time

During DataStream-to-Table Conversion

There are two ways of defining the time attribute when converting a DataStream into a Table:

  • Extending the physical schema by an additional logical field
  • Replacing a physical field by a logical field (e.g. because it is no longer needed after timestamp extraction).

Using a TableSource

The event time attribute is defined by a TableSource that implements the DefinedRowtimeAttribute interface. The logical time attribute is appended to the physical schema defined by the return type of the TableSource.

Timestamps and watermarks must be assigned in the stream that is returned by the getDataStream() method.

Query Configuration

Table API

Update and Append Queries

第一個栗子,是可以更新前面的數(shù)據(jù),changelog stream定義了結(jié)果包含insert 和 update changes。第二個栗子,只是追加結(jié)果到結(jié)果表中。changelog stream 結(jié)果表只是由insert changes組成

SQL

SQL queries are specified with the sql() method of the TableEnvironment. The method returns the result of the SQL query as a Table. A Table can be used in subsequent SQL and Table API queries, be converted into a DataSet or DataStream, or written to a TableSink). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

Table可以從 TableSource, Table, DataStream, 或者 DataSet 轉(zhuǎn)化而來。

或者,用戶可以從 注冊外部的目錄到 TableEnvironment中

為了用sql查詢的方式查詢table,必須注冊到 tableEnvironment中。一個table可以從 TableSource,Table,DataStream , 或者 DataSet中注冊生成?;蛘?,用戶也可以從外部目錄注冊到TableEnvironment中,通過指定一個特定的目錄。

In order to access a table in a SQL query, it must be registered in the TableEnvironment. A table can be registered from a TableSource, Table, DataStream, or DataSet. Alternatively, users can also register external catalogs in a TableEnvironment to specify the location of the data sources.

注意:flink sql的功能支持還沒有健全。有些不支持的sql查詢了之后會報 TableException。所以支持的sql在批量處理中或者流式處理中都列舉在下面

Supported Syntax

Flink parses SQL using Apache Calcite, which supports standard ANSI SQL. DML and DDL statements are not supported by Flink.

Scan, Projection, and Filter

Aggregations

GroupBy Aggregation

GroupBy Window Aggregation

Over Window aggregation

Group Windows

<colgroup><col style="width: 310px;"><col style="width: 310px;"></colgroup>
|

TUMBLE(time_attr, interval)

|

固定窗口

|
|

HOP(time_attr, interval, interval)

|

滑動窗口(跳躍窗口)。有兩個interval參數(shù),第一個主要定義了滑動間隔,第二個主要定義了窗口大小

|
|

SESSION(time_attr, interval)

|

會話窗口。session time window沒有固定的持續(xù)時間,但是它們的邊界是通過時間 interval來交互的。如果在固定的間隙之間沒有新的event進(jìn)入,session window就會關(guān)閉,或者這行數(shù)據(jù)就會添加到已有的window中

|

Time Attributes

  • Processing time。記錄是機(jī)器和系統(tǒng)的時間,當(dāng)在做處理的時候。
  • Event time。消息自帶的時間,可以通過encode等指定。
  • Ingestion time。事件到達(dá)flink的時間。這個與process time的功能類似。

<colgroup><col style="width: 297px;"><col style="width: 293px;"></colgroup>
|

Auxiliary Function

|

Description

|
|

TUMBLE_START(time_attr, interval)

HOP_START(time_attr, interval, interval)

SESSION_START(time_attr, interval)

|

常用于計算窗口的開始時間和結(jié)束時間。Returns the start timestamp of the corresponding tumbling, hopping, and session window.

|
|

TUMBLE_END(time_attr, interval)

HOP_END(time_attr, interval, interval)

SESSION_END(time_attr, interval)

|

Returns the end timestamp of the corresponding tumbling, hopping, and session window.

|

Table Sources & Sinks

User-Defined Functions

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi閱讀 7,854評論 0 10
  • 一、備課反思 高二物理綜合性強(qiáng),我在準(zhǔn)備過程,參照了近幾年全國各地高考題,同時結(jié)合教學(xué)進(jìn)度和學(xué)生情況,進(jìn)行了綜合分...
    酒泉教研室王乾祥閱讀 1,215評論 1 1
  • 想寫點(diǎn)什么,不知道從那里開始,正好想復(fù)習(xí)安卓基礎(chǔ)那就寫寫安卓基礎(chǔ)吧!! 當(dāng)創(chuàng)建第一個項(xiàng)目時,文件新建一直下一步直到...
    icechao閱讀 710評論 0 1
  • 清明雪紛紛,誰料人間白? 若有祭掃人,勿動墳上雪! 一瓶二鍋頭,生前吾所愛。 今日可帶來?幽冥尤寒潮。 讓我痛飲盡...
    樓臺花舍閱讀 215評論 0 11
  • 悼一位親人 我來了你卻走了 一個無語的老人走了 一個無欲的老人走了 一個無痕的老人走了 一個無恨的老人走了 我也會...
    一了0820閱讀 348評論 3 3

友情鏈接更多精彩內(nèi)容