Kafka Introduction

1. What is event streaming?

Event streaming is the digital equivalent of the human body's central nervous system. It is the technological foundation for the 'always-on' world where businesses are increasingly software-defined and automated, and where the user of software is more software.

Technically speaking, event streaming is the practice of capturing data in real-time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real-time as well as retrospectively; and routing the event streams to different destination technologies as needed. Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time.

2. What can I use event streaming for?

Event streaming is applied to a wide variety of use cases across a plethora of industries and organizations. Its many examples include:

  • To process payments and financial transactions in real-time, such as in stock exchanges, banks, and insurance companies.
  • To track and monitor cars, trucks, fleets, and shipments in real-time, such as in logistics and the automotive industry.
  • To continuously capture and analyze sensor data from IoT devices or other equipment, such as in factories and wind parks.
  • To collect and immediately react to customer interactions and orders, such as in retail, the hotel and travel industry, and mobile applications.
  • To monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
  • To connect, store, and make available data produced by different divisions of a company.
  • To serve as the foundation for data platforms, event-driven architectures, and microservices.

3. Apache Kafka® is an event streaming platform. What does that mean?

Kafka combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:

  1. To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  2. To store streams of events durably and reliably for as long as you want.
  3. To process streams of events as they occur or retrospectively.

And all this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner. Kafka can be deployed on bare-metal hardware, virtual machines, and containers, and on-premises as well as in the cloud. You can choose between self-managing your Kafka environments and using fully managed services offered by a variety of vendors.

4. How does Kafka work in a nutshell?

Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers, in on-premises as well as cloud environments.

Servers: Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the brokers. Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters. To let you implement mission-critical use cases, a Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers will take over their work to ensure continuous operations without any data loss.
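As a concrete illustration of the Connect side, here is a minimal standalone source-connector configuration in the style of Kafka's own quickstart; the FileStreamSource connector ships with Kafka for demo purposes, and the file path and topic name below are placeholders:

```properties
# Stream each line appended to a local file into a Kafka topic.
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test
```

In production you would use a purpose-built connector (e.g., for a relational database) instead of the demo file connector, but the shape of the configuration is the same.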

Clients: They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Kafka ships with some such clients included, which are augmented by dozens of clients provided by the Kafka community: clients are available for Java and Scala including the higher-level Kafka Streams library, for Go, Python, C/C++, and many other programming languages as well as REST APIs.

5. Main Concepts and Terminology

An event records the fact that "something happened" in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here's an example event:

  • Event key: "Alice"
  • Event value: "Made a payment of $200 to Bob"
  • Event timestamp: "Jun. 25, 2020 at 2:06 p.m."
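The key/value/timestamp/headers structure above can be modeled with a small sketch. This is a conceptual illustration, not Kafka's actual client API; the field and class names are made up for the example:

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A simplified model of a Kafka event (record)."""
    key: str
    value: str
    timestamp_ms: int                                  # Kafka timestamps are epoch milliseconds
    headers: dict = field(default_factory=dict)        # optional metadata headers

payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp_ms=int(time.time() * 1000),
    headers={"source": "web-checkout"},                # illustrative header
)
print(payment.key)  # Alice
```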

Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events. In Kafka, producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve the high scalability that Kafka is known for. For example, producers never need to wait for consumers. Kafka provides various guarantees such as the ability to process events exactly-once.
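The decoupling can be illustrated with a toy in-memory log: the producer appends and returns immediately, while each consumer tracks its own read position (offset) independently. This is a conceptual sketch, not the Kafka client API:

```python
class PartitionLog:
    """Toy append-only log: producers append; consumers read at their own pace via offsets."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        """Producer side: append and return the new event's offset. Never waits for consumers."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int):
        """Consumer side: return all events from `offset` onward."""
        return self._events[offset:]

log = PartitionLog()
log.append("payment-1")
log.append("payment-2")

# Two consumers of the same log at different positions, neither blocking the producer.
print(log.read(0))  # ['payment-1', 'payment-2']  (a consumer starting from the beginning)
print(log.read(2))  # []                          (a consumer that is already caught up)
```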

Events are organized and durably stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder. An example topic name could be "payments". Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events. Events in a topic can be read as often as needed—unlike traditional messaging systems, events are not deleted after consumption. Instead, you define for how long Kafka should retain your events through a per-topic configuration setting, after which old events will be discarded. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
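Retention is set per topic. For example, using the CLI tools that ship with Kafka (the broker address and topic name below are placeholders):

```shell
# Keep events in the "payments" topic for 7 days (retention.ms is in milliseconds).
# Events older than this become eligible for deletion whether or not they were consumed.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name payments \
  --alter --add-config retention.ms=604800000
```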

Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic's partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
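Key-to-partition assignment can be sketched as a deterministic hash of the key modulo the partition count. Kafka's default partitioner actually uses a murmur2 hash; the simplified version below only illustrates the guarantee that equal keys always land on the same partition:

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map an event key to a partition (simplified; Kafka uses murmur2)."""
    # crc32 stands in for Kafka's murmur2 here; any stable hash preserves the property.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Events with the same key always hash to the same partition,
# which is what preserves per-key ordering for consumers.
print(partition_for("vehicle-42") == partition_for("vehicle-42"))  # True
```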

Figure: This example topic has four partitions P1–P4. Two different producer clients are publishing, independently from each other, new events to the topic by writing events over the network to the topic's partitions. Events with the same key (denoted by their color in the figure) are written to the same partition. Note that both producers can write to the same partition if appropriate.

To make your data fault-tolerant and highly available, every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data in case things go wrong, when you want to do maintenance on the brokers, and so on. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data. This replication is performed at the level of topic-partitions.
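A topic with these common production settings could be created with the CLI that ships with Kafka (the broker address and topic name are placeholders):

```shell
# Create a "payments" topic with 4 partitions, each replicated to 3 brokers.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic payments \
  --partitions 4 \
  --replication-factor 3
```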
