Billions of Messages a Day – Yelp’s Real-time Data Pipeline
by Justin Cunningham, Technical Lead, Software Engineering, Yelp
video, slide
Yelp moved quickly into building out a comprehensive service-oriented architecture, and before long had over 100 data-owning production services. Distributing data across an organization creates a number of problems, particularly around the cost of joining disparate data sources, and it dramatically increases the complexity of bulk data applications. Straightforward solutions like bulk data APIs and shared data snapshots have significant drawbacks. Yelp’s Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications – making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that make it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. We’ll show how a few simple services at Yelp lay the foundation that powers everything from search to our experimentation framework.
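To make the “universal producers and consumers” idea concrete, here is a minimal sketch of a mash-up service written against the generic confluent-kafka Python client; the topic names and the enrichment step are invented for illustration, and Yelp’s actual Data Pipeline clients are internal libraries with stronger guarantees than this shows.

```python
# A toy "mash-up" service: consume one service's stream, enrich it,
# and republish it for downstream consumers to build on.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'review-enricher',
    'auto.offset.reset': 'earliest',
})
producer = Producer({'bootstrap.servers': 'localhost:9092'})

consumer.subscribe(['service.reviews'])  # hypothetical upstream topic
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    event['char_count'] = len(event.get('text', ''))  # trivial enrichment
    producer.produce('derived.reviews.enriched', json.dumps(event).encode())
    producer.poll(0)  # serve delivery callbacks
```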
Body Armor for Distributed System
by Michael Egorov, Co-founder and CTO, NuCypher
video, slide
We show a way to make Kafka end-to-end encrypted: data is only ever decrypted on the producer and consumer side, never broker-side. Importantly, every Kafka client has its own encryption key; there is no pre-shared encryption key. Our approach can be compared to TLS implemented for more than two connected parties.
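A minimal sketch of the end-to-end idea, assuming PyNaCl sealed boxes and placeholder topic and config values: the producer encrypts against the consumer’s public key before the record ever leaves the client, so brokers only store ciphertext. The talk’s actual scheme uses proxy re-encryption so producers do not need to know each individual consumer’s key, which this simplification does not capture.

```python
# End-to-end encryption sketch: brokers only ever see ciphertext.
from nacl.public import PrivateKey, SealedBox
from confluent_kafka import Consumer, Producer

consumer_key = PrivateKey.generate()  # held only by the consuming service

# Producer side: encrypt before the record leaves the client process.
producer = Producer({'bootstrap.servers': 'localhost:9092'})
ciphertext = SealedBox(consumer_key.public_key).encrypt(b'five stars')
producer.produce('reviews.e2e', ciphertext)
producer.flush()

# Consumer side: plaintext exists again only after polling and decrypting.
consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'e2e-demo', 'auto.offset.reset': 'earliest'})
consumer.subscribe(['reviews.e2e'])
msg = consumer.poll(10.0)
if msg is not None and not msg.error():
    plaintext = SealedBox(consumer_key).decrypt(msg.value())
```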
DNS for Data: The Need for a Stream Registry
by Praveen Hirsave, Director Cloud Engineering, HomeAway
video, slide
As organizations increasingly adopt streaming platforms such as Kafka, the need for visibility and discovery has become paramount. With the advent of self-service streaming and analytics, the need to increase overall speed, not only time-to-signal but also time-to-production, is becoming the difference between winners and losers. Beyond having Kafka at the core of a successful streaming platform, there is a need for a stream registry. Come to this session to find out how HomeAway is solving this with a “just right” approach to governance.
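As a purely hypothetical illustration of the “DNS for data” analogy, a stream registry resolves a logical stream name to where the stream physically lives, plus the governance metadata needed for discovery. Every field name below is invented; HomeAway’s actual registry is not described in detail here.

```python
# Hypothetical sketch of what a stream registry entry might capture.
from dataclasses import dataclass

@dataclass
class StreamRecord:
    name: str            # logical stream name, the "domain" being resolved
    cluster: str         # physical Kafka cluster (the "A record")
    topic: str           # concrete topic on that cluster
    owner: str           # team accountable for the data
    schema_subject: str  # Schema Registry subject governing the payload

REGISTRY = {
    'bookings.created': StreamRecord(
        name='bookings.created', cluster='kafka-prod-us-east',
        topic='bookings.created.v1', owner='bookings-team',
        schema_subject='bookings.created-value'),
}

def resolve(stream_name: str) -> StreamRecord:
    """Resolve a logical stream name to its metadata, DNS-style."""
    return REGISTRY[stream_name]
```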
Efficient Schemas in Motion with Kafka and Schema Registry
by Pat Patterson, Community Champion, StreamSets Inc.
video, slide
Apache Avro allows data to be self-describing, but carries an overhead when used with message queues such as Apache Kafka. Confluent’s open source Schema Registry integrates with Kafka to allow Avro schemas to be passed ‘by reference’, minimizing overhead, and can be used with any application that uses Avro. Learn about Schema Registry, using it with Kafka, and leveraging it in your application.
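A small sketch of pass-by-reference schemas using confluent-kafka’s (older) Avro helper; the broker and registry URLs and the schema itself are placeholders. The producer registers the schema with Schema Registry once, and each record then carries only a compact schema ID rather than the full schema.

```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.loads("""
{
  "type": "record", "name": "PageView",
  "fields": [{"name": "url", "type": "string"},
             {"name": "user_id", "type": "long"}]
}
""")

producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081',
}, default_value_schema=value_schema)

# On the wire each value is a magic byte + 4-byte schema ID + Avro body,
# which is what keeps the per-message overhead minimal.
producer.produce(topic='pageviews', value={'url': '/home', 'user_id': 42})
producer.flush()
```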
From Scaling Nightmare to Stream Dream : Real-time Stream Processing at Scale
by Amy Boyle, Software Engineer, New Relic
video, slide
On the events pipeline team at New Relic, Kafka is the thread that stitches our microservice architecture together. We receive billions of monitoring events an hour, which customers rely on us to alert on in real time. Facing more than tenfold growth in the system, learn how we avoided a costly scaling nightmare by switching to a streaming system based on Kafka. We follow a DevOps philosophy at New Relic, so I have a personal stake in how well our systems perform: if evaluation deadlines are missed, I lose sleep and customers lose trust. Without necessarily setting out to do so from the start, we’ve gone all in, using Kafka as the backbone of an event-driven pipeline, as a datastore, and for streaming updates to the system. Hear about what worked for us, what challenges we faced, and how we continue to scale our applications.
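The “Kafka as a datastore” idea is commonly realized with log-compacted topics, which retain the latest record per key so a restarted service can rebuild its state by replaying the topic. Below is a generic sketch of creating such a topic, not New Relic’s actual configuration; the topic name is invented.

```python
# Create a log-compacted topic: Kafka keeps the latest value per key
# indefinitely, making the topic usable as a durable key-value store.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})
topic = NewTopic(
    'alert-policies',                      # hypothetical state topic
    num_partitions=6,
    replication_factor=3,
    config={'cleanup.policy': 'compact'},  # retain latest record per key
)
for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if creation failed
```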
How Blizzard Used Kafka to Save Our Pipeline (and Azeroth)
by Jeff Field, Systems Engineer, Blizzard
video, slide
When Blizzard started sending gameplay data to Hadoop in 2013, we went through several iterations before settling on Flume agents in many data centers around the world reading from RabbitMQ and writing to central Flume agents in our Los Angeles data center. While this worked at first, by 2015 we were hitting problems scaling to the number of events required. This is how we used Kafka to save our pipeline.
Kafka Connect Best Practices – Advice from the Field
by Randall Hauch, Engineer, Confluent
video, slide
This talk will review the Kafka Connect Framework and discuss building data pipelines using the library of available Connectors. We’ll deploy several data integration pipelines and demonstrate:
best practices for configuring, managing, and tuning the connectors
tools to monitor data flow through the pipeline
using Kafka Streams applications to transform or enhance the data in flight.
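As a concrete starting point, connectors are deployed by posting a JSON configuration to the Kafka Connect REST API. The sketch below uses the built-in FileStreamSource connector with placeholder host, file, and topic values; real pipelines would substitute a connector from the library (JDBC, Elasticsearch, and so on) with its own configuration keys.

```python
# Deploy a connector through the Kafka Connect REST API.
import requests

connector = {
    'name': 'demo-file-source',
    'config': {
        'connector.class': 'org.apache.kafka.connect.file.FileStreamSourceConnector',
        'tasks.max': '1',
        'file': '/var/log/app.log',   # placeholder source file
        'topic': 'app-logs',          # placeholder destination topic
    },
}
resp = requests.post('http://localhost:8083/connectors', json=connector)
resp.raise_for_status()
```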
One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers
by Gwen Shapira, Product Manager, Confluent
video, slide
You have made the transition from single machines and one-off solutions to distributed infrastructure in your data center powered by Apache Kafka. But what if one data center is not enough? In this session, we review resilient data pipelines with Apache Kafka that span multiple data centers. We provide an overview of best practices and common patterns including key areas such as architecture and data replication as well as disaster scenarios and failure handling.
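At its core, cross-data-center replication is the job tools like MirrorMaker perform: consume from the source cluster and re-produce to the destination. The toy sketch below, with placeholder bootstrap addresses and topic names, ignores the offset tracking, ordering, and failover concerns that make production mirroring hard and that this session covers.

```python
# Toy cross-data-center mirror: read from one cluster, write to another.
from confluent_kafka import Consumer, Producer

source = Consumer({
    'bootstrap.servers': 'kafka-dc1:9092',  # origin data center
    'group.id': 'mirror-orders',
    'auto.offset.reset': 'earliest',
})
target = Producer({'bootstrap.servers': 'kafka-dc2:9092'})  # remote DC

source.subscribe(['orders'])
while True:
    msg = source.poll(1.0)
    if msg is None or msg.error():
        continue
    # Preserve the key so partitioning stays consistent across clusters.
    target.produce('orders', key=msg.key(), value=msg.value())
    target.poll(0)
```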