Design patterns for container-based distributed systems
Brendan Burns David Oppenheimer
Translation study: zackerycao
1 Introduction
In the late 1980s and early 1990s, object-oriented programming revolutionized software development, popularizing the approach of building applications as collections of modular components. Today we are seeing a similar revolution in distributed system development, with the increasing popularity of microservice architectures built from containerized software components. Containers [15] [22] [1] [2] are particularly well-suited as the fundamental “object” in distributed systems by virtue of the walls they erect at the container boundary. As this architectural style matures, we are seeing the emergence of design patterns, much as we did for object-oriented programs, and for the same reason: thinking in terms of objects (or containers) abstracts away the low-level details of code, eventually revealing higher-level patterns that are common to a variety of applications and algorithms.
This paper describes three types of design patterns that we have observed emerging in container-based distributed systems: single-container patterns for container management, single-node patterns of closely cooperating containers, and multi-node patterns for distributed algorithms. Like object-oriented patterns before them, these patterns for distributed computation encode best practices, simplify development, and make the systems where they are used more reliable.
2 Distributed system design patterns
After object-oriented programming had been used for some years, design patterns emerged and were documented [3]. These patterns codified and regularized general approaches to solving particular common programming problems. This codification further improved the general state of the art in programming because it made it easier for less experienced programmers to produce well-engineered code, and led to the development of reusable libraries that made code more reliable and faster to develop.
The state of the art in distributed system engineering today looks significantly more like the world of early 1980s programming than it does the world of object-oriented development. Yet it’s clear from the success of the MapReduce pattern [4] in bringing the power of “Big Data” programming to a broad set of fields and developers, that putting in place the right set of patterns can dramatically improve the quality, speed, and accessibility of distributed system programming. But even the success of MapReduce is largely limited to a single programming language, insofar as the Apache Hadoop [5] ecosystem is primarily written in and for Java. Developing a truly comprehensive suite of patterns for distributed system design requires a very generic, language-neutral vehicle to represent the atoms of the system.
Thus it is fortunate that the last two years have seen a dramatic rise in adoption of Linux container technology. The container and the container image are exactly the abstractions needed for the development of distributed systems patterns. To date, containers and container images have achieved much of their popularity simply by being a better, more reliable method for delivering software from development all the way through production. By being hermetically sealed, carrying their dependencies with them, and providing an atomic deployment signal (“succeeded”/“failed”), they dramatically improve on the previous state of the art in deploying software in the datacenter or cloud. But containers have the potential to be much more than just a better deployment vehicle: we believe they are destined to become analogous to objects in object-oriented software systems, and as such will enable the development of distributed system design patterns. In the following sections we explain why we believe this to be the case, and describe some patterns that we see emerging to regularize and guide the engineering of distributed systems over the coming years.
3 Single-container management patterns
The container provides a natural boundary for defining an interface, much like the object boundary. Through this interface, containers can expose not only application-specific functionality but also hooks for management systems.
The traditional container management interface is extremely limited. A container effectively exports three verbs: run(), pause(), and stop(). Though this interface is useful, a richer interface can provide even more utility to system developers and operators. And given the ubiquitous support for HTTP web servers in nearly every modern programming language and widespread support for data formats like JSON, it is easy to define an HTTP-based management API that can be “implemented” by having the container host a web server at specific endpoints, in addition to its main functionality.
In the “upward” direction the container can expose a rich set of application information, including application-specific monitoring metrics (QPS, application health, etc.), profiling information of interest to developers (threads, stack, lock contention, network message statistics, etc.), component configuration information, and component logs. As a concrete example of this, Kubernetes [6], Aurora [7], Marathon [8], and other container management systems allow users to define health checks via specified HTTP endpoints (e.g. “/health”). Standardized support for other elements of the “upward” API we have described is more rare.
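As a rough sketch of such an HTTP-based “upward” API, the following Python module serves a health check alongside whatever the application otherwise does. The `/health` path comes from the example above; the payload fields, the placeholder values, and the names `build_health_payload`, `ManagementHandler`, and `serve_management_api` are assumptions made for illustration, not part of any management system's contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_health_payload(healthy: bool, qps: float) -> dict:
    """Build the JSON document returned by the health endpoint (fields are illustrative)."""
    return {"status": "ok" if healthy else "unhealthy", "qps": qps}


class ManagementHandler(BaseHTTPRequestHandler):
    """Serves management endpoints next to the application's main functionality."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # A real application would report live values; these are placeholders.
        payload = build_health_payload(healthy=True, qps=0.0)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_management_api(port: int) -> None:
    """Block forever, answering management requests on the given port."""
    HTTPServer(("0.0.0.0", port), ManagementHandler).serve_forever()
```

An application would typically call `serve_management_api` from a background thread so that the management endpoint does not interfere with its main request path.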
In the “downward” direction, the container interface provides a natural place to define a lifecycle that makes it easier to write software components that are controlled by a management system. For example, a cluster management system will typically assign “priorities” to tasks, with high-priority tasks guaranteed to run even when the cluster is oversubscribed. This guarantee is enforced by evicting already-running lower-priority tasks, which then have to wait until resources become available. Eviction can be implemented by simply killing the lower-priority task, but this puts an undue burden on the developer to respond to arbitrary death anywhere in their code. If, instead, a formal lifecycle is defined between application and management system, then the application components become more manageable, since they conform to a defined contract, and the development of the system becomes easier, since the developer can rely on the contract. For example, Kubernetes uses a “graceful deletion” feature of Docker that warns a container, via the SIGTERM signal, that it is going to be terminated, an application-defined amount of time before it is sent the SIGKILL signal. This allows the application to terminate cleanly by finishing in-flight operations, flushing state to disk, etc. One can imagine extending such a mechanism to provide support for state serialization and recovery that makes state management significantly easier for stateful distributed systems.
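The SIGTERM-then-SIGKILL contract described above can be sketched from the application's side as follows. This is a minimal illustration, assuming a worker that doubles integers as its "work"; the class `GracefulWorker` and its `buffered`/`persisted` lists (standing in for in-memory state and state flushed to disk) are hypothetical names, not part of Kubernetes or Docker.

```python
import signal
import threading
from typing import Iterable, List


class GracefulWorker:
    """Stops taking new work after SIGTERM, then flushes what it has already done."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()
        self.buffered: List[int] = []   # in-memory state not yet persisted
        self.persisted: List[int] = []  # stands in for state flushed to disk

    def handle_sigterm(self, signum, frame) -> None:
        # Only record the request; the main loop decides when it is safe to stop.
        self.stop_requested.set()

    def install(self) -> None:
        # signal.signal() must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def run(self, work_items: Iterable[int]) -> None:
        for item in work_items:
            if self.stop_requested.is_set():
                break  # take no new work, but keep what is already done
            self.buffered.append(item * 2)
        self.flush()

    def flush(self) -> None:
        self.persisted.extend(self.buffered)
        self.buffered.clear()
```

The handler deliberately does very little: the loop finishes the item in flight and flushes before exiting, which is the clean termination the grace period is meant to allow.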
As a concrete example of a more complex lifecycle, consider the Android Activity model [9], which features a series of callbacks (e.g. onCreate(), onStart(), onStop(), ...) and a formally defined state machine for how the system triggers these callbacks. Without this formal lifecycle, robust, reliable Android applications would be significantly harder to develop. In the context of container-based systems, this generalizes to application-defined hooks that are invoked when a container is created, when it is started, just before termination, etc. Another example of a “downward” API that a container might support is “replicate yourself” (to scale up the service).
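To make the idea of a formally defined lifecycle concrete, here is a small sketch of a state machine that invokes application-defined hooks at each transition. The states and the hook names (`on_create`, `on_start`, `pre_terminate`) are invented for this example; they loosely mirror the created, started, and just-before-termination points mentioned above and do not correspond to any particular system's API.

```python
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    NEW = "new"
    CREATED = "created"
    STARTED = "started"
    TERMINATED = "terminated"


# Each state maps to (hook invoked when leaving the state, next state).
TRANSITIONS = {
    State.NEW: ("on_create", State.CREATED),
    State.CREATED: ("on_start", State.STARTED),
    State.STARTED: ("pre_terminate", State.TERMINATED),
}


class ContainerLifecycle:
    """Drives a container through a fixed sequence of states, calling hooks on the way."""

    def __init__(self, hooks: Dict[str, Callable[[], None]]) -> None:
        self.hooks = hooks
        self.state = State.NEW

    def advance(self) -> State:
        if self.state not in TRANSITIONS:
            raise RuntimeError(f"no transition out of {self.state.name}")
        hook_name, next_state = TRANSITIONS[self.state]
        hook = self.hooks.get(hook_name)
        if hook is not None:
            hook()  # the application-defined callback runs before the state changes
        self.state = next_state
        return self.state
```

Because the transitions are data rather than scattered conditionals, both the management system and the application developer can read the contract in one place.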
4 Single-node, multi-container application patterns
Beyond the interface of a single container, we also see design patterns emerging that span containers. We have previously identified several such patterns [10]. These single-node patterns consist of symbiotic containers that are co-scheduled onto a single host machine. Container management system support for co-scheduling multiple containers as an atomic unit, an abstraction Kubernetes calls “Pods” and Nomad [11] calls “task groups,” is thus a required feature for enabling the patterns we describe in this section.
4.1 Sidecar pattern
The first and most common pattern for multi-container deployments is the sidecar pattern. Sidecars extend and enhance the main container. For example, the main container might be a web server, and it might be paired with a “logsaver” sidecar container that collects the web server’s logs from local disk and streams them to a cluster storage system. Figure 1 shows an example of the sidecar pattern. Another common example is a web server that serves content from local disk, where the content is populated by a sidecar container that periodically synchronizes it from a git repository, content management system, or other data source. Both of these examples are common at Google. Sidecars are possible because containers on the same machine can share a local disk volume.

Figure 1: An example of a sidecar container augmenting an application with log saving.
While it is always possible to build the functionality of a sidecar container into the main container, there are several benefits to using separate containers. First, the container is the unit of resource accounting and allocation, so for example a web server container’s cgroup [15] can be configured so that it provides consistent low-latency responses to queries, while the logsaver container is configured to scavenge spare CPU cycles when the web server is not busy. Second, the container is the unit of packaging, so separating serving and log saving into different containers makes it easy to divide responsibility for their development between two separate programming teams, and allows them to be tested independently as well as together. Third, the container is the unit of reuse, so sidecar containers can be paired with numerous different “main” containers (e.g. a log saver container could be used with any component that produces logs). Fourth, the container provides a failure containment boundary, making it possible for the overall system to degrade gracefully (for example, the web server can continue serving even if the log saver has failed). Lastly, the container is the unit of deployment, which allows each piece of functionality to be upgraded and, when necessary, rolled back, independently. (Though it should be noted that this last benefit also comes with a downside: the test matrix for the overall system must consider all of the container version combinations that might be seen in production, which can be large since sets of containers generally can’t be upgraded atomically. Of course, while a monolithic application doesn’t have this issue, componentized systems are easier to test in some regards, since they are built from smaller units that can be independently tested.) Note that these five benefits apply to all of the container patterns we describe in the remainder of this paper.
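The logsaver sidecar from this section can be sketched as a function that reads whatever the web server has appended to a log file on the shared volume and hands the new lines to a shipping callback. This is only an illustration: `ship_new_log_lines`, the byte-offset bookkeeping, and the `ship` callback (a stand-in for a client of the cluster storage system) are assumptions, and log rotation is not handled.

```python
import os
from typing import Callable, List


def ship_new_log_lines(log_path: str, offset: int,
                       ship: Callable[[List[str]], None]) -> int:
    """Ship complete lines appended after `offset`; return the new byte offset."""
    if not os.path.exists(log_path):
        return offset  # the main container has not written anything yet
    with open(log_path, "rb") as log_file:
        log_file.seek(offset)
        data = log_file.read()
    # Ship only complete lines; a partially written last line waits for the next pass.
    end = data.rfind(b"\n")
    if end == -1:
        return offset
    complete = data[: end + 1]
    ship(complete.decode("utf-8", errors="replace").splitlines())
    return offset + len(complete)
```

A sidecar process would call this in a loop with a short sleep, carrying the returned offset from one pass to the next, while the web server remains unaware that its logs are being collected.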
4.2 Ambassador pattern
The next pattern that we have observed is the ambassador pattern. Ambassador containers proxy communication to and from a main container. For example, a developer might pair an application that is speaking the memcache protocol with a twemproxy ambassador. The application believes that it is simply talking to a single memcache on localhost, but in reality twemproxy is sharding the requests across a distributed installation of multiple memcache nodes elsewhere in the cluster.

Figure 2: An example of the ambassador pattern applied to proxying to different memcache shards.
This container pattern simplifies the programmer’s life in three ways: they only have to think and program in terms of their application connecting to a single server on localhost, they can test their application standalone by running a real memcache instance on their local machine instead of the ambassador, and they can reuse the twemproxy ambassador with other applications that might even be coded in different languages. Ambassadors are possible because containers on the same machine share the same localhost network interface. An example of this pattern is shown in Figure 2.
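The routing an ambassador performs on the application's behalf can be illustrated with a toy in-process version. In this sketch plain dictionaries stand in for memcache nodes, and keys are assigned to shards with CRC32 modulo the shard count; that scheme is a deliberate simplification chosen for the example and is not a description of how twemproxy distributes keys. The names `pick_shard` and `CacheAmbassador` are hypothetical.

```python
import zlib
from typing import Dict, List, Optional


def pick_shard(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (CRC32 modulo shard count)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class CacheAmbassador:
    """Presents many cache shards as one cache; dicts stand in for memcache nodes."""

    def __init__(self, shards: List[Dict[str, str]]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = shards

    def set(self, key: str, value: str) -> None:
        self.shards[pick_shard(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[pick_shard(key, len(self.shards))].get(key)
```

The application only ever calls `get` and `set`, exactly as it would against a single cache on localhost; the number and location of shards can change without touching application code.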
4.3 Adapter pattern
The final single-node pattern we have observed is the adapter pattern. In contrast to the ambassador pattern, which presents an application with a simplified view of the outside world, adapters present the outside world with a simplified, homogenized view of an application. They do this by standardizing output and interfaces across multiple containers. A concrete example of the adapter pattern is adapters that ensure all containers in a system have the same monitoring interface. Applications today use a wide variety of methods to export their metrics (e.g. JMX, statsd, etc). But it is easier for a single monitoring tool to collect, aggregate, and present metrics from a heterogeneous set of applications if all the applications present a consistent monitoring interface. Within Google, we have achieved this via code convention, but this is only possible if you build your software from scratch. The adapter pattern enables the heterogeneous world of legacy and open-source applications to present a uniform interface without requiring modification of the original application. The main container can communicate with the adapter through localhost or a shared local volume. This is shown in Figure 3. Note that while some existing monitoring solutions are able to communicate with multiple types of back-ends, they use application-specific code in the monitoring system itself, which provides a less clean separation of concerns.

Figure 3: An example of the adapter pattern applied to normalizing the monitoring interface.
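A monitoring adapter of the kind described in this section might translate one application's native metric format into the uniform record the monitoring tool expects. The sketch below handles statsd-style lines of the form `name:value|type`; the uniform record layout, the subset of metric types covered, and the function names are assumptions made for the example.

```python
from typing import Dict, List

# Subset of statsd type codes handled by this sketch.
STATSD_TYPES = {"c": "counter", "g": "gauge", "ms": "timer"}


def parse_statsd_line(line: str) -> Dict[str, object]:
    """Convert one statsd-style line, e.g. 'requests:3|c', into a uniform record."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2 or fields[1] not in STATSD_TYPES:
        raise ValueError(f"unrecognized metric line: {line!r}")
    # Any further fields (such as a sample rate) are ignored here.
    return {"name": name, "value": float(fields[0]), "type": STATSD_TYPES[fields[1]]}


def adapt_metrics(lines: List[str]) -> List[Dict[str, object]]:
    """Adapt a batch of raw lines, skipping blank ones."""
    return [parse_statsd_line(line) for line in lines if line.strip()]
```

An adapter for a different source format would expose the same record shape, which is what lets a single monitoring tool treat every container alike.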
5 Multi-node application patterns
Moving beyond cooperating containers on a single machine, modular containers make it easier to build coordinated multi-node distributed applications. We describe three of these distributed system patterns next. Like the patterns in the previous section, these also require system support for the Pod abstraction.
5.1 Leader election pattern
One of the most common problems in distributed systems is leader election (e.g. [20]). While replication is commonly used to share load among multiple identical instances of a component, another, more complex use of replication is in applications that need to distinguish one replica from a set as the “leader.” The other replicas are available to quickly take the place of the leader if it fails. A system may even run multiple leader elections in parallel, for example to determine the leader of each of multiple shards.
There are numerous libraries for performing leader election. They are generally complicated to understand and use correctly, and additionally, they are limited by being implemented in a particular programming language. An alternative to linking a leader election library into the application is to use a leader election container. A set of leader-election containers, each one co-scheduled with an instance of the application that requires leader election, can perform election amongst themselves, and they can present a simplified HTTP API over localhost to each application container that requires leader election (e.g. becomeLeader, renewLeadership, etc.). These leader election containers can be built once, by experts in this complicated area, and then the subsequent simplified interface can be re-used by application developers regardless of their choice of implementation language. This represents the best of abstraction and encapsulation in software engineering.
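The semantics behind operations such as becomeLeader and renewLeadership can be sketched with a time-limited lease. The in-process `LeaseStore` below is an assumption made purely for illustration: an actual election container would keep the lease in a consistent, replicated store and expose these operations over HTTP on localhost. Times are passed in explicitly so the behaviour is easy to follow.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseStore:
    """Grants leadership as a lease that must be renewed before it expires."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._lease: Optional[Lease] = None

    def become_leader(self, candidate: str, now: float) -> bool:
        """Acquire the lease if it is free, expired, or already held by the candidate."""
        with self._lock:
            lease = self._lease
            if lease is None or lease.expires_at <= now or lease.holder == candidate:
                self._lease = Lease(candidate, now + self._ttl)
                return True
            return False

    def renew_leadership(self, candidate: str, now: float) -> bool:
        """Extend the lease, but only for the current holder and only before expiry."""
        with self._lock:
            lease = self._lease
            if lease is not None and lease.holder == candidate and lease.expires_at > now:
                lease.expires_at = now + self._ttl
                return True
            return False

    def current_leader(self, now: float) -> Optional[str]:
        with self._lock:
            lease = self._lease
            if lease is not None and lease.expires_at > now:
                return lease.holder
            return None
```

A leader that fails simply stops renewing; once the lease expires, any other replica's `become_leader` call succeeds, which is how a standby takes over.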
5.2 Work queue pattern
Although work queues, like leader election, are a well-studied subject with many frameworks implementing them, they too are an example of a distributed system pattern that can benefit from container-oriented architectures. In previous systems, the framework limited programs to a single language environment (e.g. Celery for Python [13]), or the distribution of work and binary were exercises left to the implementer (e.g. Condor [21]). The availability of containers that implement the run() and mount() interfaces makes it fairly straightforward to implement a generic work queue framework that can take arbitrary processing code packaged as a container, and arbitrary data, and build a complete work queue system. The developer only has to build a container that can take an input data file on the filesystem, and transform it to an output file; this container would become one stage of the work queue. All of the other work involved in developing a complete work queue can be handled by the generic work queue framework that can be reused whenever such a system is needed. The manner in which a user’s code integrates into this shared work queue framework is illustrated in Figure 4.

Figure 4: An illustration of the generic work queue. Reusable framework containers are shown in dark gray, while developer containers are shown in light gray.
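The division of labour between the reusable framework and the developer's stage can be sketched as follows. In this simplified, single-process illustration a Python callable stands in for the developer-supplied container: it receives an input file path and an output file path, which is the file-in, file-out contract described in this section. The names `run_work_queue` and `uppercase_stage` are hypothetical, and a real framework would run each stage as a container with the directories mounted.

```python
import os
from typing import Callable, List


def run_work_queue(input_dir: str, output_dir: str,
                   stage: Callable[[str, str], None]) -> List[str]:
    """Framework side: feed every input file through the developer-supplied stage."""
    os.makedirs(output_dir, exist_ok=True)
    completed = []
    for name in sorted(os.listdir(input_dir)):
        in_path = os.path.join(input_dir, name)
        if not os.path.isfile(in_path):
            continue  # only plain files are work items
        stage(in_path, os.path.join(output_dir, name))
        completed.append(name)
    return completed


def uppercase_stage(in_path: str, out_path: str) -> None:
    """Developer side: transform one input file into one output file."""
    with open(in_path, "r", encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(text.upper())
```

Everything in `run_work_queue` is generic; swapping `uppercase_stage` for another stage changes what the queue computes without changing how work is distributed.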
5.3 Scatter/gather pattern
The last distributed systems pattern we highlight is scatter/gather. In such a system, an external client sends an initial request to a “root” or “parent” node. This root fans the request out to a large number of servers to perform computations in parallel. Each shard returns partial data, and the root gathers this data into a single response to the original request. This pattern is common in search engines. Developing such a distributed system involves a great deal of boilerplate code: fanning out the requests, gathering the responses, interacting with the client, etc. Much of this code is quite generic, and again, as in object-oriented programming, can be refactored in such a way that a single implementation can be provided that can be used with arbitrary containers so long as they implement a specific interface. In particular, to implement a scatter/gather system, a user is required to supply two containers. First, the container that implements the leaf node computation; this container performs the partial computation and returns the corresponding result. The second container is the merge container; this container takes the aggregated output of all of the leaf containers, and groups them into a single response. It is easy to see how a user can implement a scatter/gather system of arbitrary depth (including parents, in addition to roots, if necessary) simply by providing containers that implement these relatively simple interfaces. Such a system is illustrated in Figure 5.

Figure 5: An illustration of the scatter/gather pattern. A reusable root container (dark gray) implements client interactions and request fanout to developer-supplied leaf containers and to a developer-supplied container responsible for merging the results (all in light gray).
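The reusable root and the two developer-supplied pieces can be sketched in a single process, with callables standing in for the leaf and merge containers and a thread pool standing in for network fan-out. The function names and the toy substring search are assumptions made for this illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def scatter_gather(request: str,
                   leaves: Sequence[Callable[[str], List[str]]],
                   merge: Callable[[List[List[str]]], List[str]]) -> List[str]:
    """Reusable root: fan the request out to every leaf in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(leaves))) as pool:
        # map() yields partial results in the same order as the leaves.
        partials = list(pool.map(lambda leaf: leaf(request), leaves))
    return merge(partials)


def make_search_leaf(documents: List[str]) -> Callable[[str], List[str]]:
    """Developer-supplied leaf: search one shard of documents for the query."""
    def leaf(query: str) -> List[str]:
        return [doc for doc in documents if query in doc]
    return leaf


def merge_sorted(partials: List[List[str]]) -> List[str]:
    """Developer-supplied merge: combine partial results into a single response."""
    return sorted(doc for partial in partials for doc in partial)
```

For example, two leaves holding different document shards can be queried with `scatter_gather("apple", leaves, merge_sorted)`; a deeper tree follows by using a function that itself calls `scatter_gather` as a leaf.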
6 Related work
Service-oriented architectures (SOA) [16] pre-date, and share a number of characteristics with, container-based distributed systems. For example, both emphasize reusable components with well-defined interfaces that communicate over a network. On the other hand, components in SOA systems tend to be larger-grain and more loosely-coupled than the multi-container patterns we have described. Additionally, components in SOA often implement business activities, while the components we have focused on here are more akin to generic libraries that make it easier to build distributed systems. The term “microservice” has recently emerged to describe the types of components we have discussed in this paper.
The concept of standardized management interfaces to networked components dates back at least to SNMP [19]. SNMP focuses primarily on managing hardware components, and no standard has yet emerged for managing microservice/container-based systems. This has not prevented the development of numerous container management systems, including Aurora [7], ECS [17], Docker Swarm [18], Kubernetes [6], Marathon [8], and Nomad [11].
All of the distributed algorithms we mentioned in Section 5 have a long history. One can find a number of leader election implementations on GitHub, though they appear to be structured as libraries rather than standalone components. There are a number of popular work queue implementations, including Celery [13] and Amazon SQS [14]. Scatter-gather has been identified as an Enterprise Integration Pattern [12].
7 Conclusion
Much as object-oriented programming led to the emergence and codification of object-oriented “design patterns,” we see container architectures leading to design patterns for container-based distributed systems. In this paper we identified three types of patterns we have seen emerging: single-container patterns for system management, single-node patterns of closely-cooperating containers, and multi-node patterns for distributed algorithms. In all cases, containers provide many of the same benefits as objects in object-oriented systems, such as making it easy to divide implementation among multiple teams and to reuse components in new contexts. In addition, they provide some benefits unique to distributed systems, such as enabling components to be upgraded independently, to be written in a mixture of languages, and for the system as a whole to degrade gracefully. We believe that the set of container patterns will only grow, and that in the coming years they will revolutionize distributed systems programming much as object-oriented programming did in earlier decades, in this case by enabling a standardization and regularization of distributed system development.
8 Acknowledgements
Ideas don’t simply appear in our heads from a vacuum. The work in this paper has been influenced heavily by conversations with Brian Grant, Tim Hockin, Joe Beda and Craig McLuckie.
References
[1] Docker Engine, http://www.docker.com
[2] rkt: a security-minded standards-based container engine, https://coreos.com/rkt/
[3] Erich Gamma, John Vlissides, Ralph Johnson, Richard Helm, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Massachusetts, 1994.
[4] Jeffrey Dean, Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Sixth Symposium on Operating System Design and Implementation, San Francisco, CA 2004.
[5] Apache Hadoop, http://hadoop.apache.org
[6] Kubernetes, http://kubernetes.io
[7] Apache Aurora, https://aurora.apache.org
[8] Marathon: A cluster-wide init and control system for services, https://mesosphere.github.io/marathon/
[9] Managing the Activity Lifecycle, http://developer.android.com/training/basics/activitylifecycle/index.html
[10] Brendan Burns, The Distributed System ToolKit: Patterns for Composite Containers, http://blog.kubernetes.io/2015/06/the-distributedsystem-toolkit-patterns.html
[11] Nomad by Hashicorp, https://www.nomadproject.io/
[12] Gregor Hohpe, Enterprise Integration Patterns, Addison-Wesley, Massachusetts, 2004.
[13] Celery: Distributed Task Queue, http://www.celeryproject.org/
[14] Amazon Simple Queue Service, https://aws.amazon.com/sqs/
[15] https://www.kernel.org/doc/Documentation/cgroupv1/cgroups.txt
[16] Service Oriented Architecture, https://en.wikipedia.org/wiki/Serviceoriented architecture
[17] Amazon EC2 Container Service, https://aws.amazon.com/ecs/
[18] Docker Swarm, https://docker.com/swarm
[19] J. Case, M. Fedor, M. Schoffstall, J. Davin, A Simple Network Management Protocol (SNMP), https://www.ietf.org/rfc/rfc1157.txt, 1990.
[20] R. G. Gallager, P. A. Humblet, P. M. Spira, A distributed algorithm for minimum-weight spanning trees, ACM Transactions on Programming Languages and Systems, January, 1983.
[21] M.J. Litzkow, M. Livny, M. W. Mutka, Condor: a hunter of idle workstations, IEEE Distributed Computing Systems, 1988.