Spark Terminology

The following table summarizes terms you’ll see used to refer to cluster concepts:

Term: Meaning

Application: A user program built on Spark, consisting of a driver program and executors on the cluster.

Application jar: A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries; these will be added at runtime.

Driver program: The process running the main() function of the application and creating the SparkContext.

Cluster manager: An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).

Deploy mode: Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside the cluster. In "client" mode, the submitter launches the driver outside the cluster.

Worker node: Any node that can run application code in the cluster.

Executor: A process launched for an application on a worker node; it runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.

Task: A unit of work that is sent to one executor.

Job: A parallel computation consisting of multiple tasks that is spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.

Stage: Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.


spark://HOST:PORT Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.

In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
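As an illustration, the precedence order can be sketched as a simple merge. This is plain Python, not Spark's actual resolution code, and the keys and values are invented examples:

```python
# A minimal sketch (not Spark's implementation) of the precedence above:
# SparkConf beats spark-submit flags, which beat the defaults file.
def effective_conf(spark_conf, submit_flags, defaults_file):
    merged = dict(defaults_file)   # lowest precedence
    merged.update(submit_flags)    # spark-submit flags override the file
    merged.update(spark_conf)      # explicit SparkConf values win
    return merged

conf = effective_conf(
    spark_conf={"spark.executor.memory": "4g"},     # set in application code
    submit_flags={"spark.executor.memory": "2g"},   # e.g. --executor-memory 2g
    defaults_file={"spark.executor.memory": "1g"},  # spark-defaults.conf
)
print(conf["spark.executor.memory"])   # 4g
```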


Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes. This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup is handled automatically, and with Spark standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property.

spark.driver.memory 1g Amount of memory to use for the driver process, i.e. where SparkContext is initialized. (e.g. 1g, 2g).

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.

spark.driver.extraClassPath
spark.driver.extraJavaOptions
spark.driver.extraLibraryPath


The basic concepts and workflow around a Container are as follows:

(1) A Container is YARN's abstraction of a resource: it encapsulates a fixed amount of resources (CPU and memory) on a particular node. It has no relationship to Linux Containers; it is purely a concept introduced by YARN (implementation-wise, it can be regarded as a serializable/deserializable Java class).

(2) Containers are requested from the ResourceManager by the ApplicationMaster, and are allocated to the ApplicationMaster asynchronously by the resource scheduler inside the ResourceManager.

(3) A Container is launched by the ApplicationMaster contacting the NodeManager that holds the resource. To run a Container, the ApplicationMaster must supply the command that starts the task inside it (which can be any command: a java, Python, or C++ process launcher, for example), along with the environment variables and external resources (dictionary files, executables, jar packages, etc.) that the command needs.
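As a rough analogue of the serializable Java class described in point (1), a Container is just a record of "this much CPU and memory on this node". The field names in this Python sketch are invented for illustration; this is not the real YARN class:

```python
# Rough Python analogue of the serializable Java class described in (1).
from dataclasses import dataclass

@dataclass
class Container:
    node: str        # which node the resources live on
    vcores: int      # CPU share
    memory_mb: int   # memory share

c = Container(node="node11", vcores=1, memory_mb=2048)
```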

In addition, the Containers an application needs fall into two categories:

(1) The Container running the ApplicationMaster: this one is requested and launched by the ResourceManager (via its internal resource scheduler); when submitting an application, the user can specify the resources needed by this single ApplicationMaster Container.

(2) Containers running the application's tasks: these are requested from the ResourceManager by the ApplicationMaster, which then communicates with the NodeManagers to launch them.

Both kinds of Container can end up on any node; their locations are generally random, i.e. the ApplicationMaster may run on the same node as the tasks it manages.

The Container is one of the most important concepts in YARN, and understanding it is essential for understanding YARN's resource model; hopefully this article helps with learning it.


Before studying Containers, you should first understand YARN's basic architecture and workflow. In particular, you should know that an application runs through the following steps:

Step 1: The user submits the application to the ResourceManager.

Step 2: The ResourceManager allocates resources for the application's ApplicationMaster and communicates with a NodeManager to launch the ApplicationMaster.

Step 3: The ApplicationMaster communicates with the ResourceManager to request resources for the tasks it needs to run; once the resources are granted, it communicates with the NodeManagers to launch the corresponding tasks.

Step 4: Once all tasks have finished, the ApplicationMaster unregisters from the ResourceManager, and the application run is complete.

Of these, steps 2-3 involve requesting and using resources, and that is exactly where Containers come in.
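The four steps above can be sketched as a toy simulation. All names and log messages here are invented for illustration; none of this is a real YARN API:

```python
# Toy simulation of the four-step application flow above.
log = []

def submit_application(app):                     # step 1: user -> RM
    log.append(f"RM: received {app}")
    launch_application_master(app)               # step 2: RM asks an NM to start the AM

def launch_application_master(app):
    log.append("NM: started ApplicationMaster")
    containers = request_resources(n_tasks=2)    # step 3: AM asks RM for containers
    for c in containers:
        log.append(f"NM: launched task in {c}")  # step 3: AM asks NMs to launch tasks
    log.append("AM: unregistered from RM")       # step 4: all tasks done

def request_resources(n_tasks):
    # Stand-in for the AM <-> RM resource negotiation.
    return [f"container_{i}" for i in range(1, n_tasks + 1)]

submit_application("wordcount")
```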

In YARN, the ResourceManager contains a pluggable component, the resource scheduler, which is responsible for resource management and scheduling and is one of YARN's most central components.

To request resources from the scheduler, a client sends it a list of ResourceRequests, each describing the detailed requirements of one unit of resources; the scheduler replies with the allocated resources, described as Containers. Each ResourceRequest can be regarded as a serializable Java object with the following fields (given directly as the Protocol Buffers definition):

message ResourceRequestProto {
  optional PriorityProto priority = 1;                // request priority
  optional string resource_name = 2;                  // resource name (the desired host, rack name, etc.)
  optional ResourceProto capability = 3;              // amount of resources (only CPU and memory are supported)
  optional int32 num_containers = 4;                  // number of containers satisfying the above
  optional bool relax_locality = 5 [default = true];  // whether locality may be relaxed (added after 2.1.0-beta; see YARN-392)
}

As the definition above shows, an application can request any amount of resources (CPU and memory), and by default locality is relaxed. For example, when requesting 5 units of <2 GB, 1 CPU> at priority 10 with resource name "node11": if node11 has no resources meeting the requirement, the scheduler first looks for matching resources on other nodes in the same rack as node11, and if it still finds none, on nodes in other racks. If you absolutely require resources on node11 itself, set relax_locality to false.
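The fallback order described above (requested node, then same rack, then anywhere) can be sketched like this. The node and rack names are invented, and this is not the real scheduler code:

```python
# Sketch of the locality-relaxation fallback: node, then rack, then anywhere.
def allocate(request_host, relax_locality, free_by_node, rack_of):
    if free_by_node.get(request_host, 0) > 0:
        return request_host                      # node-local allocation
    if not relax_locality:
        return None                              # strict: wait for the node itself
    rack = rack_of[request_host]
    for node, free in free_by_node.items():      # rack-local fallback
        if free > 0 and rack_of[node] == rack:
            return node
    for node, free in free_by_node.items():      # off-rack fallback
        if free > 0:
            return node
    return None

free = {"node11": 0, "node12": 1, "node21": 1}
racks = {"node11": "rack1", "node12": "rack1", "node21": "rack2"}
print(allocate("node11", True, free, racks))     # node12 (same rack as node11)
print(allocate("node11", False, free, racks))    # None (relaxation disabled)
```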

After the request is sent, the scheduler does not immediately return matching resources; instead, the application's ApplicationMaster must keep communicating with the ResourceManager, probing for the resources that have been allocated and pulling them over for use. Once resources have been allocated, the ApplicationMaster obtains them from the scheduler in the form of Containers. A Container can be regarded as a serializable Java object with the following fields (given directly as the Protocol Buffers definition):

message ContainerProto {
  optional ContainerIdProto id = 1;                       // container id
  optional NodeIdProto nodeId = 2;                        // the node holding the container's resources
  optional string node_http_address = 3;
  optional ResourceProto resource = 4;                    // amount of resources in the container
  optional PriorityProto priority = 5;                    // container priority
  optional hadoop.common.TokenProto container_token = 6;  // container token, used for security authentication
}
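The pull-style loop described above, in which the ApplicationMaster heartbeats the ResourceManager until its containers have been allocated, can be sketched as follows. FakeScheduler is a stand-in for the RM side, not a real YARN API:

```python
# Sketch of the pull-style allocation loop: the AM keeps heartbeating until
# the containers it asked for have all been allocated.
class FakeScheduler:
    def __init__(self, plan):
        self.plan = iter(plan)           # what each heartbeat will return

    def poll(self):
        return next(self.plan, [])       # empty once the plan is exhausted

def wait_for_containers(scheduler, wanted, max_heartbeats=100):
    allocated = []
    for _ in range(max_heartbeats):      # bounded so the sketch cannot spin forever
        allocated += scheduler.poll()    # often returns [] before allocation lands
        if len(allocated) >= wanted:
            break
    return allocated

rm = FakeScheduler([[], [], ["container_1"], ["container_2"]])
print(wait_for_containers(rm, 2))        # ['container_1', 'container_2']
```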

In general, each Container is used to run one task. After the ApplicationMaster receives one or more Containers, it assigns each Container to one of its internal tasks; once the task is chosen, the ApplicationMaster packages the task's runtime environment (launch command, environment variables, external file dependencies, etc.) together with the Container's resource information into a ContainerLaunchContext object, and then communicates with the corresponding NodeManager to launch the task. ContainerLaunchContext has the following fields (given directly as the Protocol Buffers definition):

message ContainerLaunchContextProto {
  repeated StringLocalResourceMapProto localResources = 1;  // external resources the container launch depends on
  optional bytes tokens = 2;
  repeated StringBytesMapProto service_data = 3;
  repeated StringStringMapProto environment = 4;            // environment variables needed to launch the container
  repeated string command = 5;                              // launch command for the task run inside the container; for MapReduce, the Map/Reduce Task launch command goes here
  repeated ApplicationACLMapProto application_ACLs = 6;
}

Each ContainerLaunchContext, together with the corresponding Container information (packaged into a ContainerToken), is in turn wrapped into a StartContainerRequest. In other words, what the ApplicationMaster ultimately sends to the NodeManager is a StartContainerRequest, and each StartContainerRequest corresponds to one Container and one task.
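The nesting described above can be sketched as plain dictionaries. The field names mirror the protobuf definitions, but this is illustrative Python, not YARN code, and the command and resource values are invented:

```python
# Illustrative nesting of the AM -> NM request described above.
def start_container_request(container_token, commands, env, local_resources):
    launch_context = {
        "localResources": local_resources,   # external files the task needs
        "environment": env,                  # env vars for the launch command
        "command": commands,                 # the task's launch command
    }
    return {
        "container_token": container_token,  # identifies and authorizes the container
        "container_launch_context": launch_context,
    }

req = start_container_request(
    container_token="token-for-container_1",
    commands=["java -Xmx512m MyTask"],
    env={"CLASSPATH": "app.jar"},
    local_resources={"app.jar": "path-in-distributed-cache"},
)
```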

The delay scheduling mechanism was introduced to improve data locality. The basic idea: when a node has free resources that the scheduling policy says should go to job1, but job1 has no task with locality on that node, then for performance reasons the scheduler temporarily skips that job and gives the free resources to another job that does have a local task there. As free resources keep appearing in the cluster, job1 keeps being skipped, until either it has a task that satisfies locality, or it reaches the maximum skip time configured in advance by the administrator, at which point the resources must be given to job1 (it cannot be made to wait forever).
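A minimal sketch of the delay-scheduling rule above. The job list, node names, and skip counter are invented for illustration; real schedulers use a configured maximum wait time rather than a skip count, so max_skips here is a simplification:

```python
# Minimal sketch of delay scheduling: skip a job with no local task until it
# has been skipped max_skips times, then give it the resource anyway.
def offer_resource(node, jobs, skips, max_skips):
    for job in jobs:                     # jobs in scheduling-policy order
        if node in job["local_nodes"]:
            skips[job["name"]] = 0
            return job["name"]           # locality satisfied: assign here
        if skips[job["name"]] >= max_skips:
            return job["name"]           # waited long enough: assign anyway
        skips[job["name"]] += 1          # skip this job for now
    return None

jobs = [{"name": "job1", "local_nodes": {"nodeB"}},
        {"name": "job2", "local_nodes": {"nodeA"}}]
skips = {"job1": 0, "job2": 0}
print(offer_resource("nodeA", jobs, skips, 2))  # job2: job1 skipped once
print(offer_resource("nodeA", jobs, skips, 2))  # job2: job1 skipped twice
print(offer_resource("nodeA", jobs, skips, 2))  # job1: hit max skips, assigned anyway
```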

The YARN scheduler is resource-centric, with a scheduling time complexity of O(number of nodes), whereas the JobTracker scheduler is task-centric, with a scheduling time complexity of O(number of tasks). Keep this firmly in mind when designing YARN scheduling policies: it is a prerequisite for YARN's high scalability, so do not confuse the two kinds of scheduling.

With FIFO in classic Hadoop, a node reports a heartbeat, the scheduler walks through all tasks to find the highest-priority task that satisfies locality, and schedules that task for execution. In YARN, the scheduler instead works from each queue's resource requests: it walks through the nodes, finds suitable resources, and hands a list of containers to the queue.

In YARN, the AM requesting resources from the RM follows a pull model; the AM asking an NM to start a container follows a push model; and the NM reporting heartbeats to the RM follows a pull model.

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容