1. Flink 1.11.0 and earlier
The old approach was to build the flink-shaded-hadoop module yourself: compile a flink-shaded-hadoop-2-uber_xxx jar against the Hadoop and Hive versions used in your production environment, then place that jar in Flink's lib directory so Flink loads it from lib at job startup.
For this approach, see these two links:
https://blog.csdn.net/weixin_44628586/article/details/107106547
https://blog.csdn.net/guiyifei/article/details/109325980#comments_14400773
flink-shaded source on GitHub: https://github.com/apache/flink-shaded

二、1.11.0版本以后
To make Flink "Hadoop free", the community decoupled Flink from Hadoop: Flink now supports both Hadoop 2 and Hadoop 3, and you can point it at whichever Hadoop environment you need.
To do this, just set export HADOOP_CLASSPATH=`hadoop classpath`; there is no longer any need to build the flink-shaded package.
The key point is that the prebuilt Flink jars contain no Hadoop or Hive code at all. When a Flink job starts, the JobManager (JM) and the TaskManager (TM) both pick up the Hadoop classes through the HADOOP_CLASSPATH environment variable.
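As a sketch of what a submission then looks like (the flink run line and job jar path below are illustrative placeholders, not taken from this post):

```shell
# Hedged sketch: export HADOOP_CLASSPATH before submitting, so the client,
# JM and TM processes all inherit the Hadoop classes. Guarded so it is a
# no-op on machines without a hadoop binary.
if command -v hadoop >/dev/null 2>&1; then
    export HADOOP_CLASSPATH=$(hadoop classpath)
    # Placeholder submission, assuming a per-job YARN deployment:
    # ./bin/flink run -t yarn-per-job ./examples/streaming/WordCount.jar
fi
echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH:-<unset>}"
```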
At first, rookie that I am, I thought hadoop classpath was just some arbitrary path someone had written down. Thanks to 渣渣瑞 for the beginner refresher: the backticks enclose a command (something I once knew and had since forgotten). So hadoop classpath is a command; run it and it prints the classpath entries Hadoop depends on:
[yujianbo@qzcs86 ~]$ hadoop classpath
/etc/hadoop/conf:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-yarn/.//*
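The backtick mechanics can be illustrated with plain shell, using echo as a stand-in for the hadoop binary (the exact output above is cluster-specific):

```shell
# Without command substitution the assignment is a literal string, and
# "classpath" would even be parsed as a separate word:
#   export HADOOP_CLASSPATH=hadoop classpath        # wrong
# With backticks or $(...), the command runs first and its stdout becomes
# the variable's value:
#   export HADOOP_CLASSPATH=`hadoop classpath`
#   export HADOOP_CLASSPATH=$(hadoop classpath)     # preferred modern form
# Demonstration with echo standing in for "hadoop classpath":
CP=$(echo '/etc/hadoop/conf:/opt/hadoop/lib/*')
echo "$CP"    # prints /etc/hadoop/conf:/opt/hadoop/lib/*
```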
For the background, see the community mailing list thread:
http://apache-flink.147419.n8.nabble.com/flink-shaded-hadoop-2-uber-td9345.html
The Flink 1.12 docs mention this both in the YARN preparation section and in the supported Hadoop versions section:
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html#preparation
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html#supported-hadoop-versions
The 1.11 docs describe Hadoop integration here:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/hadoop.html#providing-hadoop-classes
這個(gè)鏈接里面有這么一段話就可以說(shuō)明
Flink will use the environment variable HADOOP_CLASSPATH to augment the classpath that is used when starting Flink components such as the Client, JobManager, or TaskManager. Most Hadoop distributions and cloud environments will not set this variable by default so if the Hadoop classpath should be picked up by Flink the environment variable must be exported on all machines that are running Flink components.
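In practice that means the export has to be in place on every node. A minimal sketch of making it persistent (the profile path below is a local demo file; on a real cluster you would target ~/.bashrc or /etc/profile on each machine running a Flink component):

```shell
# Append the export line to a profile file, idempotently. Single quotes keep
# $(hadoop classpath) unexpanded here, so it is evaluated at login time on
# the target machine rather than right now.
LINE='export HADOOP_CLASSPATH=$(hadoop classpath)'
PROFILE="./demo_profile"   # demo file; normally ~/.bashrc or /etc/profile
touch "$PROFILE"
grep -qxF "$LINE" "$PROFILE" || echo "$LINE" >> "$PROFILE"
grep -qxF "$LINE" "$PROFILE" || echo "$LINE" >> "$PROFILE"   # rerun is a no-op
echo "occurrences: $(grep -cxF "$LINE" "$PROFILE")"          # prints occurrences: 1
```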