General Linux Build Environment for Hadoop
1. Install the prerequisite tools with yum
yum install svn
# GNU build and install tools
yum install autoconf automake libtool cmake
yum install -y lzo-devel zlib-devel libtool
yum install ncurses-devel
# Secure-communication (SSL) development headers
yum install openssl-devel
# Install the important compression packages
yum install -y snappy snappy-devel
yum install gcc*
2. Download Protocol Buffers, 2.5.0 or later
Hadoop uses Protocol Buffers for its RPC communication.
- Download: https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
- After extracting, run the following directly under protobuf-2.5.0/:
cd protobuf-2.5.0
./configure
make
make check
sudo make install
protoc --version    # check the installed version
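If protoc fails afterwards with a missing libprotobuf shared library, the linker cache usually just needs a refresh; a minimal sketch, assuming the default /usr/local install prefix:
sudo ldconfig       # refresh the shared-library cache after make install
protoc --version    # expected output: libprotoc 2.5.0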
3. Install FindBugs 3.0.1 or later
Download it, then configure the environment:
sudo vim /etc/profile
export FINDBUGS_HOME=/home/app/softpkg/hadoop-compile/findbugs-3.0.1
export PATH=$FINDBUGS_HOME/bin:$PATH
# After sourcing the profile, test the findbugs command:
findbugs -version
4. Install cmake, version 3.6 or later
The cmake installable via yum tops out at version 3.1; there is no 3.6, so it has to be installed manually, either by:
- installing a prebuilt binary package and configuring the environment, or
- downloading the cmake source package and building it yourself.
1. Download the prebuilt package; install it and configure the environment variables
* Download: https://cmake.org/files/v3.13/cmake-3.13.1-Linux-x86_64.tar.gz
tar -xvf cmake-3.13.1-Linux-x86_64.tar.gz
vim /etc/profile
export CMAKE_HOME=/path/to/cmake-3.13.1-Linux-x86_64   # wherever the archive was extracted
export PATH=$CMAKE_HOME/bin:$PATH
# Test the cmake command
cmake -version
2. Download the source package and build it (details omitted)
* https://codeload.github.com/Kitware/CMake/zip/v3.13.1
./configure
make && sudo make install
cmake -version
5. Build the Hadoop source
cd hadoop-src
mvn clean package -DskipTests -Dtar -Pdist,native
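If the build succeeds, the packaged distribution should land under hadoop-dist/target/ (the same layout the 2.2.0 example below relies on); a quick look:
ls -lh hadoop-dist/target/    # look for hadoop-<version>.tar.gz and the unpacked dist directory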
Build notes for hadoop-2.6.0-cdh5.16.2:
- Java version: change the javaVersion property in the pom to 1.8 (the default is 1.7), otherwise the enforcer plugin fails with:
plugins.enforcer.RequireJavaVersion failed with message:
Detected JDK Version: 1.8.0-171 is not in the allowed range [1.7.0,1.7.1000}]
<properties>
<javaVersion>1.8</javaVersion>
<targetJavaVersion>1.8</targetJavaVersion>
</properties>
- Errors in hadoop-common-project\hadoop-annotations:
[ERROR] E:\ws\ws-idea\hadoop-source\hadoop-2.6.0-cdh5.16.2\hadoop-common-project\hadoop-annotations\src\main\java\org\apache\hadoop\classification\tools\ExcludePrivateAnnotationsStandardDoclet.java:[34,16] error: cannot find symbol
[ERROR] symbol: class LanguageVersion
location: class ExcludePrivateAnnotationsStandardDoclet
E:\ws\ws-idea\hadoop-source\hadoop-2.6.0-cdh5.16.2\hadoop-common-project\hadoop-annotations\src\main\java\org\apache\hadoop\classification\tools\ExcludePrivateAnnotationsStandardDoclet.java:[38,30] error: cannot find symbol
[ERROR] symbol: class RootDoc
location: class ExcludePrivateAnnotationsStandardDoclet
- Building the Hadoop source:
See https://blog.csdn.net/tvpbvt/article/details/23843575 for resolving the related bugs in the source:
- add jetty-util to hadoop-2.2.0-src\hadoop-common-project\hadoop-auth\pom.xml
- in the hadoop-2.2.0-src/hadoop-project/pom.xml file, add <additionalparam>-Xdoclint:-html</additionalparam>:
1. Add the jetty-util dependency to hadoop-auth\pom.xml
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
2. In hadoop-project/pom.xml, add -Xdoclint:-html to the maven-javadoc-plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<executions>
<execution>
...
<configuration>
<destDir>${project.build.directory}</destDir>
<additionalparam>-Xdoclint:-html</additionalparam>
</configuration>
</execution>
</executions>
</plugin>
mvn package -Pdist,native -DskipTests -Dtar
Replace hadoop-2.2.0/lib/native with hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native.
After the replacement, run a check:
./bin/hdfs getconf -namenodes
ldsver42    # the correct output at last!
In other words, replace the original native libraries with the freshly compiled native libraries.
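Hadoop 2.x also ships a built-in checker for the native libraries, which is a convenient verification after swapping in the new native directory:
./bin/hadoop checknative -a   # lists hadoop/zlib/snappy/lz4/openssl and whether each native lib loaded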
Resolving a build error:
[WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
Cause: protobuf has not been installed yet; install it following the steps above and run make install.
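A quick sanity check before re-running the build:
which protoc        # must resolve, e.g. to /usr/local/bin/protoc
protoc --version    # must print libprotoc 2.5.0 for Hadoop 2.x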
Downloading and building Hadoop 2.7.1
- Download the source and the prebuilt package from Apache
- Install the various dependencies with yum
yum -y install gcc-c++ autoconf automake libtool cmake zlib-devel pkgconfig openssl-devel svn ncurses-devel
- Install the FindBugs component
- Download findbugs-3.0.1.tar.gz: http://findbugs.sourceforge.net/downloads.html
- Install it on Linux and configure the environment variables;
Append at the end of /etc/profile:
export FINDBUGS_HOME=/opt/findbugs-3.0.1
export PATH=$PATH:$FINDBUGS_HOME/bin
# Verify the findbugs version
findbugs -version
- Download and install Snappy
- Download snappy-1.1.3.tar.gz (the release tarball, not a raw source checkout): https://src.fedoraproject.org/repo/pkgs/snappy/
- Build and install
./configure
make -j 4        # -j 4 builds with 4 parallel jobs
make install     # the install step
# List the snappy libraries installed under /usr/local/lib
ls -lh /usr/local/lib |grep snappy
Download and install Protocol Buffers (same steps as in the section above)
Run the mvn command to build the Hadoop source
mvn clean package -Pdist,native -DskipTests -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy
# If the build fails midway and the docs are not needed, use this command instead
mvn clean package -Pdist,native -DskipTests -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy -Drequire.openssl
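To confirm snappy really got bundled, check the produced native directory; a sketch, assuming the usual 2.7.1 output layout:
ls hadoop-dist/target/hadoop-2.7.1/lib/native | grep snappy   # expect libsnappy.so* when -Dbundle.snappy took effect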
Deploying Hadoop
Basic Linux setup for a new environment
Disable SELinux
vim /etc/selinux/config
-> SELINUX=disabled
Disable the firewalld firewall
systemctl stop firewalld
systemctl disable firewalld
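A quick confirmation that the firewall stays down across reboots:
systemctl is-enabled firewalld   # should print: disabled
systemctl is-active firewalld    # should print: inactive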
Set up time synchronization: the master node syncs from the network, and the secondary nodes sync from the master
tzselect                              # select the time zone
# 1. CentOS 7 uses the chrony service for time sync by default; ntp is not recommended;
yum list installed | grep chrony      # first check whether the chrony package is installed
systemctl list-units | grep chrony    # check whether the chrony service is present;
systemctl status chronyd              # check the status of the chronyd service;
# If the chrony service is missing, install it and enable it at boot
yum -y install chrony
systemctl enable chronyd
systemctl start chronyd
# Edit the servers to sync time from
vim /etc/chrony.conf
# As the master time-service node, sync time from the network:
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server time1.aliyun.com iburst
server time2.aliyun.com iburst
server time5.aliyun.com iburst
server time.windows.com iburst
# On slave nodes, enable the following config instead, to sync time from the ldsver55 node;
#server ldsver55 iburst
#server 192.168.51.151 iburst
chronyc sources -v        # show the time sources
chronyc sourcestats -v    # show the time-source statistics
timedatectl set-ntp yes   # enable NTP time synchronization
chronyc tracking          # verify clock tracking
chronyc activity          # show how many NTP sources are online/offline
timedatectl               # show the system time and time zone;
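If a node's clock is far off, chrony can be told to step it immediately instead of slewing slowly; a one-off sketch:
chronyc makestep          # force an immediate correction of the current offset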
rpm -q ntp                # check whether the ntp package is installed
# To remove an already installed ntp package:
rpm -qa | grep -i ntp
yum remove -y ntp-4.2.6p5-29.el7.centos.2.x86_64   # look up the package first, then remove it by its full name
yum remove -y ntp.x86_64
yum remove -y ntpdate
# 2. Alternatively, install the ntp service for time sync; installing ntpd requires uninstalling the chrony service first;
systemctl list-units --type=service | grep ntp     # check whether the ntpd service is already installed/disabled;
systemctl status ntpd
systemctl enable ntpd
systemctl start ntpd
# After a reboot, check whether the ntpd service comes back up automatically;
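If ntpd ends up being the chosen route, its peer status can be checked the classic way:
ntpq -p                   # list the NTP peers ntpd is syncing against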
Cluster installation
Configuring environment variables: hadoop-env.sh
Mainly configure the environment variables in .bashrc and hadoop-env.sh;
Rule of thumb: every directory that is not under the default HADOOP_HOME needs its environment variable re-pointed explicitly (the directories themselves must exist; see the sketch after the exports below);
vim ~/.bashrc
vim hadoop-env.sh
# Basic home directories
export JAVA_HOME=/usr/java/jdk-release
export HADOOP_HOME=/home/bigdata/app/hadoop-release
# CONF_DIR directories
export HADOOP_CONF_DIR=/home/bigdata/data/hadoop/conf
export YARN_CONF_DIR=${HADOOP_CONF_DIR}
export HADOOP_MAPRED_CONF_DIR=${HADOOP_CONF_DIR}
# pid directories
export HADOOP_PID_DIR=/home/bigdata/data/hadoop/pid/hdfs
export YARN_PID_DIR=/home/bigdata/data/hadoop/pid/yarn
export HADOOP_MAPRED_PID_DIR=/home/bigdata/data/hadoop/pid/mapred
# Log directories
export HADOOP_LOG_DIR=/home/bigdata/log/hadoop/hdfs
export YARN_LOG_DIR=/home/bigdata/log/hadoop/yarn
export HADOOP_MAPRED_LOG_DIR=/home/bigdata/log/hadoop/mapred
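It is safest to create these conf/pid/log directories up front (not every script creates them on its own); a sketch using the paths above:
mkdir -p /home/bigdata/data/hadoop/conf
mkdir -p /home/bigdata/data/hadoop/pid/{hdfs,yarn,mapred}
mkdir -p /home/bigdata/log/hadoop/{hdfs,yarn,mapred}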
Starting mr-jobhistory
Enable the jobhistory services for Hadoop and YARN.
1. Set the history server ports in mapred-site.xml;
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
# 2. Enable YARN history logs (log aggregation): vim yarn-site.xml
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Aggregated logs, retained for 2 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>172800</value>
</property>
<!-- Compression type used for the aggregated logs -->
<property>
<name>yarn.nodemanager.log-aggregation.compression-type</name>
<value>gz</value>
</property>
<!-- NodeManager local storage directory; the default is ${hadoop.tmp.dir}/nm-local-dir -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<!-- Maximum number of completed applications the ResourceManager keeps -->
<property>
<name>yarn.resourcemanager.max-completed-applications</name>
<value>100</value>
</property>
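With aggregation enabled, the logs of a finished application can then be pulled through the yarn CLI (the application id below is a placeholder):
yarn logs -applicationId application_1500000000000_0001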
# The jobhistory heap size is best left at the default rather than set arbitrarily;
# export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=2000
# Start the historyserver
mr-jobhistory-daemon.sh start historyserver
# Stop the history server
mr-jobhistory-daemon.sh stop historyserver
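A quick check that the history server is really up, using the ports configured above:
jps | grep JobHistoryServer      # the JVM should be listed
curl -s http://localhost:19888/  # the web UI set via mapreduce.jobhistory.webapp.address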
After changing the HADOOP_CONF_DIR config directory, the jobhistory service seemed to run abnormally.
Inspecting its script (mr-jobhistory-daemon.sh):
# First it sources mapred-config.sh to load the MR environment variables;
. $HADOOP_LIBEXEC_DIR/mapred-config.sh
# Then it defines the log file and the pid file
log=$HADOOP_MAPRED_LOG_DIR/mapred-$HADOOP_MAPRED_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_MAPRED_PID_DIR/mapred-$HADOOP_MAPRED_IDENT_STRING-$command.pid
# And finally the launch command:
nohup nice -n $HADOOP_MAPRED_NICENESS "$HADOOP_MAPRED_HOME"/bin/mapred --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
nohup nice -n 0 /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/bin/mapred --config /home/bigdata/data/hadoop/conf historyserver > /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/mapred-bigdata-historyserver-ldsver55.out
starting historyserver, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/mapred-bigdata-historyserver-ldsver55.out
The log path, log=$HADOOP_MAPRED_LOG_DIR/mapred-$HADOOP_MAPRED_IDENT_STRING-$command-$HOSTNAME.out, is the easy failure point: the HADOOP_MAPRED_LOG_DIR environment variable must be set correctly.
Relocating HADOOP_CONF_DIR to a standalone directory
To relocate HADOOP_CONF_DIR to a different directory, set the HADOOP_CONF_DIR path in .bashrc and in the scripts below.
# 1. Set the shell environment variables;
vim ~/.bashrc
export HADOOP_CONF_DIR=/home/bigdata/data/hadoop/conf
export YARN_CONF_DIR=/home/bigdata/data/hadoop/conf
# 2. Set them in hadoop-env.sh and yarn-env.sh
export HADOOP_CONF_DIR=/home/bigdata/data/hadoop/conf
export YARN_CONF_DIR=/home/bigdata/data/hadoop/conf
# 3. Set them in hadoop-config.sh under libexec
vim libexec/hadoop-config.sh
export HADOOP_CONF_DIR=/home/bigdata/data/hadoop/conf
export YARN_CONF_DIR=/home/bigdata/data/hadoop/conf
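To confirm which conf dir the scripts actually resolve, note that it is placed first on the Hadoop classpath; a quick sketch:
hadoop classpath | tr ':' '\n' | head -1   # should print /home/bigdata/data/hadoop/conf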
Adding the spark-shuffle component alongside the MapReduce shuffle
vim yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
After adding spark_shuffle, the shuffle jar under $SPARK_HOME/yarn must also be copied into the $HADOOP_HOME/share/hadoop/mapreduce2/ directory, as sketched below;
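A sketch of that copy step; the yarn-shuffle jar name pattern is assumed from a typical Spark 2.x layout:
cp $SPARK_HOME/yarn/spark-*-yarn-shuffle.jar $HADOOP_HOME/share/hadoop/mapreduce2/
# then restart the NodeManagers so the spark_shuffle aux-service can load the jar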
Using the local native libraries
Assorted Hadoop scripts
The hadoop jar command
- The bin/hadoop command script
# The classpath is exported here as an environment variable; the RunJar process launched below inherits this $CLASSPATH from its environment;
export CLASSPATH=$CLASSPATH
echo "hadoop shell: $JAVA $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS $@ "
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
# For hadoop jar jarFile className, the resulting java command looks like:
java -Xmx1000m -Xmx512m \
-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=192.168.51.1:45040 \
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/bigdata/log/hadoop -Dhadoop.log.file=hadoop.log \
-Dhadoop.home.dir=/home/bigdata/app/hadoop-2.6.0-cdh5.16.2 -Dhadoop.id.str=bigdata -Dhadoop.root.logger=INFO,console \
-Djava.library.path=/home/bigdata/app/hadoop-release/lib/native -Dhadoop.policy.file=hadoop-policy.xml \
-Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender \
org.apache.hadoop.util.RunJar /home/bigdata/app/hadoop-release/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar pi 2 2
RunJar's arguments
Annotated source of the RunJar.main() method:
RunJar.main(String[] args) {
new RunJar().run(args);{ // the first argument is the jar file;
String usage = "RunJar jarFile [mainClass] args...";
if (args.length < 1) {
System.exit(-1);
}
int firstArg = 0;
String fileName = args[firstArg++];
File file = new File(fileName);
if (!file.exists() || !file.isFile()) {
System.err.println("Not a valid JAR: " + file.getCanonicalPath());
System.exit(-1);
}
JarFile jarFile =new JarFile(fileName);
// Determine the main class: prefer Main-Class from the jar's manifest, otherwise fall back to the second argument
String mainClassName = null;
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();
if (mainClassName == null) { // when the manifest has no Main-Class, try to parse it from the next args entry;
if (args.length < 2) {
System.err.println(usage);
System.exit(-1);
}
mainClassName = args[firstArg++];
}
mainClassName = mainClassName.replaceAll("/", ".");
File workDir = File.createTempFile("hadoop-unjar", "", new File(System.getProperty("java.io.tmpdir"))); // the Java application's temp directory;
if (!workDir.delete()) {
System.err.println("Delete failed for " + workDir);
System.exit(-1);
}
ensureDirectory(workDir);
ShutdownHookManager.get().addShutdownHook( // shutdown hook, ensuring workDir gets fully deleted;
new Runnable() {
@Override
public void run() {FileUtil.fullyDelete(workDir);}
}, SHUTDOWN_HOOK_PRIORITY);
unJar(file, workDir);
ClassLoader loader = createClassLoader(file, workDir);
Thread.currentThread().setContextClassLoader(loader);
// Resolve the main class, the driver's main() method, and the args;
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
Array.newInstance(String.class, 0).getClass()
});
String[] newArgs = Arrays.asList(args).subList(firstArg, args.length).toArray(new String[0]);
// Invoke the user-defined UserDriver.main() method
main.invoke(null, new Object[] { newArgs });
}
}
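The argument-shifting logic above is visible from the shell side; two hedged invocation sketches (jar and class names are hypothetical):
hadoop jar my-job.jar arg1 arg2                       # manifest has Main-Class: everything after the jar goes to the driver's main()
hadoop jar my-job.jar com.example.MyDriver arg1 arg2  # no Main-Class: the first argument after the jar is taken as mainClassName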
A record of the command:

On the classpath question in RunJar.main():

The ucp field inside the classLoader holds the value corresponding to the jarFile;
its parent field should be the CLASSPATH exported in the bin/hadoop script.