Big Data Collaboration Frameworks

因?yàn)橐獏f(xié)作使用,各組件之間的版本要適應(yīng),Cloudera對(duì)大數(shù)據(jù)組件進(jìn)行了匹配,這里我使用CDH5.3.6套件、JDK使用1.8:


[Figure: big data components and the JDK]

Preparation

  • Download the packages: fetch the CDH 5.3.6 components from http://archive.cloudera.com/cdh5/cdh/5/
  • Operating system: CentOS 6.7 in a virtual machine
  • Network settings:
  • Create a working user and a working directory:
[root@localhost ~]# useradd hadoop
[root@localhost ~]# passwd hadoop
[root@localhost ~]# su hadoop
[hadoop@localhost root]$ cd
[hadoop@localhost ~]$ mkdir cdh
  • Upload the tarballs into the cdh directory
  • Extract the tarballs, then move them to Downloads:
[hadoop@localhost ~]$ cd cdh
[hadoop@localhost cdh]$ ls *.tar.gz | xargs -n1 tar xzvf
[hadoop@localhost cdh]$ find ./ -name "*.tar.gz" | xargs -i mv {} ../Downloads/
  • Environment variables:
[hadoop@localhost cdh]$ vi ~/.bashrc
export JAVA_HOME=/home/hadoop/cdh/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH
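The effect of the PATH change can be sanity-checked in any shell; a minimal sketch, using the same JDK path as the export above:

```shell
# Prepend the JDK's bin directory to PATH, exactly as in ~/.bashrc above
export JAVA_HOME=/home/hadoop/cdh/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH

# The first PATH entry is now the JDK's bin directory, so `java`
# resolves to this JDK rather than any system-wide copy
echo "$PATH" | cut -d: -f1
# → /home/hadoop/cdh/jdk1.8.0_111/bin
```

After `source ~/.bashrc`, `java -version` should report 1.8.0_111.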

Hadoop

  • Test
    Local (Standalone) Mode
[hadoop@localhost cdh]$ cd hadoop-2.5.0-cdh5.3.6/
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ mkdir input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ touch input/wc.input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vi input/wc.input
hello world !
hello hadoop ~
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount input/wc.input output
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ cat output/part-r-00000 
!   1
hadoop  1
hello   2
world   1
~   1
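As a sanity check, the same counts can be reproduced locally with coreutils; this mimics what the wordcount example computes, it is not Hadoop itself:

```shell
# Split the two input lines into one word per line, then count
# duplicates; LC_ALL=C makes the sort order match the example output
printf 'hello world !\nhello hadoop ~\n' \
  | tr ' ' '\n' \
  | LC_ALL=C sort \
  | uniq -c
```

The pipeline prints the same five counts as part-r-00000 above.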

Pseudo-Distributed Mode

[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vi etc/hadoop/core-site.xml 
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property> 
        <property> 
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/cdh/data/tmp</value>
        </property>
</configuration>
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs namenode -format
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ sbin/start-dfs.sh
# You will be prompted for passwords; just follow the prompts. With passwordless SSH configured, no password is needed.
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -mkdir /user
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -ls /
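With HDFS up, the standalone wordcount test from earlier can be repeated against HDFS; a sketch, run from the Hadoop home directory (the HDFS paths are illustrative):

```shell
# Create a home directory for the hadoop user and upload the input file
bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -put input/wc.input /user/hadoop/wc.input

# Run wordcount reading from and writing to HDFS this time
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar \
    wordcount /user/hadoop/wc.input /user/hadoop/output

# Inspect the result stored in HDFS
bin/hdfs dfs -cat /user/hadoop/output/part-r-00000
```

The output should match the counts produced in standalone mode.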

Fully-Distributed Mode

# Map hostnames to IP addresses
[hadoop@localhost .ssh]$ su
Password: 
[root@localhost .ssh]# vi /etc/hosts
192.168.2.131 master
192.168.2.132 slave1
192.168.2.133 slave2
# First, set up passwordless SSH login
# -P gives the passphrase; -P '' means an empty passphrase. Without -P you
# must press Enter three times; with -P only once (for the key file path).
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ cd ~/.ssh
[hadoop@localhost .ssh]$ ssh-keygen -t rsa -P '' 
[hadoop@localhost .ssh]$ cat id_rsa.pub >> ./authorized_keys
[hadoop@localhost .ssh]$ chmod 600 authorized_keys
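For the fully distributed setup, the master's public key also has to reach each slave; a sketch, assuming the hadoop user already exists on slave1 and slave2:

```shell
# Copy the public key to each slave so the master can log in without a
# password (ssh-copy-id prompts once for each slave's password)
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2

# Verify: these should print the remote hostname without asking
# for a password
ssh hadoop@slave1 hostname
ssh hadoop@slave2 hostname
```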
# Edit the configuration files
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/yarn-site.xml 
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
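In a fully distributed cluster the master also needs to know its worker nodes; a sketch of etc/hadoop/slaves (the file Hadoop 2.x reads to decide where to start DataNodes and NodeManagers), using the hostnames from /etc/hosts above:

```
slave1
slave2
```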

First start the NameNode with sbin/hadoop-daemon.sh start namenode; alternatively, start all daemons with sbin/start-dfs.sh followed by sbin/start-yarn.sh.

Sqoop

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

In other words, Sqoop wraps the common data import/export MapReduce jobs: you supply parameters on the command line and it runs the corresponding MapReduce program for you.
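For example, a typical import passes only parameters and Sqoop generates and runs the MapReduce job; a sketch, assuming a local MySQL with a database `test` and a table `users` (the database, table, and credentials are illustrative):

```shell
# Import one MySQL table into HDFS with a single map task (-m 1);
# Sqoop turns this invocation into a MapReduce job
bin/sqoop import \
    --connect jdbc:mysql://localhost:3306/test \
    --username root \
    --password 123456 \
    --table users \
    --target-dir /user/hadoop/users \
    -m 1
```

The imported rows land as text files under /user/hadoop/users in HDFS.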

Install MySQL
bin/mysqld --initialize --user=hadoop --basedir=/home/hadoop/cdh/mysql --datadir=/home/hadoop/cdh/mysql/data
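After initialization the server can be started and the root password set; a sketch, assuming MySQL 5.7 (where --initialize prints a temporary root password to the error log):

```shell
# Start the server in the background, with the same paths as the
# initialize command above
bin/mysqld_safe --basedir=/home/hadoop/cdh/mysql \
    --datadir=/home/hadoop/cdh/mysql/data --user=hadoop &

# Log in with the temporary password printed by --initialize,
# then replace it with a permanent one
bin/mysql -u root -p
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'new-password';
```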

Flume

# Edit the configuration file
$ cd /home/hadoop/cdh/apache-flume-1.5.0-cdh5.3.6-bin
$ cd conf
$ cp flume-conf.properties.template a1.conf
$ vi a1.conf

# define agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# define sources 
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444


# define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# define sink
a1.sinks.k1.type = logger 
# a1.sinks.k1.maxBytesToLog = 1024

# bind the sources and sinks to the channel 
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

First, search https://pkgs.org/ for the following three packages, download them, and upload them to the Downloads directory.

Telnet
# As root, install the three packages above, then restart the xinetd service
$ rpm -ivh ./*.rpm
$ /etc/rc.d/init.d/xinetd restart
$ exit
# Start the Flume agent
$ bin/flume-ng agent \
> -c conf \
> -n a1 \
> -f conf/a1.conf \
> -Dflume.root.logger=DEBUG,console
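With the agent running, the netcat source can be exercised from a second terminal; each line typed becomes one Flume event and appears in the logger sink's console output:

```shell
# In another terminal: connect to the netcat source and send a line
$ telnet localhost 44444
hello flume
# The agent's console should log an Event whose body contains
# "hello flume"
```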

Oozie

System requirements:
  • Linux
  • Java 1.6
  • Hadoop
  • ExtJS library

Hue

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • 看到題目有沒(méi)有一種高大上的感覺(jué)?毛線(xiàn),當(dāng)前是個(gè)人、是個(gè)公司都在說(shuō)自己搞大數(shù)據(jù),每天沒(méi)有幾個(gè)PB的數(shù)據(jù)入庫(kù),每天沒(méi)有...
    丁小晶的晶小丁閱讀 4,642評(píng)論 0 50
  • 1 目的將hadoop 2.7.1 安裝到 166、167、168 三臺(tái)機(jī)器上2 提供環(huán)境練習(xí)環(huán)境192.168....
    灼灼2015閱讀 3,623評(píng)論 4 40
  • 配置所需軟件: ①、VirtualBox-5.2.0-118431-Win.exe ②、Ubuntu14.04.5...
    Unique丶Xi閱讀 612評(píng)論 0 2
  • Spring Cloud為開(kāi)發(fā)人員提供了快速構(gòu)建分布式系統(tǒng)中一些常見(jiàn)模式的工具(例如配置管理,服務(wù)發(fā)現(xiàn),斷路器,智...
    卡卡羅2017閱讀 136,506評(píng)論 19 139
  • 環(huán)境 一臺(tái)ubuntu 14.04虛擬機(jī)。 Hadoop版本:2.6.0。 增加用戶(hù) 為了隔離Hadoop和其它軟...
    doc001閱讀 1,941評(píng)論 1 9

友情鏈接更多精彩內(nèi)容