Setting up pseudo-distributed Hadoop with Docker

This is, I suppose, the first article in my big data series. The road ahead is long, and I had not planned to head toward big data this soon, but a project here leaves me no choice. Fair warning: this article is not beginner-friendly. It has quite a few prerequisites — you need Docker installed, and you should know the common Docker commands and how to use Dockerfiles and docker-compose. My (admittedly not-for-everyone) Python programmer leveling guide has learning material for all of these. The project lives at https://github.com/Niracler/bigdata-exercise

This article is mainly about the details of building a pseudo-distributed Hadoop cluster with Docker. If that sounds like too much trouble, there is a shortcut: run the quick start below first and enjoy the taste of success, then come back for the details.

Clone the project first; all the Dockerfiles below are adapted from it:

$ git clone https://github.com/Niracler/bigdata-exercise.git
$ cd bigdata-exercise/docker-hadoop/

Start

$ sudo su
$ docker-compose -f docker-compose-local.yml up -d
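
If everything came up, docker ps should list seven containers — namenode, resourcemanager, historyserver, nodemanager1 and datanode1 through datanode3 (the names come from the compose file shown later in this article):

$ docker ps --format "{{.Names}}: {{.Status}}"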

Visit http://localhost:50070/ and you should see the NameNode web UI:

(Screenshot: the NameNode web UI)

Enter the container

$ docker exec -it namenode bash

Upload a file to test HDFS

$ touch test
$ hdfs dfs -put test /
$ hdfs dfs -ls /

(Screenshot: hdfs dfs -ls / listing the uploaded test file)

And then? There is no "and then" — that is the whole quick start. The rest of the article walks through the Dockerfiles and the other supporting files in detail.

Building the images

Directory layout

Arrange the files in the following structure (reconstructed from the build commands and file references below):

docker-hadoop/
├── base/
│   ├── Dockerfile
│   └── entrypoint.sh
├── namenode/
│   ├── Dockerfile
│   └── run.sh
├── datanode/
│   ├── Dockerfile
│   └── run.sh
├── nodemanager/
│   ├── Dockerfile
│   └── run.sh
├── resourcemanager/
│   ├── Dockerfile
│   └── run.sh
├── historyserver/
│   ├── Dockerfile
│   └── run.sh
├── hadoop.env
└── docker-compose-local.yml

Building the hadoop-base image

The hadoop-base Dockerfile. All the service-specific Dockerfiles that follow build on this image; its main job is to download Hadoop (the JDK is already provided by the openjdk:8 base image):

FROM openjdk:8
MAINTAINER Niracler <niracler@gmail.com>

# Download Hadoop from http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
ENV HADOOP_VERSION 2.7.7
ENV HADOOP_URL http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz

# Fetch the tarball, extract it into /opt, then remove it
RUN set -x \
    && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
    && tar -xvf /tmp/hadoop.tar.gz -C /opt/ \
    && rm /tmp/hadoop.tar.gz*

RUN ln -s /opt/hadoop-$HADOOP_VERSION/etc/hadoop /etc/hadoop
RUN cp /etc/hadoop/mapred-site.xml.template /etc/hadoop/mapred-site.xml
RUN mkdir /opt/hadoop-$HADOOP_VERSION/logs

RUN mkdir /hadoop-data

# Set environment variables
ENV HADOOP_PREFIX=/opt/hadoop-$HADOOP_VERSION
ENV HADOOP_CONF_DIR=/etc/hadoop
ENV MULTIHOMED_NETWORK=1

ENV USER=root
ENV PATH $HADOOP_PREFIX/bin/:$PATH

# Add the entrypoint script
ADD entrypoint.sh /entrypoint.sh
RUN chmod a+x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
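
Note how the pieces fit together: every child image inherits this ENTRYPOINT, so at startup entrypoint.sh first renders the *-site.xml configuration files from environment variables and then execs the image's CMD — the per-service run.sh scripts shown in the sections below.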

The entrypoint script, entrypoint.sh:

#!/bin/bash

# Set some sensible defaults
export CORE_CONF_fs_defaultFS=${CORE_CONF_fs_defaultFS:-hdfs://`hostname -f`:8020}

function addProperty() {
  local path=$1
  local name=$2
  local value=$3

  local entry="<property><name>$name</name><value>${value}</value></property>"
  local escapedEntry=$(echo $entry | sed 's/\//\\\//g')
  sed -i "/<\/configuration>/ s/.*/${escapedEntry}\n&/" $path
}

function configure() {
    local path=$1
    local module=$2
    local envPrefix=$3

    local var
    local value

    echo "Configuring $module"
    for c in `printenv | perl -sne 'print "$1 " if m/^${envPrefix}_(.+?)=.*/' -- -envPrefix=$envPrefix`; do
        name=`echo ${c} | perl -pe 's/___/-/g; s/__/@/g; s/_/./g; s/@/_/g;'`
        var="${envPrefix}_${c}"
        value=${!var}
        echo " - Setting $name=$value"
        addProperty /etc/hadoop/$module-site.xml $name "$value"
    done
}

configure /etc/hadoop/core-site.xml core CORE_CONF
configure /etc/hadoop/hdfs-site.xml hdfs HDFS_CONF
configure /etc/hadoop/yarn-site.xml yarn YARN_CONF
configure /etc/hadoop/httpfs-site.xml httpfs HTTPFS_CONF
configure /etc/hadoop/kms-site.xml kms KMS_CONF

if [ "$MULTIHOMED_NETWORK" = "1" ]; then
    echo "Configuring for multihomed network"

    # HDFS
    addProperty /etc/hadoop/hdfs-site.xml dfs.namenode.rpc-bind-host 0.0.0.0
    addProperty /etc/hadoop/hdfs-site.xml dfs.namenode.servicerpc-bind-host 0.0.0.0
    addProperty /etc/hadoop/hdfs-site.xml dfs.namenode.http-bind-host 0.0.0.0
    addProperty /etc/hadoop/hdfs-site.xml dfs.namenode.https-bind-host 0.0.0.0
    addProperty /etc/hadoop/hdfs-site.xml dfs.client.use.datanode.hostname true
    addProperty /etc/hadoop/hdfs-site.xml dfs.datanode.use.datanode.hostname true

    # YARN
    addProperty /etc/hadoop/yarn-site.xml yarn.resourcemanager.bind-host 0.0.0.0
    addProperty /etc/hadoop/yarn-site.xml yarn.nodemanager.bind-host 0.0.0.0
    addProperty /etc/hadoop/yarn-site.xml yarn.timeline-service.bind-host 0.0.0.0

    # MAPRED
    addProperty /etc/hadoop/mapred-site.xml yarn.nodemanager.bind-host 0.0.0.0
fi

if [ -n "$GANGLIA_HOST" ]; then
    mv /etc/hadoop/hadoop-metrics.properties /etc/hadoop/hadoop-metrics.properties.orig
    mv /etc/hadoop/hadoop-metrics2.properties /etc/hadoop/hadoop-metrics2.properties.orig

    for module in mapred jvm rpc ugi; do
        echo "$module.class=org.apache.hadoop.metrics.ganglia.GangliaContext31"
        echo "$module.period=10"
        echo "$module.servers=$GANGLIA_HOST:8649"
    done > /etc/hadoop/hadoop-metrics.properties

    for module in namenode datanode resourcemanager nodemanager mrappmaster jobhistoryserver; do
        echo "$module.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31"
        echo "$module.sink.ganglia.period=10"
        echo "$module.sink.ganglia.supportsparse=true"
        echo "$module.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both"
        echo "$module.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40"
        echo "$module.sink.ganglia.servers=$GANGLIA_HOST:8649"
    done > /etc/hadoop/hadoop-metrics2.properties
fi

exec "$@"
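
The configure function is the heart of this setup: any environment variable named CORE_CONF_*, HDFS_CONF_*, YARN_CONF_*, HTTPFS_CONF_* or KMS_CONF_* is injected as a property into the matching *-site.xml file. The name mangling maps ___ to a hyphen, __ to an underscore and _ to a dot. You can test the mapping with the same perl one-liner the script uses:

$ echo "yarn_log___aggregation___enable" | perl -pe 's/___/-/g; s/__/@/g; s/_/./g; s/@/_/g;'
yarn.log-aggregation-enable

So exporting YARN_CONF_yarn_log___aggregation___enable=true yields yarn.log-aggregation-enable=true in yarn-site.xml.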

Build the hadoop-base image:

$ docker build -t="hadoop-base" ./base
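
As a quick sanity check — since the entrypoint ends with exec "$@", any arguments you pass are run after the configuration step, and hadoop is on the PATH set above — this should print the Hadoop version after the "Configuring ..." messages:

$ docker run --rm hadoop-base hadoop version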

Building hadoop-namenode

The hadoop-namenode Dockerfile:

FROM hadoop-base
MAINTAINER niracler <niracler@gmail.com>

HEALTHCHECK CMD curl -f http://localhost:50070/ || exit 1

ENV HDFS_CONF_dfs_namenode_name_dir=file:///hadoop/dfs/name
RUN mkdir -p /hadoop/dfs/name
VOLUME /hadoop/dfs/name

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 50070

CMD ["/run.sh"]
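
This pattern repeats in every service image below: a HEALTHCHECK that curls the service's web UI port, a VOLUME for state that should persist, the UI port EXPOSEd, and a run.sh as CMD, which the base image's entrypoint execs after rendering the configuration.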

The startup script, run.sh:

#!/bin/bash

namedir=`echo $HDFS_CONF_dfs_namenode_name_dir | perl -pe 's#file://##'`
if [ ! -d $namedir ]; then
  echo "Namenode name directory not found: $namedir"
  exit 2
fi

if [ -z "$CLUSTER_NAME" ]; then
  echo "Cluster name not specified"
  exit 2
fi

if [ "`ls -A $namedir`" == "" ]; then
  echo "Formatting namenode name directory: $namedir"
  $HADOOP_PREFIX/bin/hdfs --config $HADOOP_CONF_DIR namenode -format $CLUSTER_NAME
fi

$HADOOP_PREFIX/bin/hdfs --config $HADOOP_CONF_DIR namenode
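
Because the name directory is bind-mounted from the host (./data/namenode in the compose file below), the ls -A check means the format step runs only on the very first boot. To wipe the cluster and start over from scratch — note this destroys all HDFS data — something like:

$ docker-compose -f docker-compose-local.yml down
$ sudo rm -rf ./data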

Build the hadoop-namenode image:

$ docker build -t="hadoop-namenode" ./namenode

Building hadoop-datanode

The hadoop-datanode Dockerfile:

FROM hadoop-base
MAINTAINER niracler <niracler@gmail.com>

HEALTHCHECK CMD curl -f http://localhost:50075/ || exit 1

ENV HDFS_CONF_dfs_datanode_data_dir=file:///hadoop/dfs/data
RUN mkdir -p /hadoop/dfs/data
VOLUME /hadoop/dfs/data

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 50075

CMD ["/run.sh"]

The startup script, run.sh:

#!/bin/bash

datadir=`echo $HDFS_CONF_dfs_datanode_data_dir | perl -pe 's#file://##'`
if [ ! -d $datadir ]; then
  echo "Datanode data directory not found: $datadir"
  exit 2
fi

$HADOOP_PREFIX/bin/hdfs --config $HADOOP_CONF_DIR datanode

Build the hadoop-datanode image:

$ docker build -t="hadoop-datanode" ./datanode

Building hadoop-nodemanager

The hadoop-nodemanager Dockerfile:

FROM hadoop-base
MAINTAINER Niracler <niracler@gmail.com>

HEALTHCHECK CMD curl -f http://localhost:8042/ || exit 1

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 8042

CMD ["/run.sh"]

The startup script, run.sh:

#!/bin/bash

$HADOOP_PREFIX/bin/yarn --config $HADOOP_CONF_DIR nodemanager

Build the hadoop-nodemanager image:

$ docker build -t="hadoop-nodemanager" ./nodemanager

Building hadoop-resourcemanager

The hadoop-resourcemanager Dockerfile:

FROM hadoop-base
MAINTAINER niracler <niracler@gmail.com>

HEALTHCHECK CMD curl -f http://localhost:8088/ || exit 1

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 8088

CMD ["/run.sh"]

The startup script, run.sh:

#!/bin/bash

$HADOOP_PREFIX/bin/yarn --config $HADOOP_CONF_DIR resourcemanager

Build the hadoop-resourcemanager image:

$ docker build -t="hadoop-resourcemanager" ./resourcemanager

Building hadoop-historyserver

The hadoop-historyserver Dockerfile:

FROM hadoop-base
MAINTAINER niracler <niracler@gmail.com>

HEALTHCHECK CMD curl -f http://localhost:8188/ || exit 1

ENV YARN_CONF_yarn_timeline___service_leveldb___timeline___store_path=/hadoop/yarn/timeline
RUN mkdir -p /hadoop/yarn/timeline
VOLUME /hadoop/yarn/timeline

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 8188

CMD ["/run.sh"]

The startup script, run.sh:

#!/bin/bash

$HADOOP_PREFIX/bin/yarn --config $HADOOP_CONF_DIR historyserver

Build the hadoop-historyserver image:

$ docker build -t="hadoop-historyserver" ./historyserver

The configuration file, hadoop.env

Every service loads this file via env_file in the compose file below, and entrypoint.sh translates each entry into the corresponding *-site.xml property:

CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
CORE_CONF_hadoop_proxyuser_hue_hosts=*
CORE_CONF_hadoop_proxyuser_hue_groups=*

HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false

YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver
YARN_CONF_yarn_resourcemanager_address=resourcemanager:8032
YARN_CONF_yarn_resourcemanager_scheduler_address=resourcemanager:8030
YARN_CONF_yarn_resourcemanager_resource___tracker_address=resourcemanager:8031
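
Once the cluster is up you can confirm these settings landed in the XML; given the single-line property format addProperty writes, the output should look like this:

$ docker exec namenode grep "yarn.log-aggregation-enable" /etc/hadoop/yarn-site.xml
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>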

Startup

This uses docker-compose, with docker-compose-local.yml as follows:

version: "2"

services:
  namenode:
    image: hadoop-namenode
    hostname: namenode
    container_name: namenode
    volumes:
      - ./data/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - "50070:50070"

  resourcemanager:
    image: hadoop-resourcemanager
    hostname: resourcemanager
    container_name: resourcemanager
    depends_on:
      - "namenode"
    links:
      - "namenode"
    env_file:
      - ./hadoop.env

  historyserver:
    image: hadoop-historyserver
    hostname: historyserver
    container_name: historyserver
    volumes:
      - ./data/historyserver:/hadoop/yarn/timeline
    depends_on:
      - "namenode"
    links:
      - "namenode"
    env_file:
      - ./hadoop.env

  nodemanager1:
    image: hadoop-nodemanager
    hostname: nodemanager1
    container_name: nodemanager1
    depends_on:
      - "namenode"
      - "resourcemanager"
    links:
      - "namenode"
      - "resourcemanager"
    env_file:
      - ./hadoop.env

  datanode1:
    image: hadoop-datanode
    hostname: datanode1
    container_name: datanode1
    depends_on:
      - "namenode"
    links:
      - "namenode"
    volumes:
      - ./data/datanode1:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode2:
    image: hadoop-datanode
    hostname: datanode2
    container_name: datanode2
    depends_on:
      - "namenode"
    links:
      - "namenode"
    volumes:
      - ./data/datanode2:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode3:
    image: hadoop-datanode
    hostname: datanode3
    container_name: datanode3
    depends_on:
      - "namenode"
    links:
      - "namenode"
    volumes:
      - ./data/datanode3:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
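
Each datanode is a separate service with its own hostname, container_name and host data directory; adding a datanode4 is just a matter of copying one of these blocks and bumping the number in the service key, hostname, container_name and volume path.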

Start it:

$ docker-compose -f docker-compose-local.yml up -d

Visit http://localhost:50070/ again — the NameNode UI should now show the three datanodes alive:

(Screenshot: the NameNode web UI)
