Big Data Log Analysis: A Hands-On Hadoop Project

0x00 Tutorial Contents

  1. Overview of the big data log analysis system
  2. Using UserAgentParser
  3. Preparation
  4. Hands-on project
  5. Results

0x01 Overview of the Big Data Log Analysis System

1. Requirement

a. Count how many visits each browser accounts for in a website's access log

2. Background and architecture

a. See the article: Background and Architecture of a Big Data Log Analysis System

0x02 UserAgentParser

1. What UserAgentParser is

a. A small third-party tool (an open-source project written by someone else) for parsing HTTP User-Agent strings

2. User-Agent strings

a. Example:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
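b. To see your own browser's User-Agent, open any website, press F12 to open the developer tools, refresh the page, and inspect the User-Agent request header of any request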
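Why use a dedicated parser at all? A Chrome User-Agent like the one above also contains "Safari" and "Mozilla", so naive substring checks only work if the most specific token is tested first. The Java sketch below is a hypothetical, deliberately simplified detector; the class NaiveBrowserDetector and its token list are assumptions for illustration, not how UserAgentParser works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, deliberately naive browser detector (illustration only). */
public class NaiveBrowserDetector {

    // Order matters: Chrome and Edge UAs also contain "Safari/", and Edge UAs
    // also contain "Chrome/", so the most specific tokens are checked first.
    // LinkedHashMap preserves this insertion order during iteration.
    private static final Map<String, String> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("Edge/", "Edge");
        TOKENS.put("Chrome/", "Chrome");
        TOKENS.put("Firefox/", "Firefox");
        TOKENS.put("MSIE ", "MSIE");
        TOKENS.put("Safari/", "Safari");
    }

    /** Returns the first browser whose token appears in the UA, or "Unknown". */
    public static String detect(String userAgent) {
        for (Map.Entry<String, String> e : TOKENS.entrySet()) {
            if (userAgent.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36";
        System.out.println(detect(ua)); // prints "Chrome"
    }
}
```

Swapping the Chrome and Safari entries would report every Chrome visit as Safari, which is exactly the kind of edge case a maintained parser handles for you.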


0x03 Preparation

1. Download the UserAgentParser tool

a. Download it from the URL below (clone it with git, or download the zip archive and extract it):
https://github.com/LeeKemp/UserAgentParser

2. Install the jar into the local Maven repository

a. Package the tool as a jar with Maven (run this in the project root, e.g. E:\workspace\UserAgentParser-master):
mvn clean package -DskipTests


b. Install the jar into the local Maven repository (install also runs the package phase, so this step on its own is enough):
mvn clean install -DskipTests

0x04 Hands-On Project

1. Create the project

a. To create a new Maven project, see section 0x01 of this article:
Performing HDFS Operations with the Java API

2. Add dependencies

a. Add the UserAgentParser dependency (it only resolves if the jar has already been installed into the local repository):

<!-- UserAgentParser dependency -->
<dependency>
    <groupId>com.kumkee</groupId>
    <artifactId>UserAgentParser</artifactId>
    <version>0.0.1</version>
</dependency>

b. The complete pom.xml:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.shaonaiyi.hadoop</groupId>
    <artifactId>hadoop-learning</artifactId>
    <version>1.0</version>

    <name>hadoop-learning</name>
    <!-- FIXME change it to the project's website -->
    <url>http://www.example.com</url>

    <properties>
        <hadoop-version>2.7.5</hadoop-version>
    </properties>

    <dependencies>

        <!-- Hadoop dependency -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop-version}</version>
        </dependency>

        <!-- UserAgentParser dependency -->
        <dependency>
            <groupId>com.kumkee</groupId>
            <artifactId>UserAgentParser</artifactId>
            <version>0.0.1</version>
        </dependency>

        <!-- Unit test dependency -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>
3. Write a test class

a. Create a test package (com.shaonaiyi.hadoop.project) under the Java test directory:


b. Create the UserAgentTest class:

package com.shaonaiyi.hadoop.project;

import com.kumkee.userAgent.UserAgent;
import com.kumkee.userAgent.UserAgentParser;

/**
 * @Author: 邵奈一
 * @Date: 2019/03/27 2:45 PM
 * @Description: Test class for UserAgent parsing
 */
public class UserAgentTest {

    public static void main(String[] args) {
        String agentSource = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36";
        UserAgentParser userAgentParser = new UserAgentParser();
        UserAgent agent = userAgentParser.parse(agentSource);

        String browser = agent.getBrowser();
        String engine = agent.getEngine();
        String engineVersion = agent.getEngineVersion();
        String os = agent.getOs();
        String platform = agent.getPlatform();
        boolean isMobile = agent.isMobile();
        String version = agent.getVersion();

        System.out.println("Browser: " + browser);
        System.out.println("Engine: " + engine);
        System.out.println("Engine version: " + engineVersion);
        System.out.println("OS: " + os);
        System.out.println("Platform: " + platform);
        System.out.println("Mobile device: " + isMobile);
        System.out.println("Version: " + version);

    }

}

c. Run the test class; the parsed fields are printed to the console:


4. Write the job code

a. Create the package (com.shaonaiyi.hadoop.project) under the main Java source directory



b. Create the ParseUserAgentApp class:

package com.shaonaiyi.hadoop.project;

import com.kumkee.userAgent.UserAgent;
import com.kumkee.userAgent.UserAgentParser;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

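import java.io.IOException;

/**
 * @Author: 邵奈一
 * @Date: 2019/03/27 2:54 PM
 * @Description: Counts visits per browser with MapReduce
 */
public class ParseUserAgentApp {

    // Mapper: parses the User-Agent of each log line and emits (browser, 1)
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        private final LongWritable one = new LongWritable(1);
        private final Text browserKey = new Text();
        private UserAgentParser userAgentParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            userAgentParser = new UserAgentParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            // One log record
            String line = value.toString();

            // The User-Agent is the 4th quoted field: it opens at the 7th double quote
            int start = getCharacterPosition(line, "\"", 7);
            if (start < 0) {
                return; // malformed line with fewer than 7 quotes: skip it instead of failing the task
            }
            // Stop at the closing quote so the trailing fields are not passed to the parser
            int end = line.indexOf('"', start + 1);
            String agentSource = (end < 0) ? line.substring(start + 1) : line.substring(start + 1, end);

            UserAgent agent = userAgentParser.parse(agentSource);
            String browser = (agent == null) ? null : agent.getBrowser();

            // context.write serializes immediately, so reusing the Text object is safe
            browserKey.set(browser == null ? "Unknown" : browser);
            context.write(browserKey, one);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            userAgentParser = null;
        }
    }

    // Reducer: sums the counts for each browser
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {

            long sum = 0; // long, to match LongWritable and avoid int overflow
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    /**
     * Returns the position of the index-th occurrence of a marker string in a given string.
     * @param value    the string to search
     * @param operator the marker to look for (matched literally)
     * @param index    which occurrence to find, starting at 1
     * @return the position of that occurrence, or a negative value if the marker occurs fewer than index times
     */
    private static int getCharacterPosition(String value, String operator, int index) {
        int position = -operator.length();
        for (int i = 0; i < index; i++) {
            // Continue just past the previous match (non-overlapping, like Matcher.find())
            position = value.indexOf(operator, position + operator.length());
            if (position < 0) {
                return -1;
            }
        }
        return position;
    }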


    public static void main(String[] args) throws Exception{

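        if (args.length != 2) {
            System.err.println("Usage: ParseUserAgentApp <input path> <output path>");
            System.exit(2);
        }

        Configuration configuration = new Configuration();

        // If the output path already exists, delete it first (MapReduce refuses to write into an existing directory)
        Path outputPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(configuration);
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
            System.out.println("Output path already existed and has been deleted");
        }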

        Job job = Job.getInstance(configuration, "ParseUserAgentApp");

        job.setJarByClass(ParseUserAgentApp.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

}

P.S. This code is adapted from the tutorial "Introductory MapReduce Example: Word Count"; see that article for the basics.

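c. Package the project (any packaging method works; this one uses the maven-assembly-plugin configured in the pom to build a jar that bundles all dependencies):
mvn clean package assembly:single
Older tutorials use mvn assembly:assembly, but that goal is deprecated; assembly:single is its supported replacement.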
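Before deploying the jar, it can be worth checking the quote-counting step on its own, since it is the part of MyMapper most likely to break on unusual log lines. Below is a minimal standalone sketch with no Hadoop dependency; the class UserAgentExtractor and its methods are hypothetical names. Like the mapper, it locates the 7th double quote; it also stops at the next (closing) quote and returns null instead of throwing when a line has too few quotes.

```java
/** Hypothetical standalone helper for testing the User-Agent extraction outside Hadoop. */
public class UserAgentExtractor {

    /** Index of the n-th (1-based) occurrence of ch in s, or -1 if there are fewer. */
    static int nthIndexOf(String s, char ch, int n) {
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = s.indexOf(ch, pos + 1);
            if (pos < 0) {
                return -1;
            }
        }
        return pos;
    }

    /** Returns the text between the 7th and 8th double quote, or null for malformed lines. */
    static String extractUserAgent(String line) {
        int start = nthIndexOf(line, '"', 7);
        if (start < 0) {
            return null;
        }
        int end = line.indexOf('"', start + 1);
        return end < 0 ? line.substring(start + 1) : line.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // Sample line taken from the access log excerpt in section 0x05
        String line = "117.35.88.11 - - [10/Nov/2016:00:01:02 +0800] "
                + "\"GET /article/ajaxcourserecommends?id=124 HTTP/1.1\" 200 2345 "
                + "\"www.imooc.com\" \"http://www.imooc.com/code/1852\" - "
                + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/54.0.2840.71 Safari/537.36\" \"-\" 10.100.136.65:80 200 0.616 0.616";
        System.out.println(extractUserAgent(line));
        System.out.println(extractUserAgent("not a log line")); // prints "null"
    }
}
```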


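0x05 Results

1. Upload the project to the server

a. After packaging, the target directory contains two jars:
hadoop-learning-1.0.jar: does not include external dependencies (it only works locally on Windows, where the dependencies are available from the local Maven repository)
hadoop-learning-1.0-jar-with-dependencies.jar: includes the external dependencies (including the UserAgentParser jar)

Note: The server does not have the UserAgentParser-0.0.1.jar we built at the start. We only built that jar on our local Windows machine and installed it into that machine's Maven repository; the server has no copy of it. That is why we copy hadoop-learning-1.0-jar-with-dependencies.jar to the server. Otherwise, UserAgentParser-0.0.1.jar would also have to be made available on the server's classpath.
b. Upload the jar to the server: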

[hadoop-sny@master mr]$ pwd
/home/hadoop-sny/mr
[hadoop-sny@master mr]$ ll
total 352752
-rw-rw-r--. 1 hadoop-sny hadoop-sny 321100030 Dec 13 18:51 big_file_again.txt
-rw-rw-r--. 1 hadoop-sny hadoop-sny      8837 Mar 22 19:34 hadoop-learning-1.0.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny  39193853 Mar 27 15:34 hadoop-learning-1.0-jar-with-dependencies.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny    903971 Dec 19 10:45 mapreduce-course-1.0-SNAPSHOT.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny        30 Dec 19 17:12 small_file.txt
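If the thin hadoop-learning-1.0.jar were submitted instead, the map tasks would typically fail with a ClassNotFoundException or NoClassDefFoundError for com.kumkee.userAgent.UserAgentParser. A quick way to check what a given classpath contains is a sketch like the one below; ClasspathCheck is a hypothetical helper, not part of the project.

```java
/** Hypothetical diagnostic: reports whether a class can be loaded from the current classpath. */
public class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String parser = "com.kumkee.userAgent.UserAgentParser";
        System.out.println(parser + " on classpath: " + isOnClasspath(parser));
    }
}
```

For example, running it with java -cp hadoop-learning-1.0-jar-with-dependencies.jar:. ClasspathCheck should report true, while the same command with the thin jar should report false.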
2. Upload the log file to HDFS

a. The log format is shown below. If you don't have a log file, copy these lines several times into a file and use that:

183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getadv HTTP/1.1" 200 813 "www.imooc.com" "-" cid=0&timestamp=1478707261865&uid=2871142&marking=androidbanner&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=f51e97d1cb1a9caac669ea8acc162b96 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.134.244:80 200 0.027 0.027
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
117.35.88.11 - - [10/Nov/2016:00:01:02 +0800] "GET /article/ajaxcourserecommends?id=124 HTTP/1.1" 200 2345 "www.imooc.com" "http://www.imooc.com/code/1852" - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36" "-" 10.100.136.65:80 200 0.616 0.616
182.106.215.93 - - [10/Nov/2016:00:01:02 +0800] "POST /socket.io/1/ HTTP/1.1" 200 94 "chat.mukewang.com" "-" - "android-websockets-2.0" "-" 10.100.15.239:80 200 0.004 0.004
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/userdynamic HTTP/1.1" 200 19501 "www.imooc.com" "-" cid=0&timestamp=1478707261847&uid=2871142&touid=2871142&page=1&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=3837a5bf27ea718fe18bda6c53fbbc14 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.195 0.195
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
114.248.161.26 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getcourseintro HTTP/1.1" 200 2510 "www.imooc.com" "-" cid=283&secrect=86b720f312c2b25da3b20e59e7c89780&timestamp=1478707261951&token=4c144b3f4314178b9527d1e91ecc0fac&uid=3372975 "mukewang/5.0.2 (iPhone; iOS 8.4.1; Scale/2.00)" "-" 10.100.136.65:80 200 0.007 0.008
120.52.94.105 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getmediainfo_ver2 HTTP/1.1" 200 633 "www.imooc.com" "-" cid=608&secrect=e25994750eb2bbc7ade1a36708b999a5&timestamp=1478707261945&token=9bbdba949aec02735e59e0868b538e19&uid=4203162 "mukewang/5.0.2 (iPhone; iOS 10.0.1; Scale/3.00)" "-" 10.100.136.65:80 200 0.049 0.049
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
112.10.136.45 - - [10/Nov/2016:00:01:02 +0800] "POST /socket.io/1/ HTTP/1.1" 200 94 "chat.mukewang.com" "-" - "android-websockets-2.0" "-" 10.100.15.239:80 200 0.006 0.006
211.162.33.31 - - [10/Nov/2016:00:01:02 +0800] "GET /u/card HTTP/1.1" 200 331 "www.imooc.com" "http://www.imooc.com/code/2053" - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.65:80 200 0.371 0.371
116.22.196.70 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/code/3500" mid=3500&time=60 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0" "-" 10.100.134.244:80 200 0.026 0.026
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
113.47.86.12 - - [10/Nov/2016:00:01:02 +0800] "GET /socket.io/1/websocket/eHBhkZC47oY64iLMMeXm HTTP/1.1" 101 125 "chat.mukewang.com" "-" - "-" "-" 10.100.15.239:80 101 277.433 277.433
119.130.229.90 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/code/547" mid=547&time=60 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36" "-" 10.100.136.65:80 200 0.021 0.021
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
120.52.94.105 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getrelevantcourse HTTP/1.1" 200 774 "www.imooc.com" "-" cid=608&secrect=e25994750eb2bbc7ade1a36708b999a5&timestamp=1478707262003&token=2b865e78535436df02fd3f986bb0cc08&uid=4203162 "mukewang/5.0.2 (iPhone; iOS 10.0.1; Scale/3.00)" "-" 10.100.136.65:80 200 0.048 0.048
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.44.115.163 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/savemediafinish HTTP/1.1" 200 103 "www.imooc.com" "-" is_offline=0&time=0&mid=2312&secrect=cc8506ee27115cd3c9d617730ea600d9&cid=0&plat_id=5&timestamp=1478707261086&uid=4356276&stay_time=0&token=22e4a2ec2c40a7c4375651c5020e7023 "mukewang/5.0.1 (Android 5.0.2; Xiaomi Redmi Note 2 Build/LRX22G),Network WIFI" "-" 10.100.136.64:80 200 0.068 0.068
211.162.33.31 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxusermediasstatus?cid=9 HTTP/1.1" 200 2954 "www.imooc.com" "http://www.imooc.com/code/2053" - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.64:80 200 0.030 0.030
218.58.205.220 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.009 0.009
114.246.57.116 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/userinfo HTTP/1.1" 200 151 "www.imooc.com" "-" secrect=9455e4679d68f107477a27d69cdf753c&timestamp=1478707262002&token=73bdcb218e48acd4869826afa320baf4&uid=4132795&uuid=0dd9c37bf4ac75031158349738b7612b "mukewang/5.0.2 (iPhone; iOS 10.1.1; Scale/2.00)" "-" 10.100.136.64:80 200 0.070 0.071
218.58.205.245 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.002 0.002
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
112.253.38.168 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.024 0.024
218.58.205.204 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.023 0.023
112.253.38.159 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.024 0.024
218.58.205.252 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.025 0.025
119.184.176.131 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.003 0.003
223.104.31.75 - - [10/Nov/2016:00:01:02 +0800] "GET /socket.io/1/websocket/szGk1G7hrpIe6RWHMfLK HTTP/1.1" 101 91 "chat.mukewang.com" "-" - "-" "-" 10.100.15.239:80 101 30.068 30.068
218.58.205.216 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.022 0.022
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/beta HTTP/1.1" 200 16950 "www.imooc.com" "-" cid=0&timestamp=1478707261842&uid=2871142&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=4ea00393c5ac3588c5317cf9f28013fa "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.377 0.377
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
106.39.41.166 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser/ HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/video/8701" mid=8701&time=120.0010000000002&learn_time=16.1 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0" "-" 10.100.136.64:80 200 0.016 0.016
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/searchindex HTTP/1.1" 200 1484 "www.imooc.com" "-" cid=0&words=premiere&timestamp=1478707261876&uid=2871142&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=1b4fcde08cb054e9077b2f316a7da0b0 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.110 0.110
39.186.247.142 - - [10/Nov/2016:00:01:02 +0800] "GET /video/3237 HTTP/1.1" 200 7227 "www.imooc.com" "http://www.imooc.com/ceping/4191" - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.64:80 200 0.198 0.198
113.140.11.123 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser/ HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/video/5915/0" mid=5915&time=60.01200000000006&learn_time=284.9 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393" "-" 10.100.134.244:80 200 0.029 0.029
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000

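b. Upload the log to the /files directory on HDFS (create the directory first; otherwise -put would create a file named /files instead):
hadoop fs -mkdir -p /files
hadoop fs -put access.log /files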
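Every sample line in the log above follows the same layout: client IP, two placeholder fields, a bracketed timestamp, the quoted request, the status code, the response size, then the quoted host, the quoted referer, an unquoted parameter field, and the quoted User-Agent. As an alternative to counting quotes, one regular expression can capture these fields. The sketch below assumes the parameter field never contains spaces and the size field is numeric (both true for the samples shown); AccessLogLine is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based parser for the access log layout shown above. */
public class AccessLogLine {

    // ip - - [time] "request" status bytes "host" "referer" params "user-agent" ...
    // Assumption: params has no spaces and bytes is numeric; trailing fields are ignored.
    private static final Pattern LINE_PREFIX = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+) "
            + "\"([^\"]*)\" \"([^\"]*)\" (\\S+) \"([^\"]*)\"");

    final String ip;
    final String time;
    final String request;
    final int status;
    final long bytes;
    final String host;
    final String referer;
    final String params;
    final String userAgent;

    private AccessLogLine(Matcher m) {
        ip = m.group(1);
        time = m.group(2);
        request = m.group(3);
        status = Integer.parseInt(m.group(4));
        bytes = Long.parseLong(m.group(5));
        host = m.group(6);
        referer = m.group(7);
        params = m.group(8);
        userAgent = m.group(9);
    }

    /** Returns the parsed fields, or null if the line does not match the expected layout. */
    static AccessLogLine parse(String line) {
        Matcher m = LINE_PREFIX.matcher(line);
        return m.lookingAt() ? new AccessLogLine(m) : null;
    }

    public static void main(String[] args) {
        // Sample line copied from the log excerpt above
        String line = "114.248.161.26 - - [10/Nov/2016:00:01:02 +0800] "
                + "\"POST /api3/getcourseintro HTTP/1.1\" 200 2510 \"www.imooc.com\" \"-\" "
                + "cid=283&secrect=86b720f312c2b25da3b20e59e7c89780&timestamp=1478707261951"
                + "&token=4c144b3f4314178b9527d1e91ecc0fac&uid=3372975 "
                + "\"mukewang/5.0.2 (iPhone; iOS 8.4.1; Scale/2.00)\" \"-\" 10.100.136.65:80 200 0.007 0.008";
        AccessLogLine p = parse(line);
        System.out.println(p.ip + " " + p.status + " " + p.request + " -> " + p.userAgent);
    }
}
```

Using lookingAt() rather than matches() lets the pattern describe only the leading fields and ignore whatever follows the User-Agent.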

3. Run the job

a. Make sure HDFS and YARN are running and that /files/access.log exists on HDFS
b. Change to the directory containing the jar (here: /home/hadoop-sny/mr)
cd /home/hadoop-sny/mr
c. Run:
hadoop jar ./hadoop-learning-1.0-jar-with-dependencies.jar com.shaonaiyi.hadoop.project.ParseUserAgentApp /files/access.log /projectout
d. View the results
hadoop fs -cat /projectout/*
Output:

[hadoop-sny@master mr]$ hadoop fs -cat /projectout/*
Chrome  2775
Firefox 327
MSIE    78
Safari  115
Unknown 6705
[hadoop-sny@master mr]$
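The counts above add up to 10,000 records, so they convert directly into shares, and Unknown accounts for about two thirds of them. Judging by the sample lines, many requests come from the mukewang mobile app, curl, Go-http-client, and websocket clients, which are not browsers and likely explain much of the Unknown bucket. A small sketch for turning the job output into percentages (BrowserShare is a hypothetical helper; the counts are copied from the output above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: converts per-browser counts into percentage shares. */
public class BrowserShare {

    static Map<String, Double> shares(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        if (total == 0) {
            return result; // avoid dividing by zero on empty output
        }
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Counts copied from the job output above
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Chrome", 2775L);
        counts.put("Firefox", 327L);
        counts.put("MSIE", 78L);
        counts.put("Safari", 115L);
        counts.put("Unknown", 6705L);
        shares(counts).forEach((browser, pct) ->
                System.out.printf("%-8s %6.2f%%%n", browser, pct));
    }
}
```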

0xFF Summary

  1. You can also open the YARN web UI at http://master:8088 to view the job you ran.
  2. The job code is a step up from the tutorial "Introductory MapReduce Example: Word Count"; work through that one first and level up step by step!
  3. Exercise: look for more business questions and implement more requirements. Here we only counted visits per browser, but the test class shows that the User-Agent also yields the engine, engine version, OS, platform, device type, and version, so the logs support far more than browser counts.
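One way to approach the exercise is to emit a composite key instead of the browser alone, for example the browser and OS joined by a tab (the test class shows that getOs() is available). The plain-Java sketch below simulates the map, shuffle, and reduce steps locally so the counting logic can be checked without a cluster; the Visit class and the sample browser/OS values are hypothetical, not real UserAgentParser output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical local simulation of a browser-and-OS visit count (no Hadoop involved). */
public class BrowserOsCount {

    /** Hypothetical pre-parsed record; in the real job these values would come from UserAgent getters. */
    static final class Visit {
        final String browser;
        final String os;

        Visit(String browser, String os) {
            this.browser = browser;
            this.os = os;
        }
    }

    // "Map" step: build the composite key a mapper could emit as Text
    static String key(Visit v) {
        return v.browser + "\t" + v.os;
    }

    // "Shuffle + reduce" step: group identical keys and sum their 1s.
    // TreeMap keeps keys sorted, similar to the sorted output of a single reducer.
    static Map<String, Long> count(List<Visit> visits) {
        Map<String, Long> counts = new TreeMap<>();
        for (Visit v : visits) {
            counts.merge(key(v), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("Chrome", "Windows"),
                new Visit("Chrome", "Windows"),
                new Visit("Chrome", "Mac OS X"),
                new Visit("Unknown", "Unknown"));
        count(visits).forEach((k, n) -> System.out.println(k + "\t" + n));
    }
}
```

In the real job, only the mapper's output key would change (to something like agent.getBrowser() + "\t" + agent.getOs()); MyReducer can stay as it is because it simply sums the values for each key.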

About the author: 邵奈一
University big data lecturer, university market observer, column editor
WeChat official account, Weibo, CSDN: 邵奈一

