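As everyone knows, the core of Hadoop consists of HDFS and MapReduce. The previous eight posts were all about HDFS, so starting with this one I'll cover MapReduce.
A MapReduce job is just a Java program. As a Java programmer, that's a statement I find very easy to accept. Don't you?
That's right. But isn't helloWorld the way to get started with anything? How did it turn into wordCount? In fact, wordCount is the helloWorld of MapReduce. Let's look at the wordCount code first:
Mapper class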
package com.xmf.mr.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * Created by Administrator on 2018/4/16.
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // The framework calls this method once for every line of input.
    // key is the offset at which this line starts in the file.
    // value is the text content of this line.
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the content of this line to a String
        String line = value.toString();
        // Split on single spaces
        String[] words = StringUtils.split(line, ' ');
        // Iterate over the words and emit one <word, 1> pair for each
        for (String word : words) {
            // Skip empty tokens, which blank lines or consecutive spaces can produce
            if (word.isEmpty()) {
                continue;
            }
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
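The mapper splits each line on a single space character. As a side note, here is a minimal plain-Java sketch (no Hadoop dependency) of a tokenizer that also tolerates tabs, leading or trailing whitespace, and runs of spaces. The class name `WhitespaceTokenizer` and the method `tokenize` are hypothetical names used only for illustration; they are not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the original WordCount code.
class WhitespaceTokenizer {

    // Splits a line on any run of whitespace and drops empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // An empty or all-whitespace line yields a single empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  hello   world\t hadoop "));
        System.out.println(tokenize(""));
    }
}
```

Inside a mapper, the same idea would amount to looping over `tokenize(value.toString())` instead of the array returned by `StringUtils.split`. Whether that is appropriate depends on how the input data is actually delimited.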
Reducer class
package com.xmf.mr.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by Administrator on 2018/4/16.
 */
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // After the mappers finish, the framework collects all key/value pairs and groups them by key.
    // It then passes in one group <key, values{}> per call to reduce, for example:
    // <hello,{1,1,1,1,1}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        // Sum all the 1s emitted for this word
        for (LongWritable value : values) {
            count += value.get();
        }
        // Emit the total count for this word
        context.write(key, new LongWritable(count));
    }
}
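To make the map, group, reduce flow easier to picture, here is a small plain-Java sketch that imitates it in memory, without Hadoop. It is only an illustration of the idea, not how the framework is implemented: the class `LocalWordCount` and its methods `map`, `shuffle`, `reduce` and `wordCount` are hypothetical names, and the real framework distributes these steps across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the WordCount flow; for illustration only.
class LocalWordCount {

    // "Map" step: emit one (word, 1) pair for each word in a line.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group all values by key, e.g. hello -> [1, 1].
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // "Reduce" step: called once per key with all of its values; sums them.
    static long reduce(String key, Iterable<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    // Runs the three steps in sequence over a list of input lines.
    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

The `reduce` method here has the same shape as `WCReducer.reduce`: it receives one key together with all the values grouped under it and produces a single total.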
Driver class
package com.xmf.mr.wordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Describes a specific job:
 * which class the job uses as the mapper and which as the reducer,
 * the path of the input data,
 * and the path of the output files.
 * Created by Administrator on 2018/4/18.
 */
public class WCRunner {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        //System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1\\hadoop-2.4.1");
        Job job = Job.getInstance(conf);
        // Tell the job which jar contains its classes
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        // Key/value types of the final (reducer) output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Path of the input data
        FileInputFormat.setInputPaths(job, new Path("hdfs://my01:9000/wc/srcdata"));
        // Path of the output files (the job fails if this directory already exists)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://my01:9000/wc/output"));
        // Submit the job to the cluster, wait for it to finish, and report success or failure via the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
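This is a WordCount I wrote myself. Running it on Windows (locally) requires quite a few changes, which I have already made. If anything is unclear, leave a comment; I'll get a notification and answer as soon as I can, so I won't go into the details here. For now, let's look at running it on Linux with the hadoop command. This approach is not convenient for debugging, but since we are just getting started we can set debugging aside. Our goal is clear: to get an intuitive feel for MR.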
A quick note on how to build a jar in IntelliJ IDEA
Step 1:
[screenshot]
Step 2:
[screenshot]
Step 3:
[screenshot]
===== divider =====
Moving on: package the code above into a jar and upload it to the server.
[screenshot]
Data preparation
[screenshot]
Data:
[screenshot]
Run
hadoop jar wordCount.jar com.xmf.mr.wordCount.WCRunner
[screenshot]
[screenshot]
The job has finished. Let's look at the result:
[screenshot]
The result shows that the number of occurrences of each word has been counted.
Corrections and suggestions are very welcome!