Introduction to Hadoop/MapReduce

What is MapReduce?

A parallel programming model for big data processing:

split the data into chunks

define the steps that process each chunk

process the chunks in parallel
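The three steps can be sketched in plain Python. This is a single-machine illustration under stated assumptions, not Hadoop: a thread pool stands in for the cluster's worker nodes, the per-chunk step is assumed to be "count the words in each line", and the names `split_into_chunks`, `process_chunk` and `parallel_word_total` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Step 1: split a list of records into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Step 2 (hypothetical per-chunk step): count the words in every line of the chunk."""
    return sum(len(line.split()) for line in chunk)


def parallel_word_total(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    # Step 3: chunks are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Combine the per-chunk results into a single answer.
    return sum(partial_results)


lines = ["the quick brown fox", "jumps over", "the lazy dog", "the end"]
print(split_into_chunks(lines, 2))
print(parallel_word_total(lines))  # 4 + 2 + 3 + 2 = 11
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock; swapping in `ProcessPoolExecutor` would give real parallelism on one machine, while Hadoop spreads the chunks across many machines. The split/process/combine structure stays the same.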

Hadoop is a platform that implements MapReduce.

1. Map

<key1, value1> -> <key2, value2>

e.g. <line#, text string> -> <word, count>

After mapping, the output is grouped by key and passed to the Reduce phase.
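A minimal in-memory sketch of the Map step for the word-count example, in Python. It assumes the mapper emits `<word, 1>` for every occurrence of a word (the usual word-count convention), and the `shuffle` function imitates the grouping by key that Hadoop performs between Map and Reduce. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def map_line(line_number, text):
    """Map: <line#, text string> -> list of <word, 1> pairs."""
    # line_number is the input key; a word-count mapper does not need it.
    return [(word, 1) for word in text.split()]


def shuffle(mapped_pairs):
    """Group intermediate <key2, value2> pairs by key, sorted by key, as done before Reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


documents = {1: "the cat sat", 2: "the cat ran"}
pairs = [pair for num, text in documents.items() for pair in map_line(num, text)]
print(pairs)
print(shuffle(pairs))  # {'cat': [1, 1], 'ran': [1], 'sat': [1], 'the': [1, 1]}
```

Each call to `map_line` looks at one record only, which is what lets the Map phase run on many chunks at once.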

2. Reduce

Merge/reduce the output of the Map phase. This phase is optional.

The output of MapReduce can be printed, summed, counted, loaded into a database, or passed to the next MapReduce job.
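Continuing the word-count example, a minimal Python sketch of the Reduce step and of a few things that can be done with its output. The input is assumed to be already grouped by key (as it would be after the shuffle), and `words_by_frequency` is a hypothetical second job that consumes the first job's output; none of these names come from Hadoop's API.

```python
from collections import defaultdict


def reduce_word(word, counts):
    """Reduce: merge all values for one key into a single <word, total> pair."""
    return word, sum(counts)


def run_reduce(grouped):
    """Apply the reducer once per key in the grouped intermediate data."""
    return [reduce_word(word, counts) for word, counts in grouped.items()]


def words_by_frequency(word_totals):
    """Hypothetical second job: re-key <word, total> as <total, [words]>."""
    by_total = defaultdict(list)
    for word, total in word_totals:
        by_total[total].append(word)
    return dict(by_total)


# Grouped intermediate data, as it would arrive after the shuffle/sort step.
grouped = {"cat": [1, 1], "ran": [1], "sat": [1], "the": [1, 1]}
totals = run_reduce(grouped)
print(totals)                      # output printed: [('cat', 2), ('ran', 1), ('sat', 1), ('the', 2)]
print(sum(t for _, t in totals))   # output summed: 6 words in total
print(words_by_frequency(totals))  # output passed to a second job
```

Because the reducer sees every value for a key together, summing is all it takes to produce the final counts.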



Idea: MapReduce and massive unstructured data storage

Physical: Java classes for MapReduce and the Hadoop Distributed File System (HDFS)

Hadoop Operational Modes

Java MapReduce mode: reads records incrementally

Streaming mode: any language can be used; input is read as lines or a stream
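A sketch of a word-count mapper and reducer in the style Hadoop Streaming expects: the mapper reads raw lines and writes `key<TAB>value` lines, the framework sorts them by key, and the reducer reads the sorted lines. To keep it self-contained, both steps are plain Python functions that take an iterable of lines and an output stream, and the `__main__` block simulates `mapper | sort | reducer` locally. In an actual Streaming job they would be two separate scripts reading `sys.stdin`, passed via the streaming jar's `-mapper` and `-reducer` options; the function names here are hypothetical.

```python
import io
import sys
from itertools import groupby


def streaming_mapper(lines, out):
    """Read raw text lines; emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            out.write(f"{word}\t1\n")


def streaming_reducer(sorted_lines, out):
    """Read 'word<TAB>count' lines already sorted by word; emit 'word<TAB>total'."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # Sorted input means all lines for one word are adjacent, so groupby sees each word once.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        out.write(f"{word}\t{total}\n")


if __name__ == "__main__":
    # Local simulation of: mapper | sort | reducer
    mapped = io.StringIO()
    streaming_mapper(["the cat sat", "the cat ran"], mapped)
    sorted_lines = sorted(mapped.getvalue().splitlines(keepends=True))
    streaming_reducer(sorted_lines, sys.stdout)
```

Since only text lines cross the boundary between steps, the same mapper and reducer could be written in any language that reads standard input and writes standard output.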


MapReduce and HDFS

Query Languages for Hadoop

Build on core Hadoop to enhance the development and manipulation of Hadoop clusters

Pig: data flow language and execution environment

Hive (HiveQL): SQL-based query language for building MapReduce jobs

HBase: column-oriented database


Pig (its data flow language is called Pig Latin)

Two execution environment modes:

Local file system

MapReduce in a Hadoop environment

Suitable for large datasets and batch processing
