005 Top Advantages and Disadvantages of Hadoop 3
The objective of this tutorial is to discuss the advantages and disadvantages of Hadoop 3.0. The many changes introduced in Hadoop 3.0 have made it a better product.
Hadoop is designed to store and manage large amounts of data. It has many advantages: it is free and open source, easy to use, and performs well. On the other hand, it also has some weaknesses, which we discuss here as its disadvantages.
So, let’s start exploring the top advantages and disadvantages of Hadoop.
Advantages of Hadoop
Hadoop is easy to use, scalable, and cost-effective, and it has many other advantages as well. Here we discuss the top 12 advantages that make Hadoop so popular:
1. Varied Data Sources
Hadoop accepts a variety of data. Data can come from a range of sources, such as email conversations and social media, and can be structured or unstructured. Hadoop can derive value from this diverse data, accepting text files, XML files, images, CSV files, and so on.
2. Cost-effective
Hadoop is an economical solution because it stores data on a cluster of commodity hardware. Commodity machines are cheap, so the cost of adding nodes to the framework is low. With erasure coding, Hadoop 3.0 has a storage overhead of only 50%, as opposed to 200% for the default 3x replication in Hadoop 2.x. Because far less redundant data is stored, fewer machines are needed to hold the same data.
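The 200% and 50% figures follow from simple arithmetic. The sketch below works them out, assuming 3x replication for Hadoop 2.x and the 6-data, 3-parity erasure coding layout described later in this tutorial. The function names are made up for this illustration and are not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage, as a fraction of the user data size, for n-way replication."""
    return float(replicas - 1)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra storage, as a fraction of the user data size, for an erasure-coded layout."""
    return parity_blocks / data_blocks


def raw_capacity_needed(data_tb: float, overhead: float) -> float:
    """Total disk capacity (in TB) needed to hold data_tb of user data."""
    return data_tb * (1 + overhead)


three_way = replication_overhead(3)      # 2.0, i.e. 200% overhead
rs_6_3 = erasure_coding_overhead(6, 3)   # 0.5, i.e. 50% overhead
```

Under these assumptions, 100 TB of user data needs 300 TB of raw disk with 3x replication but only 150 TB with the 6+3 layout, which is where the savings in machines come from.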
3. Performance
With its distributed processing and distributed storage architecture, Hadoop processes huge amounts of data at high speed. In 2008, a Hadoop cluster won the terabyte sort benchmark, outperforming the fastest systems of the time. HDFS divides each input file into blocks and stores those blocks across several nodes. Hadoop likewise divides the job a user submits into sub-tasks, assigns them to the worker nodes that hold the required data, and runs them in parallel, which improves performance.
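To make the block splitting concrete, here is a minimal sketch of how a file might be cut into fixed-size blocks and spread over nodes. It assumes a 128 MB block size and uses a simple round-robin placement, which is a deliberate simplification: real HDFS placement also considers racks and replicas. The function names are hypothetical.

```python
BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block that a file of file_size_mb would occupy."""
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_blocks(num_blocks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Spread block indices over nodes round-robin (not the real HDFS policy)."""
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return placement
```

A 300 MB file, for example, becomes two full 128 MB blocks plus one 44 MB block, and each block can then be processed by a separate sub-task.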
4. Fault-Tolerant
In Hadoop 3.0, fault tolerance can be provided by erasure coding as an alternative to replication. For example, the erasure coding technique produces 3 parity blocks from 6 data blocks, so HDFS stores a total of 9 blocks. If a node fails, the affected blocks can be reconstructed from the remaining data blocks and the parity blocks. With this layout, any 3 of the 9 blocks can be lost without losing data.
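The idea of rebuilding a lost block from parity can be illustrated with a much simpler scheme than the one HDFS uses. The sketch below uses a single XOR parity block, which can recover only one missing block. It is not the Reed-Solomon coding behind the 6+3 layout above, and the function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors and the parity block."""
    return xor_blocks(surviving_blocks + [parity])


data = [b"blockA", b"blockB", b"blockC"]
parity = xor_blocks(data)

# Simulate losing the node that held data[1], then rebuild it.
rebuilt = recover_missing([data[0], data[2]], parity)
```

Here `rebuilt` equals the lost block `data[1]`. Reed-Solomon coding generalizes this so that several parity blocks can repair several simultaneous losses.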
5. Highly Available
In Hadoop 2.x, the HDFS architecture has a single active NameNode and a single standby NameNode, so if the active NameNode goes down, the standby takes over. Hadoop 3.0 supports multiple standby NameNodes, which makes the system even more highly available. With one active and two standby NameNodes, for example, the cluster can continue functioning even if two NameNodes crash.
6. Low Network Traffic
In Hadoop, each job submitted by the user is split into a number of independent sub-tasks, which are assigned to the data nodes that hold the relevant data. Hadoop thus moves a small amount of code to the data rather than moving huge amounts of data to the code, which keeps network traffic low.
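A toy scheduler shows what "moving code to data" means in practice. This is only a greedy sketch under simplifying assumptions (one task per block, a fixed number of free slots per node); it is not how the real Hadoop scheduler is implemented, and `assign_tasks` is a hypothetical name.

```python
def assign_tasks(
    block_locations: dict[str, list[str]],
    free_slots: dict[str, int],
) -> dict[str, tuple[str, bool]]:
    """Assign one task per block, preferring a node that already stores the block.

    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dictionary is not mutated
    assignments = {}
    for block_id, holders in block_locations.items():
        local = next((n for n in holders if slots.get(n, 0) > 0), None)
        if local is not None:
            node, is_local = local, True
        else:
            # Fall back to any node with capacity; the block must then cross the network.
            node = next((n for n, free in slots.items() if free > 0), None)
            if node is None:
                raise RuntimeError("no free slots left in the cluster")
            is_local = False
        slots[node] -= 1
        assignments[block_id] = (node, is_local)
    return assignments
```

Every assignment with `is_local` set to `True` ships only the task code to the node. Only the non-local assignments require the block itself to travel over the network.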
7. High Throughput
Throughput is the amount of work done per unit of time. Hadoop stores data in a distributed fashion, which makes distributed processing straightforward. A given job is divided into small tasks that work on chunks of the data in parallel, giving high throughput.
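The divide-and-combine pattern described above can be imitated on a single machine. The sketch below counts words by running a map step over chunks concurrently and then merging the partial results. It uses a thread pool purely as a stand-in for a cluster, so it shows the structure of the computation rather than real speedups, and the function names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk: str) -> Counter:
    """Map step: count the words in one chunk of the input."""
    return Counter(chunk.split())


def word_count(chunks: list[str], workers: int = 4) -> Counter:
    """Run the map step on every chunk concurrently, then reduce by summing the counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # reduce step: merge partial counts
    return total
```

In Hadoop the chunks would be HDFS blocks and the workers would be tasks on different nodes, but the shape of the job is the same.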
8. Open Source
Hadoop is an open source technology, i.e. its source code is freely available. We can modify the source code to suit a specific requirement.
9. Scalable
Hadoop works on the principle of horizontal scalability: we add entire machines to the cluster rather than upgrading the configuration of an existing machine by adding RAM, disk, and so on, which is known as vertical scalability. Nodes can be added to a Hadoop cluster on the fly, making it a scalable framework.
10. Ease of Use
The Hadoop framework takes care of parallel processing. MapReduce programmers do not need to worry about achieving distributed processing; it is handled automatically at the backend.
11. Compatibility
Most emerging Big Data technologies, such as Spark and Flink, are compatible with Hadoop. Their processing engines can run over Hadoop as a backend, i.e. we use Hadoop as the data storage platform for them.
12. Multiple Languages Supported
Developers can write Hadoop code in many languages, such as C, C++, Perl, Python, Ruby, and Groovy.
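One common route for non-Java languages is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The sketch below shows the word count logic in that style as plain Python generators. How the functions are wired to `sys.stdin` and how the job is submitted depend on the installation and are left out here.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

In a real Streaming job the framework sorts the mapper output by key before it reaches the reducer. When trying the functions locally, `sorted(mapper(lines))` plays that role.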
Disadvantages of Hadoop
1. Issue With Small Files
Hadoop is suitable for a small number of large files, but it struggles with applications that deal with a large number of small files. A small file is one that is significantly smaller than the HDFS block size, which is 128 MB by default and is often configured to 256 MB. Because the NameNode keeps the namespace for the whole file system in memory, a large number of small files overloads it and makes it difficult for Hadoop to function.
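A rough estimate shows why small files hurt. The sketch below assumes the widely quoted rule of thumb of about 150 bytes of NameNode memory per namespace object, counting one object per file and one per block and ignoring directories. The constant is an approximation, not a guaranteed figure, and the function names are made up for the example.

```python
import math

BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not a measured or guaranteed figure


def namenode_objects(file_size_mb: float, num_files: int, block_size_mb: int = 128) -> int:
    """Count namespace objects: one per file plus one per block (directories ignored)."""
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)


def namenode_memory_gb(file_size_mb: float, num_files: int, block_size_mb: int = 128) -> float:
    """Approximate NameNode memory, in GB (10**9 bytes), used for this set of files."""
    return namenode_objects(file_size_mb, num_files, block_size_mb) * BYTES_PER_OBJECT / 1e9
```

Under these assumptions, about 1 TB stored as one million 1 MB files creates 2,000,000 objects, while the same data stored as 7,813 files of 128 MB creates only 15,626. The data volume is the same, but the load on the NameNode differs by more than a factor of 100.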
2. Vulnerable By Nature
Hadoop is written in Java, a widely used programming language. Because Java is so common, its vulnerabilities are well known and frequently targeted by cyber criminals, which can leave Hadoop exposed to security breaches.
3. Processing Overhead
In Hadoop MapReduce, data is read from disk and written back to disk between processing stages, which makes read/write operations very expensive when dealing with terabytes and petabytes of data. MapReduce does not perform in-memory computation, so it incurs processing overhead.
4. Supports Only Batch Processing
At its core, Hadoop has a batch processing engine, which is not efficient for stream processing. It cannot produce output in real time with low latency. It only works on data that has been collected and stored in files before processing begins.
5. Iterative Processing
Hadoop cannot do iterative processing efficiently by itself. Machine learning and other iterative workloads have a cyclic data flow, whereas Hadoop moves data through a chain of stages in which the output of one stage becomes the input of the next.
6. Security
For security, Hadoop uses Kerberos authentication, which is hard to manage. Encryption at the storage and network levels is not enabled by default and has to be configured separately, which is a major point of concern.
So, that was all about the pros and cons of Hadoop. We hope you liked our explanation.
Summary – Advantages and Disadvantages of Hadoop
Every piece of software used in industry comes with its own set of benefits and drawbacks. If the software is essential to the organization, one can exploit its benefits and take measures to minimize its faults. We can see that Hadoop's benefits outweigh its shortcomings, making it a strong solution for Big Data needs. If you have any questions about the advantages and disadvantages of Hadoop, ask in the comments.
https://data-flair.training/blogs/advantages-and-disadvantages-of-hadoop



