A First Look at Spark Streaming

Environment:

[root@test spark]# uname -a
Linux test 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@test spark]# cat /etc/issue
CentOS release 6.5 (Final)
[root@test ~]# ls
jdk-7u79-linux-x64.tar.gz  spark-1.6.0-bin-hadoop2.6.tgz

I assume you have already installed and configured an environment that can run Spark. This post only records a Python version of the Spark Streaming WordCount program from the official tutorial.

Change into the Spark installation directory, which in my case is:

cd  /usr/local/spark

Under examples/src/main/python/streaming/ there are examples for various data ingestion methods. I use network_wordcount.py here because it looks easy to use.

The official example also documents how to run it:

"""
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: network_wordcount.py <hostname> <port>
<hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.

To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example
$ bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
"""
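For orientation, the pipeline that the docstring describes can be sketched roughly as follows. This is my own sketch written from that description, not a copy of the shipped file: the names `split_line` and `run_wordcount` and the application name are assumptions, and the Spark import is deferred so the helper can be loaded without Spark on the path.

```python
def split_line(line):
    # Tokenize on single spaces only, so punctuation stays attached to words.
    return line.split(" ")


def run_wordcount(hostname, port, batch_seconds=1):
    # Imported lazily so this file can be loaded without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="NetworkWordCountSketch")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)  # one micro-batch per interval

    # Each '\n'-delimited line read from the TCP server becomes one record.
    lines = ssc.socketTextStream(hostname, port)
    counts = (lines.flatMap(split_line)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the leading elements of every batch

    ssc.start()
    ssc.awaitTermination()
```

To try it, one could add a call such as `run_wordcount("localhost", 9999)` at the bottom of a script and launch that script with bin/spark-submit, in the same way as the official example.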

In other words, we first need to install nc (netcat):

  1. Download the netcat package:
wget http://sourceforge.net/projects/netcat/files/netcat/0.7.1/netcat-0.7.1-1.i386.rpm
  1. Run the install: rpm -ihv netcat-0.7.1-1.i386.rpm
    This fails with the following error:
rpm -ihv netcat-0.7.1-1.i386.rpm  
warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY  
error: Failed dependencies:  
        libc.so.6 is needed by netcat-0.7.1-1.i386  
        libc.so.6(GLIBC_2.0) is needed by netcat-0.7.1-1.i386  
        libc.so.6(GLIBC_2.1) is needed by netcat-0.7.1-1.i386  
        libc.so.6(GLIBC_2.3) is needed by netcat-0.7.1-1.i386  
  1. Resolve the missing dependency. The RPM is a 32-bit (i386) build, so it needs the 32-bit glibc:
[root@test streaming]# yum list glibc*
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: ftp.cuhk.edu.hk
 * extras: mirrors.aliyun.com
 * rpmforge: ftp.neowiz.com
 * updates: mirrors.aliyun.com
Installed Packages
glibc.i686                                          2.12-1.192.el6                                @base
glibc.x86_64                                        2.12-1.192.el6                                @base
glibc-common.x86_64                                 2.12-1.192.el6                                @base
glibc-devel.x86_64                                  2.12-1.192.el6                                @base
glibc-headers.x86_64                                2.12-1.192.el6                                @base
glibc-static.x86_64                                 2.12-1.192.el6                                @base
glibc-utils.x86_64                                  2.12-1.192.el6                                @base
Available Packages
glibc-devel.i686                                    2.12-1.192.el6                                base 
glibc-static.i686                                   2.12-1.192.el6                                base 
  1. Install the dependency:
yum install glibc.i686
  1. Run the install again:
rpm -ihv netcat-0.7.1-1.i386.rpm
warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY  
Preparing...                ########################################### [100%]  
   1:netcat                 ########################################### [100%]  

The installation succeeds.

  1. Run nc -lk 9999
    It prints:
nc: invalid option -- 'k'
Try `nc --help' for more information.

I searched online and found a solution: http://unix.stackexchange.com/questions/193579/nc-commands-k-option

S O L V E D The consultant installed netcat so I uninstalled netcat and then nc was not working. So I also removed and reinstalled nc again. Now -k option is working now Thanks for your helps – Murat Apr 1 '15 at 10:03
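In other words, uninstall and then reinstall. The problem seems to be that the netcat RPM installed its own nc command: as far as I can tell, netcat 0.7.1 is GNU netcat, which does not accept -k, whereas the nc package from the CentOS repositories does.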
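If a -k capable nc is not available at all, a small Python script can play the same role for testing. The following is only a sketch under my own assumptions: `open_listener` and `serve_lines` are hypothetical helper names, and unlike `nc -lk`, which forwards whatever you type, it sends a fixed list of lines to each client that connects.

```python
import socket


def open_listener(host="localhost", port=9999):
    # Bind and listen on a TCP port, similar to the listening side of nc.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def serve_lines(server, lines, max_clients=None):
    # Keep accepting clients (the behaviour -k provides) and send each one
    # the given lines as UTF-8 text terminated by '\n'.
    served = 0
    try:
        while max_clients is None or served < max_clients:
            conn, _addr = server.accept()
            with conn:
                for line in lines:
                    conn.sendall((line + "\n").encode("utf-8"))
            served += 1
    finally:
        server.close()
```

For example, `serve_lines(open_listener(), ["hello nihao my name is xzp hello world!"])` would listen on localhost:9999 until interrupted. Each client connection is closed after the lines are sent, so this is best suited to a quick smoke test rather than interactive input.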

  1. Fix the netcat problem
[root@test ~]# yum remove netcat
Loaded plugins: fastestmirror
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package netcat.i386 0:0.7.1-1 will be erased
--> Finished Dependency Resolution

Reinstall (note that the package name to use here is nc):

[root@test ~]# yum install nc
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: mirror.premi.st
 * extras: mirrors.aliyun.com
 * rpmforge: ftp.neowiz.com
 * updates: mirrors.aliyun.com
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package nc.x86_64 0:1.84-24.el6 will be installed
--> Finished Dependency Resolution
  1. Run the program
    Open a new terminal window and run:
[root@test spark]# nc -lk 9999

In the original window, still in the Spark home directory, run:

[root@test spark]# bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
  1. Test the output

In the nc window, type:

hello nihao my name is xzp hello world!

The Spark program prints:

-------------------------------------------
Time: 2016-07-20 11:56:41
-------------------------------------------
(u'my', 1)
(u'is', 1)
(u'nihao', 1)
(u'world!', 1)
(u'xzp', 1)
(u'name', 1)
(u'hello', 2)

-------------------------------------------
Time: 2016-07-20 11:56:42
-------------------------------------------

-------------------------------------------
Time: 2016-07-20 11:56:43
-------------------------------------------
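To see why the first batch reports hello twice and keeps the punctuation in world!, the same counting can be replayed on the input line without Spark. This is a plain-Python sketch of one batch, assuming the example splits lines on single spaces and sums one per occurrence; `count_words` is a name I made up for illustration.

```python
from collections import Counter


def count_words(batch_lines):
    # Equivalent of flatMap: split every line on single spaces.
    words = [word for line in batch_lines for word in line.split(" ")]
    # Equivalent of map to (word, 1) followed by reduceByKey with addition.
    return Counter(words)


counts = count_words(["hello nihao my name is xzp hello world!"])
for word, n in sorted(counts.items()):
    print((word, n))
```

The later batches in the output above are empty simply because no new line arrived from nc during those one-second intervals.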

That is the whole process. The next step is to adapt the official example to our own business logic. Because my company obtains its data by calling a REST API, the data input will next be changed to a REST API version.
