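User behavior logs, signaling data, cell mapWithState Integrating a DStream with an RDD == transform Dataset 1: log information DStream domain,traff...
Spark Streaming: stream processing built on top of Spark. Stream: source ==> compute ==> store. Offline (batch) processing is a special case of streaming. letting you write ...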
Function functions.scala hobbies.txt alice jogging,Coding,cooking 3 lina travel,danc...
External Data Source API: external data sources MapReduce Hive Spark Loading data; formats: json, parquet, text, jdbc........
DataFrame python pandas R RDD MapReduce DataFrame vs Dataset(1.6) DS: Java Scala DF: 4 ...
1. Core concepts broker: a process producer: message producer consumer: message consumer topic: topic partitions: partitions (replica count) consumergr...
Spark SQL IOE SQL:schema + file select ... from xxx where..... SQL on Hadoop Hive Impal...
Download links: Zookeeper: http://mirror.bit.edu.cn/apache/zookeeper/current/ Scala: http://www.s...
Kafka: message middleware --> distributed streaming platform MQ Redis Kafka Flume producer source Broker channel consumer sink normal ...
collect collect countByKey countByValue collectAsMap groupByKey vs reduceByKey val rdd=...
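The groupByKey vs reduceByKey contrast named above can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: `group_by_key` and `reduce_by_key` are hypothetical helpers that mimic only the per-key semantics of the two operators, and the sample `pairs` are made up. In Spark itself, reduceByKey also combines values within each partition before the shuffle, whereas groupByKey ships every value across the network.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Keep every value per key, as groupByKey does (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(func, pairs):
    """Fold values per key into one running value, as reduceByKey does (hypothetical helper)."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


# Made-up sample data: (word, 1) pairs as in a word count.
pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]

grouped = group_by_key(pairs)
counts_via_group = {key: sum(values) for key, values in grouped.items()}
counts_via_reduce = reduce_by_key(lambda x, y: x + y, pairs)
```

Both paths give the same counts. The difference is how much intermediate data is held: the grouped version keeps a full list per key, while the reduced version keeps a single number per key.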
Spark on YARN: submit Spark jobs to YARN for execution; Spark acts only as a client ./spark-submit \ --class org.apache.sp...
Application a driver program + executors SparkContext = application spark-shell ? appli...
x.y.z 1.6.1 2.3.1 2.2.2 RDD transformation: lazy map filter union flatMap mapPartition ...
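The note that RDD transformations are lazy can be sketched with Python's built-in lazy iterators. This is an analogy under stated assumptions, not Spark code: `traced_double` and the sample data are hypothetical, and `list(...)` stands in for an action such as collect.

```python
calls = []  # records every element the "transformation" actually touches


def traced_double(x):
    """Double x and record that it was processed (hypothetical helper)."""
    calls.append(x)
    return x * 2


data = [1, 2, 3, 4]

# "Transformations": map and filter only build a lazy pipeline here.
mapped = map(traced_double, data)
filtered = filter(lambda x: x > 4, mapped)

# Nothing has been computed yet, so this snapshot is empty.
calls_before_action = list(calls)

# "Action": materialising the result forces the whole pipeline to run.
result = list(filtered)
```

Until the final `list(filtered)` call no element is processed, which mirrors how Spark only records the lineage of transformations and runs them when an action is invoked.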
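Hadoop HDFS HA and YARN HA cluster deployment 1. HDFS NN SNN (secondary) Hot standby: when NN (active) goes down, NN (standby) --> acti...
Hive Advanced, Part 2: ***** Hive: complex data types, JDBC programming ZK: Compression: compression ratio, decompression speed 1G of uncompressed data: 1G of gzip-compressed data: codec: ...
ZK 1) High availability: HDFS/HBase/Spark HA 2) API: ZK/Curator Development: operating ZK from Java/Scala Kafka: offsets can be stored in ZK =...
Coding conventions from the official Python website 1. Use 4-space indentation, and no tabs. 2. Wrap lines so that they don’t...
anaconda3 download links Official site: https://www.anaconda.com/download/ Baidu Cloud link: https://pan.baidu.com/s/17jHe...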
.../page_views/201808082008 .... .../page_views/201808082009 .... ./flume-ng agent \ --...