A DataSet is a strongly typed collection of data, so you must supply the corresponding type information (in the examples here, a case class).
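Creating a DataSet
- Create a case class
    case class Person(name: String, age: Long)
- Create the DataSet
    // toDS() needs the implicits in scope (spark-shell imports them automatically)
    import spark.implicits._
    // Both fields belong inside the Person constructor
    val personDs = Seq(Person("adam", 20)).toDS()
    personDs.show()
    /*
    +----+---+
    |name|age|
    +----+---+
    |adam| 20|
    +----+---+
    */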
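To make "strongly typed" concrete, here is a minimal sketch of typed operations on a DataSet. It assumes a local SparkSession; the object name `TypedOpsSketch`, the helper `adultNames`, and the sample rows are hypothetical and only serve the illustration. The lambdas work on `Person` fields directly, so a misspelled field name would be a compile error rather than a runtime failure.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at the top level so Spark can derive an encoder for it.
case class Person(name: String, age: Long)

// Hypothetical helper object, used only for this illustration.
object TypedOpsSketch {
  // Keeps people aged 18 or older and returns their names.
  def adultNames(spark: SparkSession, people: Dataset[Person]): Dataset[String] = {
    import spark.implicits._ // provides the Encoder[String] that map needs
    people.filter(_.age >= 18).map(_.name)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-ops-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("adam", 20), Person("carl", 13)).toDS()
    adultNames(spark, people).show()

    spark.stop()
  }
}
```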
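Converting an RDD to a DataSet
Spark SQL can automatically convert an RDD of case class instances into a DataSet (or a DataFrame). The case class defines the table's schema: Spark reads the names of the case class fields via reflection and uses them as the column names.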
- Create an RDD
    val rdd = sc.textFile("E:\\IdeaProjects\\spark-demo\\files\\people.txt")
- Create a case class
    case class Person(name: String, age: Long)
- Convert the RDD to a DataSet
    // Split each line once, then build a Person from the two fields
    val personRdd = rdd.map { line =>
      val fields = line.split(",")
      Person(fields(0), fields(1).trim.toLong)
    }
    val personDs = personRdd.toDS()
    personDs.show()
    /*
    +----+---+
    |name|age|
    +----+---+
    |adam| 18|
    |brad| 21|
    |carl| 13|
    +----+---+
    */
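The reflection step described above can be observed by inspecting the schema of the resulting DataSet. The following is a sketch under stated assumptions: the in-memory lines stand in for `people.txt` (their "name, age" layout is assumed from the parsing code), and the names `ReflectedSchemaSketch` and `toPeople` are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class Person(name: String, age: Long)

// Hypothetical helper object, used only for this illustration.
object ReflectedSchemaSketch {
  // Parses "name, age" lines into a Dataset[Person].
  def toPeople(spark: SparkSession, lines: Seq[String]): Dataset[Person] = {
    import spark.implicits._ // enables rdd.toDS()
    spark.sparkContext
      .parallelize(lines) // assumption: in-memory lines instead of a text file
      .map { line =>
        val fields = line.split(",")
        Person(fields(0).trim, fields(1).trim.toLong)
      }
      .toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reflected-schema-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    val people = toPeople(spark, Seq("adam, 18", "brad, 21", "carl, 13"))
    // The column names and types come from the fields of Person.
    people.printSchema()

    spark.stop()
  }
}
```

The columns are `name` and `age`, matching the case class fields in declaration order.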
Converting a DataSet to an RDD
Simply call the DataSet's rdd method.
- Create a DataSet
    val personDs = Seq(Person("adam", 20)).toDS()
    personDs.show()
    /*
    +----+---+
    |name|age|
    +----+---+
    |adam| 20|
    +----+---+
    */
- Convert the DataSet to an RDD
    // The element type is preserved: personRdd is an RDD[Person]
    val personRdd = personDs.rdd
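As a small illustration of what the resulting RDD looks like, the sketch below annotates its type and runs an ordinary RDD operation on it. The object name `DatasetToRddSketch`, the helper `totalAge`, and the sample rows are hypothetical; a local SparkSession is assumed.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

case class Person(name: String, age: Long)

// Hypothetical helper object, used only for this illustration.
object DatasetToRddSketch {
  // Sums the ages using plain RDD operations.
  def totalAge(people: Dataset[Person]): Long = {
    // The element type carries over: Dataset[Person] becomes RDD[Person].
    val rdd: RDD[Person] = people.rdd
    rdd.map(_.age).fold(0L)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-to-rdd-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("adam", 20), Person("brad", 21)).toDS()
    println(totalAge(people))

    spark.stop()
  }
}
```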
Converting between DataFrame and DataSet
- Converting a DataFrame to a DataSet
    - Create a DataFrame
        val df = spark.read.json("E:\\IdeaProjects\\spark-demo\\files\\test.json")
    - Create a case class
        case class Person(id: Long, name: String, age: Long)
    - Convert the DataFrame to a DataSet
        // as[Person] maps columns to case class fields by name; it needs spark.implicits._ in scope
        val ds = df.as[Person]
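The DataFrame-to-DataSet conversion can also be tried without a JSON file. This sketch builds the DataFrame in memory instead of reading `test.json`, which is an assumption made to keep it self-contained; `DataFrameToDatasetSketch` and `toPeople` are hypothetical names. The column names passed to `toDF` must match the field names of `Person` for `as[Person]` to resolve them.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

case class Person(id: Long, name: String, age: Long)

// Hypothetical helper object, used only for this illustration.
object DataFrameToDatasetSketch {
  // Attaches the Person type to an untyped DataFrame.
  def toPeople(spark: SparkSession, df: DataFrame): Dataset[Person] = {
    import spark.implicits._ // provides the Encoder[Person] used by as[Person]
    df.as[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("df-to-ds-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // Assumption: an in-memory DataFrame stands in for the JSON file.
    val df = Seq((1L, "adam", 20L)).toDF("id", "name", "age")
    val people = toPeople(spark, df)

    // Typed access to fields is now available.
    people.filter(_.age > 18).show()

    spark.stop()
  }
}
```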
- Converting a DataSet to a DataFrame
    - Create a case class
        case class Person(id: Long, name: String, age: Long)
    - Import the implicit conversions (required by toDS(); spark-shell imports them automatically)
        import spark.implicits._
    - Create a DataSet
        val personDs = Seq(Person(1, "adam", 20)).toDS()
    - Convert the DataSet to a DataFrame
        val personDf = personDs.toDF()
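For the reverse direction, the sketch below shows that `toDF()` keeps the column names derived from the case class, while `toDF` with explicit names renames the columns positionally. The names `DatasetToDataFrameSketch` and `renamed`, and the `person_*` column names, are hypothetical choices for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

case class Person(id: Long, name: String, age: Long)

// Hypothetical helper object, used only for this illustration.
object DatasetToDataFrameSketch {
  // Converts to a DataFrame and renames the three columns by position.
  def renamed(people: Dataset[Person]): DataFrame =
    people.toDF("person_id", "person_name", "person_age")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ds-to-df-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person(1, "adam", 20)).toDS()

    // Without arguments, the columns are id, name, age.
    people.toDF().printSchema()
    // With arguments, the same data appears under the new names.
    renamed(people).show()

    spark.stop()
  }
}
```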