Creating a DataFrame in Spark

Creating from a list

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()

# A list of lists
list1 = [["Bom", 20, 97.6, 165],
         ["Alice", 23, 90.0, 160]]

df1 = spark.createDataFrame(list1, ["name", "age", "weight", "height"])
df1.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

# A list of tuples
list2 = [("Bom", 20, 97.6, 165),
         ("Alice", 23, 90.0, 160)]

df2 = spark.createDataFrame(list2, ["name", "age", "weight", "height"])
df2.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

df2_no_header = spark.createDataFrame(list2)
df2_no_header.show()
# +-----+---+----+---+
# |   _1| _2|  _3| _4|
# +-----+---+----+---+
# |  Bom| 20|97.6|165|
# |Alice| 23|90.0|160|
# +-----+---+----+---+

# A list of dicts
list3 = [{"name": "Bom", "age": 20, "weight": 97.6, "height": 165},
         {"name": "Alice", "age": 23, "weight": 90.0, "height": 160}]

df3 = spark.createDataFrame(list3)
df3.show()
# +---+------+-----+------+
# |age|height| name|weight|
# +---+------+-----+------+
# | 20|   165|  Bom|  97.6|
# | 23|   160|Alice|  90.0|
# +---+------+-----+------+

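When creating a DataFrame from a list, the elements of the list can be lists or tuples. Dicts also work: the column names are taken from the keys and, as the df3 output shows, the columns come out sorted by key. Without column names, Spark falls back to _1, _2, and so on.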
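In the listings above the column types are inferred from the Python values, so Python int values end up as long columns. If specific types are wanted, createDataFrame also accepts a data-type string as the schema. The following is a minimal sketch under that assumption; SCHEMA, ROWS and create_typed_df are names made up for illustration, and the function expects an existing SparkSession such as the spark object created earlier.

```python
# Schema written as a data-type string: column name, colon, Spark SQL type.
# (Illustrative constant, not part of the original listing.)
SCHEMA = "name: string, age: int, weight: double, height: int"

# Same sample data as list2 in the listing above.
ROWS = [("Bom", 20, 97.6, 165),
        ("Alice", 23, 90.0, 160)]


def create_typed_df(spark):
    """Build the DataFrame with declared column types instead of inferred ones.

    `spark` is assumed to be an already created SparkSession.
    """
    return spark.createDataFrame(ROWS, SCHEMA)
```

Calling create_typed_df(spark).printSchema() with the session from the listing above should then report age and height as integer rather than long.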

Creating from a JSON file

The JSON file people.json:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

Spark code:

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df from json")\
    .getOrCreate()

df = spark.read.json("file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
df.show()

# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
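Two details of this example are easy to miss. By default spark.read.json expects one complete JSON object per line, which is why people.json is not a JSON array, and a key that is missing from a record (age for Michael) shows up as null. The sketch below writes the same three records in that layout to a temporary directory so the example does not depend on the path used above. write_json_lines and read_people are hypothetical helper names, and read_people assumes an existing SparkSession is passed in.

```python
import json
import os
import tempfile
from pathlib import Path

PEOPLE = [{"name": "Michael"},
          {"name": "Andy", "age": 30},
          {"name": "Justin", "age": 19}]


def write_json_lines(records, path):
    """Write one JSON object per line, the layout spark.read.json expects by default."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_people(spark, path):
    """Read the file with an existing SparkSession; keys missing from a record become null."""
    # as_uri() turns the absolute local path into a file:/// URI.
    return spark.read.json(Path(path).resolve().as_uri())


# Create the sample file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "people.json")
write_json_lines(PEOPLE, path)
```

With the session from the listing above, read_people(spark, path).show() should print the same table. For a file that holds a single JSON array or pretty-printed objects, the reader's multiLine option (spark.read.json(path, multiLine=True)) is the usual alternative.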

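Creating from a dict

I have not yet found a way to convert a dict directly into a DataFrame, so for now I go through pandas: build a pandas DataFrame from the dict, then convert that into a Spark DataFrame: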

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()

dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}

df1 = pd.DataFrame(dict1)
spark_df = spark.createDataFrame(df1)
spark_df.show()

# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

If you know a way to do the conversion directly, please leave a comment. I would be very grateful :)
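One candidate for a route without pandas is to transpose the dict of columns into a list of row tuples and pass it to createDataFrame together with the keys as column names, exactly as in the list-of-tuples example earlier. This is only a sketch: columns_to_rows and dict_to_spark_df are illustrative names, the spark argument is assumed to be an existing SparkSession, and all value lists are assumed to have the same length.

```python
def columns_to_rows(columns):
    """Transpose {"col": [v1, v2, ...]} into (rows, column_names).

    zip stops at the shortest list, so lists of unequal length
    would be silently truncated.
    """
    names = list(columns.keys())
    rows = list(zip(*columns.values()))
    return rows, names


def dict_to_spark_df(spark, columns):
    """Build a Spark DataFrame from a column-oriented dict, without pandas."""
    rows, names = columns_to_rows(columns)
    return spark.createDataFrame(rows, names)


dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}

rows, names = columns_to_rows(dict1)
print(rows)   # [('Bom', 20, 97.6, 165), ('Alice', 23, 90.0, 160)]
print(names)  # ['name', 'age', 'weight', 'height']
```

With the session from the listing above, dict_to_spark_df(spark, dict1).show() should print the same table as the pandas version.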
