Creating from a list
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()
# A list of lists
list1 = [["Bom", 20, 97.6, 165],
         ["Alice", 23, 90.0, 160]]
df1 = spark.createDataFrame(list1, ["name", "age", "weight", "height"])
df1.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+
# A list of tuples
list2 = [("Bom", 20, 97.6, 165),
         ("Alice", 23, 90.0, 160)]
df2 = spark.createDataFrame(list2, ["name", "age", "weight", "height"])
df2.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+
df2_no_header = spark.createDataFrame(list2)
df2_no_header.show()
# +-----+---+----+---+
# |   _1| _2|  _3| _4|
# +-----+---+----+---+
# |  Bom| 20|97.6|165|
# |Alice| 23|90.0|160|
# +-----+---+----+---+
# A list of dicts
list3 = [{"name": "Bom", "age": 20, "weight": 97.6, "height": 165},
         {"name": "Alice", "age": 23, "weight": 90.0, "height": 160}]
df3 = spark.createDataFrame(list3)
df3.show()
# +---+------+-----+------+
# |age|height| name|weight|
# +---+------+-----+------+
# | 20|   165|  Bom|  97.6|
# | 23|   160|Alice|  90.0|
# +---+------+-----+------+
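A DataFrame can be created from a list whose elements are lists, tuples, or dicts. Without column names, Spark falls back to _1, _2, and so on; with dicts, the keys become the column names.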
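When only column names are passed, Spark infers each column's type from the data. If the types need to be pinned down, createDataFrame also accepts a schema string. Below is a minimal sketch, assuming the same four columns as the sample data; the helper create_typed_df and the constant PEOPLE_SCHEMA are names made up for illustration, and the session is passed in rather than created here.

```python
# Hypothetical schema string for the sample data, written in the
# "name: type" form that createDataFrame accepts.
PEOPLE_SCHEMA = "name: string, age: int, weight: double, height: int"


def create_typed_df(spark, rows):
    """Build a DataFrame with explicit column types instead of inferred ones.

    `spark` is an existing SparkSession; `rows` is a list of lists or tuples
    whose values match PEOPLE_SCHEMA in order and type.
    """
    return spark.createDataFrame(rows, PEOPLE_SCHEMA)


# Usage, given the session and list2 from the example:
# df_typed = create_typed_df(spark, list2)
# df_typed.printSchema()
```

Keeping the schema in one place makes it easier to spot when the data and the declared types drift apart.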
Creating from a JSON file
The JSON file people.json:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
Spark code:
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df from json")\
    .getOrCreate()
df = spark.read.json("file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
df.show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
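By default spark.read.json expects one complete JSON object per line, which is how people.json is laid out. A key missing from a record, such as age for Michael, comes back as null. As a rough sketch, a file in that layout can be produced with the standard json module; the function name write_json_lines and the temporary path are assumptions for illustration only.

```python
import json
import os
import tempfile


def write_json_lines(records, path):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


people = [
    {"name": "Michael"},
    {"name": "Andy", "age": 30},
    {"name": "Justin", "age": 19},
]

# Hypothetical location; any path readable by Spark would do.
path = os.path.join(tempfile.mkdtemp(), "people.json")
write_json_lines(people, path)

# The file could then be loaded the same way as in the example:
# df = spark.read.json("file://" + path)
```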
Creating from a dict
I have not yet found a way to convert a dict directly into a DataFrame, so for now I go through pandas: build a pandas DataFrame first, then convert it to a Spark DataFrame:
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
import pandas as pd
spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()
dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}
df1 = pd.DataFrame(dict1)
spark_df = spark.createDataFrame(df1)
spark_df.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+
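One candidate that avoids pandas, sketched under the assumption that every value in the dict is a list of the same length: transpose the columns into row tuples with zip and pass the keys as column names, which reduces the problem to the list-of-tuples case from the first section. The helper names dict_to_rows and dict_to_spark_df are made up for illustration, and the session is passed in rather than created here.

```python
def dict_to_rows(data):
    """Turn a dict of equal-length column lists into (rows, column_names)."""
    columns = list(data.keys())
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) transposes the column lists into one tuple per row.
    rows = list(zip(*data.values()))
    return rows, columns


def dict_to_spark_df(spark, data):
    """Create a Spark DataFrame from a dict of lists without pandas."""
    rows, columns = dict_to_rows(data)
    return spark.createDataFrame(rows, columns)


dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}

rows, columns = dict_to_rows(dict1)
print(columns)  # ['name', 'age', 'weight', 'height']
print(rows)     # [('Bom', 20, 97.6, 165), ('Alice', 23, 90.0, 160)]

# Usage, given an existing session:
# spark_df = dict_to_spark_df(spark, dict1)
# spark_df.show()
```

The length check is there because zip silently truncates to the shortest column, which would otherwise drop rows without any warning.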
If you know of a direct conversion method, please leave a comment. I would be very grateful :)