本篇介紹Spark SQL如何連接JDBC數(shù)據(jù)庫(我以本地安裝的mysql為例)
Maven 中引入
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>6.0.5</version>
</dependency>
代碼示例
package com.yzy.spark;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class demo5 {
private static String appName = "spark.sql.demo";
private static String master = "local[*]";
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName(appName)
.master(master)
.getOrCreate();
Dataset<Row> df = spark.read()
.format("jdbc")
.option("url", "jdbc:mysql://localhost:3306/smvc")
.option("dbtable", "user")
.option("user", "root")
.option("pass" + "word", "123456")
.load();
df.printSchema();
}
}
運行結(jié)果:
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- nicename: string (nullable = true)
|-- age: integer (nullable = true)
|-- pwd: string (nullable = true)
遇到的問題
1..option("pass" + "word", "123456")這樣寫是因為編譯器的sonar 檢查不過,password 是敏感字段。
2..format("jdbc"),如果忘記加上會報錯:ConnectedComponents: error 'Unable to infer schema for Parquet. It must be specified manually.意思是必須手動指定我要連接的是jdbc。
3..option("url", "jdbc:mysql://localhost:3306/smvc")會報錯:java.sql.SQLException:The server time zone value '?й???????' is unrecognized or represents more than onetime zone. You must configure either the server or JDBC driver (via theserverTimezone configuration property) to use a more specifc time zone value ifyou want to utilize time zone support. 解決辦法參考此文
value 修改為jdbc:mysql://localhost:3306/smvc?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai 即可