Project Overview
Data Type
Tourist attraction data
Development Environment
CentOS 7
Software Versions
Python 3.8.18, Hadoop 3.2.0, Hive 3.1.2, MySQL 5.7.38, JDK 8, Scala 2.12.18, Kafka 2.8.2, Sqoop 1.4.7
Programming Languages
Python, Scala, Java
Workflow
Data cleaning (Python) -> upload to HDFS (hdfs) -> data analysis (Hive) -> write to Kafka (Python) -> real-time computation (Spark) -> storage (MySQL) -> backend (Spring Boot) -> frontend (Vue)
Visualization Charts

(9 dashboard screenshots omitted)
Steps
Install Python Packages
pip3 install pandas==2.0.3 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install flask==3.0.0 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install flask-cors==4.0.1 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install pymysql==1.1.0 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install jieba==0.42.1 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install pyecharts==2.0.4 -i https://mirrors.aliyun.com/pypi/simple/
pip3 install openpyxl==3.1.3 -i https://mirrors.aliyun.com/pypi/simple/
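After installing, a quick sanity check can confirm every package is importable. A minimal stdlib sketch (note the import names: `flask-cors` installs as `flask_cors`):

```python
import importlib.util

def missing_packages(import_names):
    """Return the subset of import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

# Import names for the pip packages listed above
required = ["pandas", "flask", "flask_cors", "pymysql", "jieba", "pyecharts", "openpyxl"]
print("missing:", missing_packages(required))
```

An empty list means all seven packages are ready.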
Start MySQL
# Check whether MySQL is running; start it with: systemctl start mysqld.service
systemctl status mysqld.service
# Enter the MySQL shell
# MySQL username: root, password: 123456
mysql -uroot -p123456
Start Hadoop
# Leave safe mode if needed: hdfs dfsadmin -safemode leave
# Start Hadoop
bash /export/software/hadoop-3.2.0/sbin/start-hadoop.sh
Start Hive
# In a first terminal, run the command below and wait 10-20 seconds
/export/software/apache-hive-3.1.2-bin/bin/hive --service metastore
# In a second terminal, run the command below and wait 10-20 seconds
/export/software/apache-hive-3.1.2-bin/bin/hive --service hiveserver2
# To open a Hive shell, connect with:
# /export/software/apache-hive-3.1.2-bin/bin/beeline -u jdbc:hive2://master:10000 -n root
Start Kafka
# Start ZooKeeper
sh /export/software/kafka_2.12-2.8.2/bin/zookeeper-server-start.sh -daemon /export/software/kafka_2.12-2.8.2/config/zookeeper.properties
# Start Kafka
sh /export/software/kafka_2.12-2.8.2/bin/kafka-server-start.sh -daemon /export/software/kafka_2.12-2.8.2/config/server.properties
# Create the topic
/export/software/kafka_2.12-2.8.2/bin/kafka-topics.sh --create --topic agg_ticket --replication-factor 1 --partitions 1 --zookeeper master:2181
# Start a console consumer
/export/software/kafka_2.12-2.8.2/bin/kafka-console-consumer.sh --bootstrap-server master:9092 --topic agg_ticket
# Stop Kafka
# sh /export/software/kafka_2.12-2.8.2/bin/kafka-server-stop.sh
# Stop ZooKeeper
# sh /export/software/kafka_2.12-2.8.2/bin/zookeeper-server-stop.sh
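Before producing messages, it can help to confirm the broker and ZooKeeper ports are actually listening. A minimal stdlib sketch (host `master` and ports 9092/2181 taken from the commands above):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or unresolvable host
        return False

if __name__ == "__main__":
    # Kafka listens on 9092, ZooKeeper on 2181 (per the commands above)
    print("kafka master:9092 ->", port_open("master", 9092))
    print("zookeeper master:2181 ->", port_open("master", 2181))
```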
Data Cleaning
mkdir -p /data/jobs/project/data/
cd /data/jobs/project/data/
# Upload "data.xlsx" from the "data" directory
# Upload "stopwords.txt" from the "data" directory
cd /data/jobs/project/
# Upload the "data_clean.py" file
python3 data_clean.py
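The contents of data_clean.py are not reproduced here. As a hedged illustration only, cleaning scripts of this kind typically deduplicate records, drop blanks, and remove stopwords before word counting (the real script presumably uses pandas on data.xlsx and jieba for segmentation):

```python
def clean_rows(rows):
    """Drop blank and duplicate records, trimming whitespace; order preserved."""
    seen, out = set(), []
    for row in rows:
        row = row.strip()
        if row and row not in seen:
            seen.add(row)
            out.append(row)
    return out

def drop_stopwords(words, stopwords):
    """Remove stopwords from a token list, e.g. tokens from jieba.lcut()."""
    stop = set(stopwords)
    return [w for w in words if w.strip() and w not in stop]

# Toy data for illustration; the real input comes from data.xlsx / stopwords.txt
print(clean_rows([" 故宮 ", "故宮", "", "長(zhǎng)城"]))
print(drop_stopwords(["景色", "的", "很", "美"], ["的", "很"]))
```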
Upload Files to HDFS
cd /data/jobs/project/
ls -l output/
hdfs dfs -mkdir -p /data/origin/tourist_info/
hdfs dfs -mkdir -p /data/origin/tourist_word/
hdfs dfs -put -f output/tourist /data/origin/tourist_info/
hdfs dfs -put -f output/tourist_word.csv /data/origin/tourist_word/
hdfs dfs -ls /data/origin/tourist_info/
hdfs dfs -ls /data/origin/tourist_word/
Hive Data Analysis
cd /data/jobs/project/
# Upload "hive.sql" from the "hive" directory
# To open a Hive shell, connect with:
# /export/software/apache-hive-3.1.2-bin/bin/beeline -u jdbc:hive2://master:10000 -n root
# Run hive.sql in one pass
hive -v -f hive.sql
Create MySQL Tables
cd /data/jobs/project/
# Upload "mysql.sql" from the "mysql" directory
# Make sure the MySQL service is running
# Execute all SQL statements in the file
mysql -u root -p < mysql.sql
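Once the tables exist, they can be spot-checked from Python with the pymysql package installed earlier. A hedged sketch; the database name `mydb` and the example table/column names are assumptions, so substitute whatever mysql.sql actually creates:

```python
def rows_to_series(rows):
    """Split (label, value) rows into the two parallel lists chart code expects."""
    labels = [r[0] for r in rows]
    values = [r[1] for r in rows]
    return labels, values

def fetch_rows(sql):
    """Run a query against the project database (root/123456, per the setup above)."""
    import pymysql  # imported lazily so rows_to_series works without MySQL present
    conn = pymysql.connect(host="localhost", user="root", password="123456",
                           database="mydb",  # assumption: replace with the real DB name
                           charset="utf8mb4")
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()

# Example (requires running MySQL; table name is hypothetical):
# print(rows_to_series(fetch_rows("SELECT name, cnt FROM some_table LIMIT 5")))
```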
Import Data into MySQL
cd /data/jobs/project/
# Upload the "sqoop.sh" file
# Strip Windows carriage returns, then run the script
sed -i 's/\r//g' sqoop.sh
bash sqoop.sh
Read CSV and Write to Kafka
cd /data/jobs/project/
# Upload "create_and_send_msg.py" from the "實(shí)時(shí)" (real-time) directory
python3 create_and_send_msg.py
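create_and_send_msg.py is not reproduced here. A hedged sketch of the kind of CSV-to-Kafka producer it likely implements, assuming the kafka-python package (not in the pip list above) and the `agg_ticket` topic created earlier:

```python
import csv
import json

def row_to_message(row):
    """Serialize one CSV row (a dict) to the UTF-8 JSON bytes sent to Kafka."""
    return json.dumps(row, ensure_ascii=False).encode("utf-8")

def send_csv(path, topic="agg_ticket", servers="master:9092"):
    """Stream every row of a CSV file into the given Kafka topic."""
    from kafka import KafkaProducer  # pip3 install kafka-python (an assumption)
    producer = KafkaProducer(bootstrap_servers=servers)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            producer.send(topic, row_to_message(row))
    producer.flush()
    producer.close()

# Example (requires a running broker): send_csv("output/tourist_word.csv")
```

The console consumer started in the Kafka section above should print each JSON message as it arrives.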
Spark Real-Time Computation
cd /data/jobs/project/
# Build the "spark-job" project
# Build command: mvn clean package -Dmaven.test.skip=true
# Upload "spark-job-jar-with-dependencies.jar" from the "spark-job/target/" directory
spark-submit \
--master local[*] \
--class com.exam.SparkAppStream \
/data/jobs/project/spark-job-jar-with-dependencies.jar
Start the Visualization
# Install Node.js
# Start the Spring Boot backend
# Start the frontend
npm install --registry=https://registry.npmmirror.com
npm run dev
# If the install fails, clear the npm cache, remove node_modules, and reinstall
npm cache clean --force
rm -rf node_modules
rm package-lock.json
npm install