Data visualization always looks simple, but in practice bugs crop up constantly. The root cause is usually an incomplete grasp of data wrangling and of ggplot2's parameters.
1. Data preparation
# Load packages
library(dplyr)
library(readr)
library(ggplot2)
# Read the data
athlete_events <- read_csv("athlete_events.csv")
noc_regions <- read_csv("noc_regions.csv")
head(athlete_events)
athletedata <- inner_join(athlete_events, noc_regions, by = "NOC")
head(athletedata)
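Because inner_join keeps only rows whose NOC appears in both tables, it is worth checking whether any rows were silently dropped. The sketch below uses two small made-up tables (events and regions are hypothetical stand-ins, not the real files) and anti_join to list the rows that have no match.

```r
library(dplyr)

# Hypothetical toy tables standing in for athlete_events and noc_regions
events  <- tibble(ID = 1:4, NOC = c("USA", "CHN", "XXX", "USA"))
regions <- tibble(NOC = c("USA", "CHN"), region = c("USA", "China"))

# Rows kept by the inner join
joined <- inner_join(events, regions, by = "NOC")

# Rows of `events` whose NOC has no entry in `regions`
unmatched <- anti_join(events, regions, by = "NOC")

nrow(joined)
unmatched
```

Running the same anti_join on the real tables would show which NOC codes, if any, are lost in the join.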
2. Distribution of the number of participants by region (bar chart)
region <- athletedata %>%
  group_by(region) %>%
  summarise(value = n()) %>%
  arrange(desc(value)) # group by region, count the participants per region, and sort in descending order
region30 <- region[1:30, ]
region_plot <- ggplot(region30, aes(x = reorder(region, value), y = value)) +
  theme_bw(base_family = "STKaiti") +
  geom_bar(aes(fill = value), stat = "identity", show.legend = FALSE) +
  coord_flip() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(x = "Region", y = "Participants", title = "Distribution of participants by region") +
  theme(axis.text.x = element_text(vjust = 0.5), plot.title = element_text(hjust = 0.5))
region_plot

Bar chart
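One thing to keep in mind about the counts above: n() counts rows, so an athlete who entered several events is counted several times. If unique athletes are wanted instead, n_distinct is one option. The sketch below uses made-up data and assumes an athlete identifier column named ID, which is an assumption about the data set and is not shown in the code above.

```r
library(dplyr)

# Hypothetical toy data: athlete 1 entered two events, athletes 2 and 3 one each
toy <- tibble(
  ID     = c(1, 1, 2, 3),
  region = c("USA", "USA", "USA", "China")
)

counts <- toy %>%
  group_by(region) %>%
  summarise(
    entries  = n(),            # number of rows (athlete-event entries)
    athletes = n_distinct(ID)  # number of unique athletes
  ) %>%
  arrange(desc(entries))

counts
```

In the toy data the USA has three entries but only two distinct athletes, which is the gap the two summaries make visible.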
3. Distribution of participants by year, region, and sex (heat map)
index <- region30$region[1:30] # take the top 30 regions as a vector; region30[1:30, 1] would still be a one-column data frame (a list)
region30_merge <- athletedata %>%
  filter(region %in% index) %>%
  group_by(Year, region, Sex) %>%
  summarise(value = n())
merge_plot <- ggplot(region30_merge, aes(x = Year, y = region)) +
  theme_bw(base_family = "STKaiti") +
  geom_tile(aes(fill = value), color = "white") +
  scale_fill_gradientn(colors = c("blue", "red")) +
  scale_x_continuous(breaks = unique(region30_merge$Year)) +
  theme(axis.text.x = element_text(hjust = 0.5, angle = 90)) +
  facet_wrap(~Sex, nrow = 2)
options(repr.plot.width = 10, repr.plot.height = 8) # figure size when running in a Jupyter (IRkernel) notebook
merge_plot

Heat map
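The comment on the index line in this section points at a common stumbling block: extracting a column with `$` gives a plain vector, while single-bracket indexing on a tibble gives back a tibble. A minimal sketch with a made-up table (top is a hypothetical stand-in for region30):

```r
library(dplyr)

# Hypothetical stand-in for region30
top <- tibble(region = c("USA", "France", "China"), value = c(3, 2, 1))

as_vector  <- top$region        # character vector
as_one_col <- top[1:3, 1]       # still a tibble with a single column
via_pull   <- pull(top, region) # dplyr equivalent of `$`

is.character(as_vector)
is.data.frame(as_one_col)
identical(as_vector, via_pull)
```

Since `%in%` is meant to compare against a vector, the `$` (or pull) form is the one to pass to filter.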
4. Visualize the number of Olympic medals won each year by six regions: USA, Germany, France, UK, Russia, and China (line chart)
index <- c("USA", "Germany", "France", "UK", "Russia", "China")
region6 <- athletedata %>%
  filter(region %in% index) %>% # keep only the six regions
  filter(!is.na(Medal)) %>% # keep only medal-winning entries; read_csv reads "NA" as a missing value
  group_by(region, Year) %>%
  summarise(value = n())
region6_plot <- ggplot(region6, aes(x = Year, y = value)) +
  theme_bw(base_family = "STKaiti") +
  geom_line() +
  facet_wrap(~region, nrow = 3)
region6_plot

Line chart
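A note on selecting medal rows: by default read_csv turns the text NA into a real missing value, so a string comparison such as `Medal != "NA"` and the test `!is.na(Medal)` both end up keeping only medal rows, but for different reasons. The sketch below uses made-up data (toy is hypothetical) to show what each condition evaluates to.

```r
library(dplyr)

# Hypothetical toy data with one missing medal
toy <- tibble(Name = c("a", "b", "c"), Medal = c("Gold", NA, "Bronze"))

# Comparing a missing value with a string yields NA, not FALSE
toy$Medal != "NA"

# filter() drops rows where the condition is NA, so this works only indirectly
kept_by_comparison <- filter(toy, Medal != "NA")

# is.na() states the intent directly
kept_by_is_na <- filter(toy, !is.na(Medal))

kept_by_comparison
kept_by_is_na
```

Both results contain the same two rows here. The is.na form is the safer habit, because it does not rely on how filter treats NA conditions.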
5. Animated view of the medals won by each region, year by year
library(gganimate)
index <- region30$region[1:30]
region30_medal <- athletedata %>%
  filter(region %in% index) %>%
  filter(!is.na(Medal)) %>% # keep only medal-winning entries
  group_by(region, Year) %>%
  summarise(value = n())
region30_medal$Year <- as.integer(region30_medal$Year) # convert Year to an integer variable
region30_plot <- ggplot(region30_medal, aes(x = region, y = value)) +
  theme_bw() +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme(axis.text.x = element_text(hjust = 0.5, angle = 90)) +
  transition_time(Year) +
  labs(title = "Year: {frame_time}")
region30_plot
6. Visualize region, number of athletes, sex, and number of medals together (treemap)
library(treemap)
# Count the medals
medal <- athletedata %>%
  filter(!is.na(Medal)) %>%
  group_by(region, Sex) %>%
  summarise(medalnum = n())
# Count the athletes
athelete <- athletedata %>%
  group_by(region, Sex) %>%
  summarise(atheletenum = n())
data <- inner_join(medal, athelete, by = c("region", "Sex"))
data_plot <- treemap(data, index = c("Sex", "region"), vSize = "atheletenum",
                     vColor = "medalnum", type = "value", palette = c("blue", "red"),
                     title = "Number of athletes by region within each sex", title.legend = "Number of medals",
                     fontfamily.title = "STKaiti", fontfamily.legend = "STKaiti")
data_plot

Treemap
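Closing ramble: most parameter settings can be understood as stacking layers, so as long as you have a rough outline of the figure in mind and add details around that outline, you can apply the parameters much more flexibly. Every beginner probably feels, as I did, caught off guard by all this, or even intimidated: maybe I'm just not cut out for coding, maybe I'm simply not good enough, I'm slow at learning everything, and so on. These thoughts are normal, because much of this knowledge is something we only come to understand and master once we need to use it, and we only dig deeper when we want to improve. So be sure to give yourself chances to apply what you learn; it is in practice that you get to enjoy the satisfaction of solving problems!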
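The idea of stacking layers can be made concrete by building a plot one `+` at a time and printing it after any step. This is only a sketch on made-up data (toy is hypothetical); it uses geom_col, which ggplot2 provides as the equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(region = c("USA", "France", "China"), value = c(30, 20, 10))

# Step 1: data and aesthetic mappings only (an empty canvas with axes)
p <- ggplot(toy, aes(x = reorder(region, value), y = value))

# Step 2: add the bars
p <- p + geom_col(aes(fill = value), show.legend = FALSE)

# Step 3: flip the coordinates so the bars run horizontally
p <- p + coord_flip()

# Step 4: choose the fill colors
p <- p + scale_fill_gradient(low = "blue", high = "red")

# Step 5: labels and theme
p <- p +
  labs(x = "Region", y = "Value", title = "Built one layer at a time") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

p
```

Printing p after each step shows exactly what that step contributed, which makes it easier to find the line responsible when a figure does not look as expected.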