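Venn diagrams work well when there are only a few sets, but with many sets, say more than 5, they become hard to read. For those cases, UpSetR is a better choice for plotting set intersections.
1. Installing the R package and loading the sample file
install.packages("UpSetR") # install from CRAN
devtools::install_github("hms-dbmi/UpSetR") # or install the development version from GitHub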
library(UpSetR)
setwd("path/to/working/directory") # set this to your own working directory
library(ggplot2)
library(plyr)
library(gridExtra)
library(grid)
movies <- read.csv(system.file("extdata","movies.csv",package = "UpSetR"), header = TRUE, sep=";")
View(movies) # inspect the sample file (R is case-sensitive: View, not view)
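The sample file bundled with this R package is shown in the figure. The first column is the movie title, the second is the release date, and the following columns classify each movie by genre, such as action or comedy. It is worth looking over the data before plotting.
![Viewing the sample file](https://upload-images.jianshu.io/upload_images/28604302-a7f6f1f7fa188835.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
2. Basic parameters of the upset function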
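If your own data starts out as lists of items per set rather than as a table like this one, it first has to be brought into the same layout: one row per item and one 0/1 column per set. Here is a minimal base R sketch of that idea; the set names and movie IDs are made up for illustration.

```r
# Hypothetical input: each set is a vector of item IDs
sets <- list(
  Action = c("m1", "m2"),
  Comedy = c("m2", "m3")
)

# All distinct items across the sets
items <- sort(unique(unlist(sets)))

# One 0/1 column per set: 1 if the item belongs to the set, 0 otherwise
flags <- sapply(sets, function(s) as.integer(items %in% s))

membership <- data.frame(Name = items, flags, stringsAsFactors = FALSE)
print(membership)
```

UpSetR also ships a fromList() helper for this kind of conversion; the sketch only spells out what the resulting table looks like.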
upset(movies,
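order.by = "freq", # sort intersections by "freq" (size, descending by default) or "degree" (ascending by default)
nsets = 5, # number of sets to show; the largest sets are chosen
#sets=c("Drama","Comedy","Action","Thriller","Western","Documentary"), # alternatively, name the sets explicitly with sets
nintersects = 30, # maximum number of intersections to show
mb.ratio = c(0.55,0.45), # height ratio of the bar chart to the matrix
number.angles = 30, # angle of the numbers above the bars
point.size = 3, # size of the matrix points
line.size = 1.2, # width of the connecting lines
mainbar.y.label = "size of intersection", # y-axis label of the top bar chart
sets.x.label = "size of each set", # x-axis label of the set size bar chart on the left
text.scale = c(1.2, 1.3, 1, 1, 2, 1.2), # text sizes: intersection size title, intersection size tick labels, set size title, set size tick labels, set names, numbers above bars
matrix.color = "firebrick", # color of the matrix points
main.bar.color = "steelblue", # color of the top bars
sets.bar.color = "grey70" # color of the set size bars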
)

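The basic plot produced by this call can be read as follows:
1). In the matrix, a red point means the set takes part in that intersection, a gray point means it does not, and a red line connects the sets that form the intersection;
2). The blue bars at the top show the size of each intersection;
3). The Set size bar chart on the left shows the size of each set used in the plot;
3. Next comes the package's advanced feature: queries
The main parameters are:
query: what to look for (for example intersections or elements)
params: the list of parameters the query works on
color: the color used to show the query on the plot; if none is given, a color is picked from the UpSetR default palette
active: if TRUE, the intersection size bar is overlaid by a bar representing the query; if FALSE, the bar is not overlaid and a jitter point is placed on it instead.
Example 1: Highlighting intersections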
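To make the two kinds of bars concrete, the counts behind them can be reproduced by hand on a small table. This is only a sketch on hypothetical toy data (sets A, B and C), not UpSetR's own code. It relies on the fact that each row is counted in exactly one intersection, namely the one matching its exact membership pattern.

```r
# Hypothetical toy data: one row per item, one 0/1 column per set
toy <- data.frame(
  A = c(1, 1, 0, 1, 0, 1),
  B = c(1, 0, 1, 1, 0, 1),
  C = c(0, 0, 1, 1, 1, 1)
)

# Set sizes (the bars on the left): number of items in each set
set_sizes <- colSums(toy)

# Intersection sizes (the bars on top): count rows by exact membership pattern,
# e.g. "110" means "in A and B, but not in C"
pattern <- apply(toy, 1, paste, collapse = "")
intersection_sizes <- sort(table(pattern), decreasing = TRUE)

print(set_sizes)
print(intersection_sizes)
```

Sorting the patterns by count, as done here, corresponds to order.by = "freq".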
upset(movies,
queries = list(
list(query = intersects, # look for an intersection
params = list("Drama", "Comedy", "Action"), # the intersection of "Drama", "Comedy" and "Action"
color = "orange", # shown in orange
active = T), # overlay the query on the bar
list(query = intersects,
params = list("Drama"), # the "Drama"-only intersection, i.e. highlight a single set
color = "red", # shown in red
active = F), # do not overlay the bar; the query can still be found in the matrix
list(query = intersects,
params = list("Action", "Drama"), # the intersection of "Action" and "Drama"
active = T))) # no color is set, so one is picked from the UpSetR palette

Example 2: Finding specific elements
upset(movies,
queries = list(
list(query = elements, # look for matching elements in the data
params = list("AvgRating", 3.5, 4.1), # column name, followed by the values to match
color = "blue",
active = T),
list(query = elements,
params = list("ReleaseDate", 1980, 1990, 2000),
color = "red", active = F)))
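In an elements query, the first entry of params names a column and the remaining entries are the values to look for. The selection this amounts to can be sketched in base R on a hypothetical toy table; this illustrates the idea only and is not how UpSetR implements it.

```r
# Hypothetical toy data with the two columns used in the queries above
toy <- data.frame(
  Name = c("m1", "m2", "m3", "m4"),
  ReleaseDate = c(1980, 1985, 1990, 2000),
  AvgRating = c(3.5, 4.1, 2.0, 3.5),
  stringsAsFactors = FALSE
)

# Rows an elements query on "AvgRating" with values 3.5 and 4.1 would pick out
rating_hits <- toy$Name[toy$AvgRating %in% c(3.5, 4.1)]

# Rows an elements query on "ReleaseDate" with values 1980, 1990, 2000 would pick out
date_hits <- toy$Name[toy$ReleaseDate %in% c(1980, 1990, 2000)]

print(rating_hits)
print(date_hits)
```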

Example 3: Using the expression parameter to subset the results of element and intersection queries
upset(movies, queries = list(
list(
query = intersects,
params = list("Action","Drama"),
active = T),
list(
query = elements,
params = list("ReleaseDate", 1980, 1990, 2000),
color = "red",
active = F)),
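expression = "AvgRating > 3 & Watches > 100") # keep only the subset with a rating (AvgRating) above 3 and a view count (Watches) above 100
This part is a little hard to grasp; comparing the plots produced with and without the expression parameter makes it clear.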
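The effect of such an expression can also be tried out without plotting anything. The sketch below evaluates the same expression string against a hypothetical toy table in base R. Using eval(parse()) here is an assumption for illustration, not a description of UpSetR's internals.

```r
# Hypothetical toy data with the two columns named in the expression
toy <- data.frame(
  Name = c("m1", "m2", "m3", "m4"),
  AvgRating = c(2.5, 3.4, 4.1, 3.8),
  Watches = c(500, 80, 150, 2000),
  stringsAsFactors = FALSE
)

# The same string that is passed to the expression parameter
expr <- "AvgRating > 3 & Watches > 100"

# Evaluate the string with the data frame's columns in scope
keep <- eval(parse(text = expr), envir = toy)

# Only rows satisfying both conditions remain
filtered <- toy[keep, ]
print(filtered)
```

Here m1 is dropped for its rating and m2 for its view count, so only m3 and m4 remain.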


Example 4: Custom queries
Functions can be defined to suit your own needs. Two examples follow:
Myfunc <- function(row, release, rating) {
(row["ReleaseDate"] %in% release) & (row["AvgRating"] > rating)
}
# The function takes three arguments: row, release and rating
# It selects rows whose release date (ReleaseDate) is in release and whose average rating (AvgRating) is above rating
# release and rating are supplied through params, here c(1970, 1980, 1990, 1999, 2000) and 2.5
upset(movies,
queries = list(
list(
query = Myfunc,
params = list(c(1970, 1980, 1990, 1999, 2000), 2.5),
color = "blue",
active = T)))
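Before handing a custom function to upset(), it can help to check what it returns for individual rows. The sketch below runs Myfunc over a hypothetical numeric toy table with apply(). Using apply() as a stand-in for the row-by-row evaluation is an assumption made for illustration.

```r
# Same custom query function as above: TRUE when the release year is in
# `release` and the average rating is above `rating`
Myfunc <- function(row, release, rating) {
  (row["ReleaseDate"] %in% release) & (row["AvgRating"] > rating)
}

# Hypothetical toy data (numeric columns only, so each row stays numeric)
toy <- data.frame(
  ReleaseDate = c(1970, 1985, 1990, 2000),
  AvgRating = c(3.0, 4.0, 2.0, 2.6)
)

# Evaluate the function once per row, passing the two extra parameters
hits <- apply(toy, 1, Myfunc,
              release = c(1970, 1980, 1990, 1999, 2000),
              rating = 2.5)

print(unname(hits))
```

The second row fails on the release year and the third on the rating, so only the first and fourth rows are selected.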

The following example works the same way and can be read by analogy with the one above:
between <- function(row, min, max){
(row["ReleaseDate"] < max) & (row["ReleaseDate"] > min)
} # selects rows whose release date lies strictly between min and max
upset(movies,
sets=c("Drama","Comedy","Action","Thriller","Western","Documentary"),
queries = list(
list(
query = intersects,
params = list("Drama", "Thriller")),
list(query = between,
params=list(1970,1980),
color="red",
active=TRUE)))

Example 5: Adding a legend
Highlighting alone can be hard to read, so a legend helps:
upset(movies,
query.legend = "top", # legend position
queries = list(
list(query = intersects,
params = list("Drama", "Comedy", "Action"),
color = "orange",
active = T,
query.name = "Funny action"), # legend label
list(query = intersects,
params = list("Action","Drama"),
active=T,
query.name="Emotional action"), # legend label
list(query = intersects,
params = list("Drama"),
color="red",
active=F))) # no query.name is given, so a default name is assigned by position

Example 6: Combining several kinds of queries
upset(movies, query.legend = "bottom",
queries = list(
list(query = Myfunc, # custom function (Myfunc from Example 4)
params = list(c(1970,1980, 1990, 1999, 2000), 2.5),
color = "orange",
active = T),
list(query = intersects, # intersection query
params = list("Action", "Drama"),
active = T),
list(query = elements, # highlight the specified elements
params = list("ReleaseDate", 1980, 1990, 2000),
color = "red",
active = F,
query.name = "Decades")),
expression = "AvgRating > 3 & Watches > 100")

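Example 7: Showing the distribution of selected attributes below the UpSet plot
upset(movies,
attribute.plots=list(
gridrows=60, # space reserved below the UpSet plot for the attribute plots
plots=list(
list(plot=scatter_plot, # scatter plot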
x="ReleaseDate",
y="AvgRating"),
list(plot=scatter_plot,
x="ReleaseDate",
y="Watches"),
list(plot=scatter_plot,
x="Watches",
y="AvgRating"),
list(plot=histogram, # histogram
x="ReleaseDate")),
ncols = 2))
