Intersection analysis of multiple datasets with UpSetR

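Venn diagrams work well when there are only a few sets, but with more sets (five or more, say) they quickly become hard to read. For those cases the UpSetR package is a better choice for plotting set intersections.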

1. Installing the R package and loading the example file

install.packages("UpSetR") # install from CRAN
devtools::install_github("hms-dbmi/UpSetR") # or install from GitHub

library(UpSetR)
setwd("path/to/working/directory") # set this to your own working directory

library(ggplot2)
library(plyr)
library(gridExtra)
library(grid)
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")
View(movies) # inspect the example file

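The example file shipped with the package is shown below. The first column is the movie title, the second is the release year, and the following columns classify each movie by genre (Action, Comedy, and so on) as 0/1 flags. It is worth a quick look at the data before plotting.
![Viewing the example file](https://upload-images.jianshu.io/upload_images/28604302-a7f6f1f7fa188835.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)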
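
If your own data comes as lists of members rather than a 0/1 table, it has to be brought into the same layout first. The following is a minimal base R sketch of that conversion; the set names and the IDs `m1` to `m5` are made up for illustration. UpSetR also ships a `fromList()` helper for input of this shape.

```r
# Toy input: each list element holds the member IDs of one set (made-up data).
sets <- list(
  Drama  = c("m1", "m2", "m3"),
  Comedy = c("m2", "m4"),
  Action = c("m3", "m4", "m5")
)

# All distinct elements across the sets.
all_ids <- sort(unique(unlist(sets)))

# One 0/1 column per set: 1 if the element belongs to that set, 0 otherwise.
membership <- as.data.frame(
  lapply(sets, function(members) as.integer(all_ids %in% members))
)
rownames(membership) <- all_ids

print(membership)
```

Each row is one element and each column one set, which is the same layout as the genre columns of `movies`.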

2. Basic parameters of the upset function

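upset(movies,
      order.by = "freq", # ordering of the intersections: "freq" sorts by intersection size, "degree" by the number of sets involved
      nsets = 5, # number of sets to show, taken from largest to smallest
      # sets = c("Drama","Comedy","Action","Thriller","Western","Documentary"), # alternatively, name the sets explicitly with sets
      nintersects = 30, # number of intersections to show
      mb.ratio = c(0.55,0.45), # height ratio of the bar plot to the matrix
      number.angles = 30, # angle of the numbers above the bars
      point.size = 3, # size of the points
      line.size = 1.2, # width of the lines
      mainbar.y.label = "size of intersection", # y-axis label of the top bar plot
      sets.x.label = "the number of each sets", # x-axis label of the set size bar plot on the left
      text.scale = c(1.2, 1.3, 1, 1, 2, 1.2), # text sizes: intersection size title, intersection size tick labels, set size title, set size tick labels, set names, numbers above bars
      matrix.color = "firebrick", # color of the dots in the matrix
      main.bar.color = "steelblue", # color of the top bars
      sets.bar.color = "grey70" # color of the set size bars
      )
Figure: basic upset plot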

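The basic plot looks like the figure above:
1). In the matrix, red dots mark the sets that take part in an intersection and gray dots mark the sets that do not; a red line connects the sets that form one intersection;
2). The blue bars on top give the size of each intersection, that is, the number of elements it contains;
3). The set size bars on the left give the total size of each set used in the plot.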
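
To make the two kinds of bars concrete, here is a small base R sketch that counts them by hand on a made-up 0/1 table. It assumes, as in the default plot, that each element is counted only in the one combination of sets it belongs to exactly. The data frame `toy` is hypothetical.

```r
# Toy 0/1 membership table (made-up data, same layout as the genre columns of movies).
toy <- data.frame(
  Drama  = c(1, 1, 1, 0, 0, 1),
  Comedy = c(0, 1, 0, 1, 0, 1),
  Action = c(0, 0, 1, 1, 1, 0)
)

# Label each row with the exact combination of sets it belongs to.
combo <- apply(toy, 1, function(row) paste(names(toy)[row == 1], collapse = "&"))

# Size of each intersection, largest first (comparable to order.by = "freq").
intersection_sizes <- sort(table(combo), decreasing = TRUE)
print(intersection_sizes)

# Total size of each set (the horizontal bars on the left).
set_sizes <- colSums(toy)
print(set_sizes)
```

Here the combination Drama&Comedy has two members and every other combination has one, while the set sizes count every member of a set regardless of what else it belongs to.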

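3. Advanced usage of the package: queries

The main parameters:

query: what to look for (for example intersections or elements)
params: the list of parameters the query works on
color: the color that represents the query in the plot; if no color is given, one is picked from the UpSetR default palette
active: if TRUE, the intersection size bar is overlaid by a bar representing the query; if FALSE, a jitter point is placed on the intersection size bar instead.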

Example 1: Highlighting intersections

upset(movies,
      queries = list(
         list(query = intersects, # look for an intersection
              params = list("Drama", "Comedy", "Action"), # the intersection of "Drama", "Comedy" and "Action"
              color = "orange", # shown in orange
              active = T), # drawn as a bar overlaid on the bar plot
         list(query = intersects,
              params = list("Drama"), # "Drama" alone, which highlights a single set
              color = "red", # shown in red
              active = F), # no overlaid bar; the query is marked by a point on the bar instead
         list(query = intersects, 
              params = list("Action", "Drama"), # the intersection of "Action" and "Drama"
              active = T))) # no color given, so one is picked from the UpSetR palette
Figure: highlighting intersections

Example 2: Finding specific elements

upset(movies, 
      queries = list(
         list(query = elements, # look for matching elements in the data
              params = list("AvgRating", 3.5, 4.1), # attribute name, followed by the values to look for
              color = "blue", 
              active = T), 
         list(query = elements,
              params = list("ReleaseDate", 1980, 1990, 2000),
              color = "red", active = F)))
Figure: finding specific elements
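
As far as I understand the elements query, the values after the attribute name are matched exactly rather than read as a range. The base R sketch below shows the difference on made-up data; it mimics the selection with `%in%` and is an assumption about the behavior, not UpSetR's own code.

```r
# Toy data (made-up values; column names mirror the movies example).
toy <- data.frame(
  Name        = c("A", "B", "C", "D"),
  ReleaseDate = c(1980, 1985, 1990, 2000),
  AvgRating   = c(3.5, 3.8, 4.1, 2.9)
)

# Exact-value match on ReleaseDate: 1985 is not selected.
by_year <- toy[toy$ReleaseDate %in% c(1980, 1990, 2000), ]

# Exact-value match on AvgRating: 3.8 lies between 3.5 and 4.1 but is not selected.
by_rating <- toy[toy$AvgRating %in% c(3.5, 4.1), ]

# A range needs an explicit comparison instead.
in_range <- toy[toy$AvgRating >= 3.5 & toy$AvgRating <= 4.1, ]

print(by_year$Name)
print(by_rating$Name)
print(in_range$Name)
```

If a range is what you need, a custom query function like the ones in Example 4 is the more natural tool.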

Example 3: Using the expression parameter to subset the results of element and intersection queries

upset(movies, queries = list(
   list(
      query = intersects, 
      params = list("Action","Drama"),
      active = T), 
   list(
      query = elements, 
      params = list("ReleaseDate", 1980, 1990, 2000),
      color = "red", 
      active = F)),
   expression = "AvgRating > 3 & Watches > 100") # keep only the subset with AvgRating above 3 and Watches above 100

This is a little hard to grasp at first; comparing the plot without the expression parameter to the plot with it makes the effect clear.

Figure: result without expression
Figure: result with expression
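
The expression is passed as a string and acts as a row filter on the attribute columns. The sketch below reproduces that kind of filter in base R on made-up data; evaluating the string with `eval(parse(...))` is my own illustration and not necessarily how UpSetR handles it internally.

```r
# Toy data (made-up values; column names mirror the movies example).
toy <- data.frame(
  Name      = c("A", "B", "C", "D"),
  AvgRating = c(3.6, 2.8, 4.2, 3.1),
  Watches   = c(250, 900, 80, 101)
)

# The same string that is passed to the expression parameter.
expr <- "AvgRating > 3 & Watches > 100"

# Evaluate the string with the data frame columns as variables.
keep <- eval(parse(text = expr), envir = toy)

# Rows that satisfy both conditions.
filtered <- toy[keep, ]
print(filtered)
```

Only rows A and D pass: B fails the rating condition and C fails the watch count condition.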

Example 4: Custom queries on elements

You can define your own query function to suit your needs. Two examples follow:

Myfunc <- function(row, release, rating) {
   (row["ReleaseDate"] %in% release) & (row["AvgRating"] > rating)
} 
#  Takes three arguments: row, release and rating
#  Selects the rows whose ReleaseDate is in release and whose AvgRating is greater than rating
#  release and rating are supplied through params below: c(1970, 1980, 1990, 1999, 2000) and 2.5


upset(movies,
      queries = list(
         list(
            query = Myfunc,
            params = list(c(1970, 1980, 1990, 1999, 2000), 2.5),
            color = "blue", 
            active = T)))
Figure: custom element query 1
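
A custom query function can be tried out on a few rows before it is handed to upset. The sketch below applies a function of the same shape row by row with `apply()` on made-up data. The name `release_and_rating` and the data frame `toy` are hypothetical, and calling the function through `apply()` is only my way of testing it, not a statement about how UpSetR calls it.

```r
# Toy data with numeric columns only, so each row arrives as a named numeric vector.
toy <- data.frame(
  ReleaseDate = c(1970, 1975, 1990, 1999, 2000),
  AvgRating   = c(3.0, 4.0, 2.0, 2.6, 2.5)
)

# Same logic as the custom query above: the year must be one of the listed
# years and the rating must be strictly greater than the threshold.
release_and_rating <- function(row, release, rating) {
  (row["ReleaseDate"] %in% release) & (row["AvgRating"] > rating)
}

# Apply the function to every row; the extra arguments play the role of params.
selected <- apply(toy, 1, release_and_rating,
                  release = c(1970, 1980, 1990, 1999, 2000), rating = 2.5)

print(toy[selected, ])
```

The comparison is strict, so the last row (rating exactly 2.5) is not selected, and 1975 is dropped because it is not in the list of years.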

The next example works the same way and can be read along the same lines.

between <- function(row, min, max){
   (row["ReleaseDate"] < max) & (row["ReleaseDate"] > min)
}   # selects the rows whose ReleaseDate lies strictly between min and max

upset(movies,
      sets=c("Drama","Comedy","Action","Thriller","Western","Documentary"),
      queries = list(
         list(
            query = intersects,
            params = list("Drama", "Thriller")),
         list(query = between, 
              params=list(1970,1980),
              color="red", 
              active=TRUE)))
Figure: custom element query 2

Example 5: Adding a legend

Highlighting alone does not always make the plot clear enough, so adding a legend is important:

upset(movies,
      query.legend = "top", # position of the legend
      queries = list(
         list(query = intersects, 
              params = list("Drama", "Comedy", "Action"), 
              color = "orange",
              active = T, 
              query.name = "Funny action"), # name shown in the legend
         list(query = intersects,
              params = list("Action","Drama"), 
              active=T,
              query.name="Emotional action"), # name shown in the legend
         list(query = intersects,
              params = list("Drama"), 
              color="red",
              active=F))) # no query.name given, so a default name is assigned according to its position
Figure: adding a legend

Example 6: Combining several queries

upset(movies, query.legend = "bottom",
      queries = list(
         list(query = Myfunc, # the custom function defined in Example 4
              params = list(c(1970,1980, 1990, 1999, 2000), 2.5), 
              color = "orange", 
              active = T), 
         list(query = intersects, # highlight an intersection
              params = list("Action", "Drama"),
              active = T), 
         list(query = elements, # highlight the specified elements
              params = list("ReleaseDate", 1980, 1990, 2000),
              color = "red", 
              active = F, 
              query.name = "Decades")), 
      expression = "AvgRating > 3 & Watches > 100")
Figure: combining several queries

Example 7: Showing the distribution of selected attributes below the UpSet plot

upset(movies,
      attribute.plots=list(
         gridrows=60, # number of grid rows added below the UpSet plot to make room for the attribute plots
         plots=list(
            list(plot=scatter_plot, # scatter plot
                 x="ReleaseDate", 
                 y="AvgRating"),
            list(plot=scatter_plot,
                 x="ReleaseDate",
                 y="Watches"),
            list(plot=scatter_plot,
                 x="Watches", 
                 y="AvgRating"),
            list(plot=histogram, # histogram
                 x="ReleaseDate")),
         ncols = 2))
Figure: several attribute plots combined