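Long time no see! I still have a little over a month of maternity leave left. The baby is two and a half months old and a little angel, easy on both parents, so I have some free time again and have gradually started studying, preparing lessons, and organizing materials.
I found an excellent resource on the TIMER2.0 homepage: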


http://timer.comp-genomics.org/infiltration_estimation_for_tcga.csv.gz
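This file contains TCGA immune infiltration estimates computed by several methods. Since the results are ready-made, plotting them is purely an exercise in R skills.
Here are a few things we can do next:
1. Split the immune infiltration results computed by one method into a separate data frame.
2. Draw a box plot comparing the abundance of one immune cell type between tumor and normal samples across all cancer types.
3. Draw a box plot showing the abundance of one immune cell type in tumor samples across all cancer types, ordered by median from low to high.
Try it yourself first, then look at my answer.
1. Splitting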
rm(list = ls())
a = read.csv("infiltration_estimation_for_tcga.csv.gz",
             row.names = 1,
             check.names = FALSE)
head(colnames(a))
## [1] "B cell_TIMER" "T cell CD4+_TIMER"
## [3] "T cell CD8+_TIMER" "Neutrophil_TIMER"
## [5] "Macrophage_TIMER" "Myeloid dendritic cell_TIMER"
library(stringr)
b = str_split(colnames(a),"_",simplify = T) %>% as.data.frame()
colnames(b) = c("cell","method")
table(b$method)
##
## CIBERSORT CIBERSORT-ABS EPIC MCPCOUNTER QUANTISEQ
## 22 22 8 11 11
## TIMER XCELL
## 6 39
k = b$method=="CIBERSORT";table(k)
## k
## FALSE TRUE
## 97 22
ciber = a[,k]
colnames(ciber) = str_remove(colnames(ciber),"_CIBERSORT")
ciber[1:4,1:4]
## B cell naive B cell memory B cell plasma T cell CD8+
## TCGA-OR-A5J1-01 0.002937309 0.002282572 0.00000000 0.11294112
## TCGA-OR-A5J2-01 0.046380291 0.000000000 0.15149520 0.07351904
## TCGA-OR-A5J3-01 0.061034875 0.000000000 0.19053762 0.01649453
## TCGA-OR-A5J5-01 0.253704390 0.000000000 0.03964469 0.03734717
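The same idea can be extended to split out every method at once instead of only CIBERSORT. Below is a minimal sketch on a toy data frame; `toy` and `by_method` are made-up names for illustration, and it assumes the method is whatever follows the last underscore in each column name, as in the column names shown above.

```r
# Toy stand-in for the real matrix `a`: 3 samples, columns named "<cell>_<method>"
toy <- data.frame(
  `B cell_TIMER` = c(0.10, 0.20, 0.30),
  `T cell CD8+_TIMER` = c(0.05, 0.15, 0.25),
  `B cell naive_CIBERSORT` = c(0.01, 0.02, 0.03),
  check.names = FALSE,
  row.names = c("s1", "s2", "s3")
)

# Assumption: the method is the part after the last underscore
method <- sub(".*_", "", colnames(toy))

# One data frame per method, with the method suffix stripped from the column names
by_method <- lapply(split(colnames(toy), method), function(cols) {
  out <- toy[, cols, drop = FALSE]
  colnames(out) <- sub("_[^_]*$", "", cols)
  out
})

names(by_method)
colnames(by_method$TIMER)
```

With the real data, replacing `toy` with `a` should give a list in which `by_method$CIBERSORT` matches the `ciber` data frame built above.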
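2. Comparing the abundance of a cell type between tumor and normal across cancers
We need pan-cancer clinical information to know which cancer type each sample belongs to. It can be downloaded from Xena.
# Tidy the clinical information so that its row names match those of the immune cell abundance matrix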
m = read.delim("Survival_SupplementalTable_S1_20171025_xena_sp",row.names = 1)
y = intersect(rownames(ciber) , rownames(m));length(y)
## [1] 11031
ciber = ciber[y,]
m = m[y,]
identical(rownames(ciber),rownames(m))
## [1] TRUE
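To see what the `intersect()` step does, here is a small sketch on toy tables. The names `expr`, `clin`, and `shared` are hypothetical and only serve the illustration.

```r
# Two toy tables whose row names overlap only partly and are in different orders
expr <- data.frame(x = 1:3, row.names = c("s1", "s2", "s3"))
clin <- data.frame(type = c("BRCA", "ACC"), row.names = c("s3", "s1"))

# Keep only the shared samples, in the same order in both tables
shared <- intersect(rownames(expr), rownames(clin))
expr <- expr[shared, , drop = FALSE]
clin <- clin[shared, , drop = FALSE]

# The row names now line up one to one
identical(rownames(expr), rownames(clin))
```

`drop = FALSE` matters here only because the toy tables have a single column; without it, R would return a plain vector and the row names would be lost.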
# Plot
library(ggplot2)
ciber$type = m$cancer.type.abbreviation
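# Characters 14-15 of the barcode are the sample type code; convert to a number before comparing
ciber$Group = ifelse(as.numeric(str_sub(rownames(ciber),14,15))<10,"tumor","normal")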
ggplot(ciber,aes(x = type,y = `B cell naive`))+
  geom_boxplot(aes(fill = Group),alpha = 0.7)+
  scale_fill_manual(values = c("navy","firebrick3"))+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90,vjust = 0.5))
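The tumor/normal grouping relies on the sample type code in positions 14-15 of the TCGA barcode. A minimal sketch of that rule in base R follows; the three barcodes are example strings chosen for illustration, and codes of 10 and above are all labelled "normal", as in the code above.

```r
# Example barcodes in the 15-character form used as row names here
barcodes <- c("TCGA-OR-A5J1-01", "TCGA-A7-A0CE-11", "TCGA-AB-2803-03")

# Characters 14-15 hold the sample type code
code <- as.numeric(substr(barcodes, 14, 15))

# Codes below 10 are treated as tumor, everything else as normal
group <- ifelse(code < 10, "tumor", "normal")
data.frame(barcodes, code, group)
```

Converting to a number first avoids depending on how R compares a string such as "01" with the number 10.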

3. Comparing the abundance of one immune cell type across all tumors
Pick any cell type, say B cell naive, to plot.
library(dplyr)
dat = ciber[ciber$Group=="tumor",]
# Use a factor to control the order of the x-axis
dat2 = group_by(dat,type) %>%
  summarise(median = median(`B cell naive`)) %>%
  arrange(median)
dat$type=factor(dat$type,levels = dat2$type)
library(ggplot2)
ggplot(dat,aes(type,`B cell naive`))+
  geom_boxplot(outlier.size = 0.5)+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90,vjust = 0.5))
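As an alternative to building the level order with `group_by()` and `arrange()`, base R's `reorder()` can sort factor levels by a per-group summary in one step. This is a sketch on toy data rather than the approach used above; the cancer type labels and values are made up.

```r
# Toy data: three hypothetical cancer types with different medians
toy <- data.frame(
  type  = rep(c("AAA", "BBB", "CCC"), each = 3),
  value = c(5, 6, 7,  1, 2, 3,  3, 4, 5)
)

# reorder() sorts the factor levels by a per-group summary, here the median
toy$type <- reorder(toy$type, toy$value, FUN = median)

# Levels now run from the lowest median to the highest
levels(toy$type)
```

On the real data, the equivalent would be something like `dat$type = reorder(dat$type, dat$"B cell naive", FUN = median)` before plotting.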
