Q:ArrayExpress包常用函數(shù)及分析?

GEO數(shù)據(jù)庫(kù)對(duì)應(yīng)的R包是GEOquery,強(qiáng)大的ArrayExpress數(shù)據(jù)庫(kù)也有對(duì)應(yīng)的R包——ArrayExpress。
可以通過(guò)其訪問(wèn)ArrayExpress數(shù)據(jù)庫(kù),并構(gòu)建ExpressonSet、AffyBatch和NChannelSet數(shù)據(jù)結(jié)構(gòu)。

核心函數(shù):

  • getAE(accession, type="full")

  • ae2bioc(rawset)

對(duì)于處理過(guò)的數(shù)據(jù):
cnames <- getcolproc(mexp1422)
mexp1422proc <- procset(mexp1422,cnames[2])

1.數(shù)據(jù)下載

rm(list = ls())
library(ArrayExpress)
library(affy)
acs="E-MEXP-1422"
mexp1422 <- getAE(acs,type = "full", local =T ) 
#type = "raw" 下載raw data
#type = "processed" 下載processed data
#type = "full" 兩者都下載
# local = T 當(dāng)前目錄下有,不用再次下載

mexp1422raw <- ae2bioc(mexp1422) #從raw data構(gòu)建ExpressionSet對(duì)象
## Reading in : E:/panCancer/27-GIlnc/AF15.CEL
## Reading in : E:/panCancer/27-GIlnc/AF16.CEL
## Reading in : E:/panCancer/27-GIlnc/AF6.CEL
## Reading in : E:/panCancer/27-GIlnc/AF14.CEL
## Reading in : E:/panCancer/27-GIlnc/AF7.CEL
## Reading in : E:/panCancer/27-GIlnc/AF8.CEL
cnames <- getcolproc(mexp1422)
mexp1422proc <- procset(mexp1422,cnames[2]) #從processed data構(gòu)建ExpressionSet對(duì)象
## Reading processed data matrix file,  E:/panCancer/27-GIlnc/expression_and_calls.txt.magetab

2.數(shù)據(jù)預(yù)處理

mexp1422norm <- rma(mexp1422raw) #背景校正歸一化
## Background correcting
## Normalizing
## Calculating Expression
boxplot(mexp1422norm)
image.png

可以看出校正后效果挺好

mexp1422norm1 <- exprs(mexp1422proc)
boxplot(mexp1422norm1) #可以看出processed data是歸一化過(guò)的數(shù)據(jù)
image.png

3.差異分析

pd <- pData(mexp1422raw)
pd <- pd %>% 
  select("Factor.Value..RNAi.")
colnames(pd) <- "group"
pd$group <- case_when(pd$group=="GFP siRNA"~"ctrl",
                      pd$group=="PROX1 siRNA #1"~"siRNA1",
                      pd$group=="PROX1 siRNA #2"~"siRNA2")
group_list <- factor(pd$group,levels = c("ctrl","siRNA1","siRNA2"))
names(group_list) <- rownames(group_list)
design <- model.matrix(~0+group_list)
colnames(design) <- levels(group_list)
library(limma)
cont.matrix <- makeContrasts(si1.normal="siRNA1-ctrl",
                             si2.normal="siRNA2-ctrl",
                             si2.si1="siRNA2-siRNA1",
                             levels = design)
fit <- lmFit(mexp1422norm,design)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
deg <- topTable(fit2,adjust.method = "BH",sort.by = "F",n=Inf) %>% 
  rownames_to_column("probe_id")

# 探針注釋
library(AnnoProbe)
probe2id <- idmap("GPL570")

volcanoInput <- probe2id %>% 
  inner_join(deg,by = "probe_id") %>% 
  select(-probe_id)

4. 火山圖

volcanoInput$type <- case_when(volcanoInput$adj.P.Val<0.05&volcanoInput$si2.normal>1 ~ "up",
                               volcanoInput$adj.P.Val<0.05&volcanoInput$si2.normal< -1 ~ "down",
                               T~"stable")
ggplot(volcanoInput,aes(x=si2.normal,y=-log10(adj.P.Val),color=type))+
  geom_point(alpha=0.4,size=3.5)+
  scale_color_manual(values=c("#546de5", "#d2dae2","#ff4757"))+
  geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
  geom_hline(yintercept = -log10(0.05),lty=4,col="black",lwd=0.8) +
  labs(x="log2(fold change)",
       y="-log10 (p-value)")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5), 
        legend.position="right", 
        legend.title = element_blank())
image.png

參考鏈接:

ArrayExpress

R語(yǔ)言limma包差異基因分析(兩組或兩組以上)

R繪圖:ggplot2繪制火山圖

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容