Lab tasks
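1. Download the mutation data for BRCA (breast invasive carcinoma) from the TCGA database, and become familiar with how the TCGA database is organized and how to download data from it.
2. Process the downloaded data into a matrix with genes as rows and samples as columns, and write it out. Remove synonymous mutations during processing. Use 0 for "no mutation" and 1 for "mutated".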
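Step 2 asks for synonymous mutations to be removed, but the script below starts from exp.csv and does not show that step. One minimal way to do it, assuming a MAF-style table with a `Variant_Classification` column in which synonymous changes are labeled `Silent` (the GDC MAF convention), is to filter those rows out before building the matrix. The toy rows are made up for illustration.

```r
# Toy MAF-like table (hypothetical values); real GDC MAF files have many more columns
maf <- data.frame(
  Hugo_Symbol = c("TP53", "PIK3CA", "TP53", "GATA3", "CDH1"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Nonsense_Mutation",
                             "Silent", "Frame_Shift_Del"),
  stringsAsFactors = FALSE
)

# Drop synonymous (Silent) mutations before building the 0/1 matrix
nonsyn <- maf[maf$Variant_Classification != "Silent", ]
print(nonsyn)
```

Depending on the definition used, other non-protein-altering classes (for example `3'UTR` or `Intron`) may also need to be excluded; the task text does not settle that choice.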

Code
setwd("E:\\實驗\\轉錄組學\\實驗三")  # change to your own working directory
exp <- read.csv("exp.csv", as.is = TRUE)
geneid <- unique(exp[, 2])  # unique gene IDs from column 2 (18062 genes)
sample <- unique(exp[, 4])  # unique sample IDs from column 4 (986 samples)
long1 <- length(geneid)
long2 <- length(sample)
genelist1 <- as.data.frame(list())  # start with an empty data frame
install.packages("plyr")  # only needed the first time
library(plyr)
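# For each sample, collect the genes mutated in it as one row of a data frame
# (rbind.fill pads shorter rows with NA). Appending the new row after the
# existing ones keeps the rows in the same order as `sample`.
for (i in 1:long2)
{
genelist <- as.data.frame(t(exp[which(exp[, 4] == sample[i]), 2]))
genelist1 <- rbind.fill(genelist1, genelist)
}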
dim(genelist1)
genelist2<-t(genelist1)
dim(genelist2)
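A hedged alternative for this step, assuming the same column layout as exp.csv (gene ID in column 2, sample ID in column 4), keeps each sample's genes in a named list with base R's `split()` instead of NA-padded rows. The toy table and the name `genes_by_sample` are hypothetical.

```r
# Toy long-format table: column 2 = gene ID, column 4 = sample ID (as in exp.csv)
exp <- data.frame(
  id = 1:5,
  gene = c("TP53", "PIK3CA", "TP53", "GATA3", "TP53"),
  type = "Missense_Mutation",
  sample = c("S1", "S1", "S2", "S2", "S3"),
  stringsAsFactors = FALSE
)
sample <- unique(exp[, 4])

# One character vector of genes per sample; factor levels fix the list order to `sample`
genes_by_sample <- split(exp[, 2], factor(exp[, 4], levels = sample))
str(genes_by_sample)
```

Because the list is named by sample ID and ordered by the `sample` vector, each sample's genes stay tied to the column name they will later receive.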

var_exp <- matrix(NA, long1, long2)  # empty gene x sample matrix, 18062 x 986
# For sample i, mark each gene in geneid TRUE if it appears in that sample's
# gene list and FALSE otherwise
for (i in 1:long2)
var_exp[, i] <- geneid %in% genelist2[, i]
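The loop fills one column per sample with a `%in%` membership test. The same test can be written with `vapply()`; this sketch runs it on a hypothetical named list `genes_by_sample` (one character vector of genes per sample), producing a logical gene-by-sample matrix whose column names come from the list names.

```r
geneid <- c("TP53", "PIK3CA", "GATA3")
genes_by_sample <- list(S1 = c("TP53", "PIK3CA"), S2 = c("TP53", "GATA3"), S3 = "TP53")

# For each sample, test every gene in geneid for membership in that sample's genes;
# vapply stacks the results as columns of a length(geneid) x n_samples logical matrix
var_exp <- vapply(genes_by_sample, function(g) geneid %in% g,
                  logical(length(geneid)))
rownames(var_exp) <- geneid
print(var_exp)
```

`vapply()` checks that every column has exactly `length(geneid)` logical values, so a malformed sample list fails loudly instead of silently shifting columns.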

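varexp1 <- matrix(as.numeric(var_exp), long1, long2)  # convert TRUE/FALSE to 1/0
rownames(varexp1) <- geneid
colnames(varexp1) <- sample
# write.table has no `header` argument; col.names = NA writes an empty corner
# cell so the header row lines up with the row names
write.table(varexp1, "varexp.txt", sep = "\t", quote = FALSE, col.names = NA)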
This writes the gene-by-sample 0/1 matrix to varexp.txt.
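As a cross-check, the whole 0/1 matrix can also be built in one step with `table()`: count how often each gene/sample pair occurs, then collapse any count above zero to 1. This is a sketch on made-up toy data, not the author's method; it writes the result with `write.table(..., col.names = NA)` to a temporary file and reads it back to confirm the layout survives the round trip.

```r
exp <- data.frame(
  gene = c("TP53", "PIK3CA", "TP53", "GATA3", "TP53", "TP53"),
  sample = c("S1", "S1", "S2", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)
geneid <- unique(exp$gene)
sample <- unique(exp$sample)

# Count mutations per gene/sample pair, then collapse counts to 0/1
counts <- table(factor(exp$gene, levels = geneid),
                factor(exp$sample, levels = sample))
varexp1 <- matrix(as.integer(counts > 0), nrow = length(geneid),
                  dimnames = list(geneid, sample))

# col.names = NA writes an empty corner cell so the header lines up with row names
out <- tempfile(fileext = ".txt")
write.table(varexp1, out, sep = "\t", quote = FALSE, col.names = NA)
back <- as.matrix(read.table(out, sep = "\t", header = TRUE, row.names = 1,
                             check.names = FALSE))
print(back)
```

Fixing the factor levels to `geneid` and `sample` keeps the row and column order the same as in the loop-based version, and a gene mutated twice in one sample (TP53 in S3 here) still counts as 1.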
