作業(yè)3
找到R包CLL內(nèi)置的數(shù)據(jù)集的表達(dá)矩陣?yán)锩娴腡P53基因的表達(dá)量,并且繪制在 progres.-stable分組的boxplot圖
答案如下:
> rm(list = ls())
> options(stringsAsFactors = F)
> suppressPackageStartupMessages(library(CLL))
> data(sCLLex)
> sCLLex
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 22 samples
element names: exprs
protocolData: none
phenoData
sampleNames: CLL11.CEL CLL12.CEL ... CLL9.CEL (22 total)
varLabels: SampleID Disease
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
> exprSet <- exprs(sCLLex)###獲得其表達(dá)矩陣

image.png
> pd <- pData(sCLLex)###獲得個樣本對應(yīng)的疾病類型分組

> library(hgu95av2.db)
> ls("package:hgu95av2.db")###查看此包中的內(nèi)置數(shù)據(jù)集
[1] "hgu95av2" "hgu95av2.db" "hgu95av2_dbconn"
[4] "hgu95av2_dbfile" "hgu95av2_dbInfo" "hgu95av2_dbschema"
[7] "hgu95av2ACCNUM" "hgu95av2ALIAS2PROBE" "hgu95av2CHR"
[10] "hgu95av2CHRLENGTHS" "hgu95av2CHRLOC" "hgu95av2CHRLOCEND"
[13] "hgu95av2ENSEMBL" "hgu95av2ENSEMBL2PROBE" "hgu95av2ENTREZID"
[16] "hgu95av2ENZYME" "hgu95av2ENZYME2PROBE" "hgu95av2GENENAME"
[19] "hgu95av2GO" "hgu95av2GO2ALLPROBES" "hgu95av2GO2PROBE"
[22] "hgu95av2MAP" "hgu95av2MAPCOUNTS" "hgu95av2OMIM"
[25] "hgu95av2ORGANISM" "hgu95av2ORGPKG" "hgu95av2PATH"
[28] "hgu95av2PATH2PROBE" "hgu95av2PFAM" "hgu95av2PMID"
[31] "hgu95av2PMID2PROBE" "hgu95av2PROSITE" "hgu95av2REFSEQ"
[34] "hgu95av2SYMBOL" "hgu95av2UNIGENE" "hgu95av2UNIPROT"
> ids <- toTable(hgu95av2SYMBOL)
> head(ids)
probe_id symbol
1 1000_at MAPK3
2 1001_at TIE1
3 1002_f_at CYP2C19
4 1003_s_at CXCR5
5 1004_at CXCR5
6 1005_at DUSP1
ID轉(zhuǎn)換:在ids文件中,通過基因名(symbol)“TP53”查找到相應(yīng)的“probe_id”

image.png
畫圖
> boxplot(exprSet['1939_at',] ~ pd$Disease)
> boxplot(exprSet['1974_s_at',] ~ pd$Disease)
> boxplot(exprSet['31618_at',] ~ pd$Disease)

image.png

image.png

image.png
作業(yè) 4
找到BRCA1基因在TCGA數(shù)據(jù)庫的乳腺癌數(shù)據(jù)集(Breast Invasive Carcinoma (TCGA, PanCancer Atlas))的表達(dá)情況
提示:使用http://www.cbioportal.org/index.do 定位數(shù)據(jù)集:http://www.cbioportal.org/datasets
作業(yè)5
找到TP53基因在TCGA數(shù)據(jù)庫的乳腺癌數(shù)據(jù)集的表達(dá)量分組看其是否影響生存
step1 下載數(shù)據(jù)
進(jìn)入網(wǎng)站http://www.oncolnc.org/,輸入你想要的基因

image.png
找到你想要的癌癥類型,點擊進(jìn)入

image.png
輸入高、低兩個數(shù)值

image.png
點擊[click here],進(jìn)行下載

image.png
將其保存為CSV格式
step2 讀入文件
> b <- read.csv(file = 'BRCA_7157_50_50.csv',header = T,sep = ',')
step3 ggbetweenstats 作圖
> library(ggstatsplot)
> ?ggstatsplot
> colnames(b)
[1] "Patient" "Days" "Status" "Expression" "Group"
> ggbetweenstats(data = b,x =Group,y =Expression)
t is large; approximation invoked.
Note: Shapiro-Wilk Normality Test for Expression : p-value = < 0.001
Note: Bartlett's test for homogeneity of variances for factor Group : p-value = < 0.001
Warning messages:
1: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'
2: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'
3: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'

image.png
step4 生存曲線繪制
> library(ggplot2)
> library(survival)
> ?survival
> library(survminer)
> table(b$Status)###table函數(shù):檢測數(shù)據(jù)中各分組的個數(shù)
Alive Dead
871 135
> b$Status <- ifelse(b$Status=='Dead',1,0)###檢測一個向量中符合某一要求的元素,并讓其返回一個值
> ?ifelse
> table(b$Status)
0 1
871 135
> ?survfit
> sfit <- survfit(Surv(Days,Status)~Group,data=b)###survfit繪制生存曲線
> summary(sfit)
Call: survfit(formula = Surv(Days, Status) ~ Group, data = b)
Group=High
time n.risk n.event survival std.err lower 95% CI upper 95% CI
116 469 1 0.998 0.00213 0.994 1.000
174 463 1 0.996 0.00303 0.990 1.000
224 455 1 0.994 0.00373 0.986 1.000
239 452 1 0.991 0.00432 0.983 1.000
255 450 1 0.989 0.00484 0.980 0.999
302 442 1 0.987 0.00532 0.977 0.997
320 439 1 0.985 0.00576 0.973 0.996
.......
> ?summary
> ?ggsurvplot###使用ggplot2繪制生存曲線
> ggsurvplot(sfit,conf.int = F,pval = TRUE)

image.png