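My first Jianshu post, starting with some beginner R exercises~
Today I worked through Jimmy's beginner R exercises. I have not finished all of them yet, so I plan to complete them in two sessions. The exercises come from http://www.bio-info-trainee.com/3793.html. Besides following Jimmy's Bilibili videos and the book R in Action (《R语言实战》), I added a little exploration of my own. Compared with only completing the assigned work, poking around at the edge of error messages may help me remember things better. The road into bioinformatics is long, and following the right people matters most; many thanks to Jimmy for his warm support~~ This newbie will keep working hard and will not be afraid of setbacks!
Here is my homework: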
Working directory
> getwd() # returns the current working directory
[1] "E:/My_Program/R_Start"
Vectors
character <- c("abc","def","ghi")
numeric <- c(1,-2,3)
logical <- c(F,T,T)
complex <- c(1+2i,2i)
num1 <- 2:4
num2 <- seq(2.5,3.5, by=0.5) # arithmetic sequence: 2.5 3.0 3.5
num3 <- rep(c(1,3), each=2) # repeat each element in turn: 1 1 3 3
num4 <- rep(1:2, times=2) # repeat the whole vector: 1 2 1 2
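To check what type each of these vectors ends up with, a small sketch like the one below may help. The names `v_chr`, `v_num`, `v_lgl`, `v_cpl` and `mixed` are made up for illustration (they also avoid reusing names such as `character` that R already uses for functions); the `mixed` vector is my own addition to show what happens when types are combined.

```r
# Illustrative vectors; all names here are hypothetical
v_chr <- c("abc", "def", "ghi")
v_num <- c(1, -2, 3)
v_lgl <- c(FALSE, TRUE, TRUE)
v_cpl <- c(1+2i, 2i)

# class() reports the type of each vector
classes <- sapply(list(v_chr, v_num, v_lgl, v_cpl), class)
print(classes)

# A vector holds a single type, so mixing types in c()
# coerces every element to the most general one
mixed <- c(1, "a", TRUE)
print(class(mixed))

# each= repeats element by element; times= repeats the whole vector
by_each  <- rep(c(1, 3), each = 2)
by_times <- rep(1:2, times = 2)
print(by_each)
print(by_times)
```

Writing `TRUE`/`FALSE` in full is a little safer than `T`/`F`, because `T` and `F` are ordinary variables that can be overwritten.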
Matrices
matrix_a <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
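The effect of `byrow` is easiest to see side by side. This is a minimal sketch; `by_row` and `by_col` are hypothetical names, not part of the exercise.

```r
# byrow = TRUE fills the matrix row by row
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
# byrow defaults to FALSE, which fills column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)

print(by_row)
print(by_col)

# The same position holds a different value in each layout
print(by_row[1, 2])  # second value of the first row
print(by_col[1, 2])  # first value of the second column
```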
Arrays
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2,3,4), dimnames=list(dim1, dim2, dim3))
# dimnames is a list holding the labels for each dimension
Without labels:
> z <- array(1:24, c(2,3,4))
> z
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
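With labels attached, elements can be picked out either by position or by name. The sketch below rebuilds the labelled array under the hypothetical name `z_named`, so it does not clash with the unlabelled `z`.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
# z_named is a hypothetical name for the labelled version of the array
z_named <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

# The same element, addressed by position and by label
by_pos  <- z_named[2, 3, 4]
by_name <- z_named["A2", "B3", "C4"]
print(c(by_pos, by_name))

# Leaving an index empty keeps that whole dimension:
# this is the full 2 x 3 layer labelled "C2"
layer2 <- z_named[, , "C2"]
print(layer2)
```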
Data frames
> col1 <- c(1,2,3)
> col2 <- c("a","b","c")
> df <- data.frame(col1,col2) # build a data frame from equal-length vectors used as columns; the vectors may have different types
> df
col1 col2
1 1 a
2 2 b
3 3 c
- Several ways to slice a data frame
> df$col1 # $ extracts a column; the result is a vector
[1] 1 2 3
> df_col1 <- df$col1
> str(df_col1)
num [1:3] 1 2 3
> df[1] # slicing with [ ] gives a data frame instead
col1
1 1
2 2
3 3
> df["col1"]
col1
1 1
2 2
3 3
> df_1 <- df[1]
> str(df_1)
'data.frame': 3 obs. of 1 variable:
$ col1: num 1 2 3
> df_col1 <- df["col1"]
> str(df_col1)
'data.frame': 3 obs. of 1 variable:
$ col1: num 1 2 3
> df[,1] # [ , y] slices by column; the first column comes out as a vector
[1] 1 2 3
> str(df[,1])
num [1:3] 1 2 3
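> df[,2] # the second column was character, but it comes out as a factor
[1] a b c
Levels: a b c
# To keep it as a character vector, pass stringsAsFactors = FALSE when creating the data frame (this has been the default since R 4.0.0)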
> str(df[,2])
Factor w/ 3 levels "a","b","c": 1 2 3
> df[1,1] # [x, y] takes the element in row x, column y
[1] 1
> df[1,2] # a single character cell also comes back as a factor
[1] a
Levels: a b c
> str(df[2,]) # slicing by row still gives a data frame, because the columns hold different types
'data.frame': 1 obs. of 2 variables:
$ col1: num 2
$ col2: Factor w/ 3 levels "a","b","c": 2
A data frame obtained by slicing out a row can be sliced further
> df[1,][2] # gives a data frame
col2
1 a
> df[1,][,2] # gives a factor
[1] a
Levels: a b c
> df[1,]$col2 # gives a factor
[1] a
Levels: a b c
> df[1,][[2]] # gives a factor
[1] a
Levels: a b c
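In a data frame, [[ ]] and [ ] are not interchangeable: [ ] with a single index keeps a data frame, while [[ ]] pulls out the column itself, just like $
> df[[1]] # [[ ]] extracts the column, so the result is a vector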
[1] 1 2 3
> str(df[[1]])
num [1:3] 1 2 3
> df[[1]][2] # then index into that vector: the second element of column 1 (row 2, column 1)
[1] 2
> str(df[[1]][2])
num 2
> df[[1,2]] # this also works for a single cell: row 1, column 2, returned as a factor
[1] a
Levels: a b c
> str(df[[1,2]])
Factor w/ 3 levels "a","b","c": 1
> df[["col1"]]
[1] 1 2 3
> df[["col2"]] # this is a factor too
[1] a b c
Levels: a b c
> df[[2]] # likewise, indexing by position and by name give the same result
[1] a b c
Levels: a b c
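The experiments above can be summarised by asking each form of slicing for its `class()`. This is only a sketch: `demo` and `results` are hypothetical names, and `stringsAsFactors = FALSE` is set explicitly so the character column stays character regardless of the R version. The `drop = FALSE` argument is something I did not try above; it asks `[ , y]` to keep the data frame instead of simplifying to a vector.

```r
# demo is a hypothetical stand-in for the small data frame used above
demo <- data.frame(col1 = c(1, 2, 3),
                   col2 = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

results <- c(
  single_bracket = class(demo[1]),                 # keeps the data frame
  double_bracket = class(demo[[1]]),               # extracts the column
  dollar         = class(demo$col1),               # same as [[ ]]
  comma          = class(demo[, 1]),               # simplifies to a vector
  comma_no_drop  = class(demo[, 1, drop = FALSE]), # stays a data frame
  chr_column     = class(demo[, 2])                # character, not factor
)
print(results)
```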
That was a lot of playing around and a bit off topic, ahem.
Now for the actual homework: create a data frame and slice it.
> o <- 1:4
> p <- c("a","b","c","d")
> q <- 11:14
> r <- c(T,T,F,T)
> frame1 <- data.frame(o,p,q,r,stringsAsFactors = F)
> frame1
o p q r
1 1 a 11 TRUE
2 2 b 12 TRUE
3 3 c 13 FALSE
4 4 d 14 TRUE
> frame2 <- frame1[c(1,3),][,2:4]
> frame2
p q r
1 a 11 TRUE
3 c 13 FALSE
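The two chained slices can also be written as one `[rows, columns]` call. The sketch below rebuilds the same data frame and compares the variants; `two_step`, `one_step` and `by_name` are hypothetical names.

```r
frame1 <- data.frame(o = 1:4,
                     p = c("a", "b", "c", "d"),
                     q = 11:14,
                     r = c(TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

# Rows first, then columns, as in the homework answer
two_step <- frame1[c(1, 3), ][, 2:4]
# Rows and columns in a single call
one_step <- frame1[c(1, 3), 2:4]
# Columns chosen by name, which survives reordering of columns
by_name  <- frame1[c(1, 3), c("p", "q", "r")]

print(one_step)
print(isTRUE(all.equal(two_step, one_step)))
print(isTRUE(all.equal(one_step, by_name)))
```

Selecting columns by name is usually the more robust choice once a table has more than a handful of columns.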
Next exercise
# Read in sample.csv
> df <- read.csv("sample.csv")
> dim(df) # number of rows and columns
[1] 768 12
> colnames(df) # column names
[1] "Accession" "Title"
[3] "Sample.Type" "Taxonomy"
[5] "Channels" "Platform"
[7] "Series" "Supplementary.Types"
[9] "Supplementary.Links" "SRA.Accession"
[11] "Contact" "Release.Date"
> str(df) # 768 rows and 12 columns
'data.frame': 768 obs. of 12 variables:
$ Accession : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Title : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
$ Sample.Type : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
$ Taxonomy : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Channels : int 1 1 1 1 1 1 1 1 1 1 ...
$ Platform : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
$ Series : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
$ SRA.Accession : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ Contact : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
$ Release.Date : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
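# Read in SraRunTable.txt
> df1 <- read.table("SraRunTable.txt",header = TRUE, sep="\t", fill= TRUE)
> # header says whether the first line of the file holds the column names; fill = TRUE pads rows that have fewer fields than the others with blank fields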
> str(df1)
'data.frame': 768 obs. of 31 variables:
$ BioSample : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
$ Experiment : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ MBases : int 16 16 8 8 11 7 18 5 11 15 ...
$ MBytes : int 8 8 4 4 5 4 9 3 6 8 ...
$ Run : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SRA_Sample : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
$ Sample_Name : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Assay_Type : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ AssemblyName : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
$ AvgSpotLen : int 43 43 43 43 43 43 43 43 43 43 ...
$ BioProject : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
$ Center_Name : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
$ Consent : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_filetype: Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_provider: Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
$ InsertSize : int 0 0 0 0 0 0 0 0 0 0 ...
$ Instrument : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
$ LibraryLayout : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySelection : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySource : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
$ LoadDate : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
$ Organism : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Platform : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
$ ReleaseDate : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
$ SRA_Study : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
$ age : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
$ cell_type : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
$ marker_genes : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
$ source_name : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
$ strain : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
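To see what `header` and `fill` actually do without needing the real file, `read.table()` can read from a string through its `text` argument. The tiny tab-separated table below is invented purely for illustration and has nothing to do with SraRunTable.txt; its second data row is deliberately one field short.

```r
# Invented tab-separated data; the line for id 2 has no score field
raw <- "id\tname\tscore
1\talpha\t10
2\tbeta
3\tgamma\t30"

# header = TRUE: the first line supplies the column names
# fill = TRUE: the short line is padded, so the missing score becomes NA
filled <- read.table(text = raw, header = TRUE, sep = "\t",
                     fill = TRUE, stringsAsFactors = FALSE)
print(filled)

# header = FALSE: the first line is treated as data and the
# columns get the default names V1, V2, V3
no_header <- read.table(text = raw, header = FALSE, sep = "\t",
                        fill = TRUE, stringsAsFactors = FALSE)
print(no_header)
```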
# Merge
> df2 <- merge(df,df1,by.x="Accession",by.y="Sample_Name") # by.x and by.y name the key column in each table, so rows with the same sample ID are matched up
> str(df2)
'data.frame': 768 obs. of 42 variables:
$ Accession : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Title : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
$ Sample.Type : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
$ Taxonomy : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Channels : int 1 1 1 1 1 1 1 1 1 1 ...
$ Platform.x : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
$ Series : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
$ SRA.Accession : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ Contact : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
$ Release.Date : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
$ BioSample : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
$ Experiment : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ MBases : int 16 16 8 8 11 7 18 5 11 15 ...
$ MBytes : int 8 8 4 4 5 4 9 3 6 8 ...
$ Run : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SRA_Sample : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
$ Assay_Type : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ AssemblyName : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
$ AvgSpotLen : int 43 43 43 43 43 43 43 43 43 43 ...
$ BioProject : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
$ Center_Name : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
$ Consent : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_filetype : Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_provider : Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
$ InsertSize : int 0 0 0 0 0 0 0 0 0 0 ...
$ Instrument : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
$ LibraryLayout : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySelection : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySource : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
$ LoadDate : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
$ Organism : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Platform.y : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
$ ReleaseDate : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
$ SRA_Study : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
$ age : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
$ cell_type : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
$ marker_genes : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
$ source_name : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
$ strain : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
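The column count works out as 12 + 31 - 1 = 42, because the two key columns collapse into one, and the column named Platform in both tables is kept twice as Platform.x and Platform.y. The same behaviour can be reproduced on toy data; `left`, `right` and `merged` and their contents are invented for this sketch.

```r
# Two invented tables that share sample IDs under different column names
left <- data.frame(Accession = c("S1", "S2", "S3"),
                   Platform  = c("P1", "P1", "P1"),
                   Title     = c("t1", "t2", "t3"),
                   stringsAsFactors = FALSE)
right <- data.frame(Sample_Name = c("S3", "S1", "S2"),
                    Platform    = c("X", "X", "X"),
                    MBases      = c(8, 16, 16),
                    stringsAsFactors = FALSE)

# Match rows on left$Accession == right$Sample_Name
merged <- merge(left, right, by.x = "Accession", by.y = "Sample_Name")
print(merged)

# The key appears once; the shared non-key name gets .x / .y suffixes
print(colnames(merged))
print(ncol(merged) == ncol(left) + ncol(right) - 1)
```

Rows are matched by key value rather than by position, so the different row order in `right` does not matter.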
R is a fundamental skill and I want a solid footing, so I do not cover too much in one sitting. That is all for today; more next time~