Steven's Beginner R Homework

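My first Jianshu note starts with some introductory R exercises~

Today I worked through Jimmy's beginner R exercises. I have not finished them all yet and plan to complete them in two sittings. The exercises come from http://www.bio-info-trainee.com/3793.html. Besides following Jimmy's Bilibili videos and the book "R in Action", I added a bit of my own exploration. Compared with simply finishing the assigned work, poking around at the edge of error messages probably helps the material stick. The road in bioinformatics is long and following the right people matters most, so many thanks to Jimmy for his warm support~~ As a newcomer I will keep at it, unafraid of setbacks!

Here is my homework: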


Working directory

> getwd()    # returns the current working directory
[1] "E:/My_Program/R_Start"

Vectors

character <- c("abc","def","ghi")
numeric <- c(1,-2,3)
logical <- c(F,T,T)
complex <- c(1+2i,2i)
num1 <- 2:4
num2 <- seq(2.5,3.5, by=0.5) # arithmetic sequence
num3 <- rep(c(1,3), each=2) # repeat each element in turn
num4 <- rep(1:2, times=2)   # repeat the whole vector
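
As a quick check on the vectors above, the following sketch rebuilds the numeric ones and asks `class()` for the type of each kind of vector. The expected values in the comments follow directly from the definitions of `seq()` and `rep()`.

```r
# Rebuild the numeric vectors from above.
num2 <- seq(2.5, 3.5, by = 0.5)   # arithmetic sequence: 2.5 3.0 3.5
num3 <- rep(c(1, 3), each = 2)    # each element repeated: 1 1 3 3
num4 <- rep(1:2, times = 2)       # whole vector repeated: 1 2 1 2

# class() reports the type of a vector.
class(c("abc", "def"))   # "character"
class(c(1, -2, 3))       # "numeric"
class(2:4)               # "integer" (the colon operator yields integers)
class(c(FALSE, TRUE))    # "logical"
class(c(1+2i, 2i))       # "complex"
```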

Matrices

matrix_a <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
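
To see what `byrow=TRUE` changes, this small sketch builds the same matrix both ways and compares the first row. The names `by_row` and `by_col` are only for illustration.

```r
# Fill the same six values row by row and column by column.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3)   # byrow = FALSE is the default

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 3 5
dim(by_row)   # 2 3
```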

Arrays

dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2,3,4), dimnames=list(dim1, dim2, dim3))  
# dimnames is a list holding the labels for each dimension
Without labels:
> z <- array(1:24, c(2,3,4))
> z
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
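
With labels attached, elements can be picked by name as well as by position. A minimal sketch, rebuilding the labelled array from the start of this section:

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

z["A2", "B3", "C1"]   # 6, looked up by label
z[2, 3, 1]            # 6, the same element by position
z[, , "C2"]           # the second 2 x 3 slice, holding 7 to 12
```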

Data frames

> col1 <- c(1,2,3)
> col2 <- c("a","b","c")
> df <- data.frame(col1,col2)    # build a data frame from equal-length vectors as columns; the vectors may have different types
> df
  col1 col2
1    1    a
2    2    b
3    3    c
  • Several ways to slice a data frame
> df$col1          # $ extracts the column as a vector
[1] 1 2 3
> df_col1 <- df$col1
> str(df_col1)
 num [1:3] 1 2 3
> df[1]            # slicing with [ ] returns a data frame instead
  col1
1    1
2    2
3    3
> df["col1"]
  col1
1    1
2    2
3    3

> df_1 <- df[1]
> str(df_1)
'data.frame':   3 obs. of  1 variable:
 $ col1: num  1 2 3

> df_col1 <- df["col1"]
> str(df_col1)
'data.frame':   3 obs. of  1 variable:
 $ col1: num  1 2 3
> df[,1]             # [ , y] slices by column; the first column comes out as a vector
[1] 1 2 3
> str(df[,1])
 num [1:3] 1 2 3
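> df[,2]             # the second column holds strings, so it comes out as a factor (the default before R 4.0.0)
[1] a b c
Levels: a b c        # to keep a character vector, pass stringsAsFactors = FALSE (lowercase s) to data.frame(); this is the default since R 4.0.0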
> str(df[,2])
 Factor w/ 3 levels "a","b","c": 1 2 3
 
> df[1,1]            # [x, y] takes the element in row x, column y
[1] 1
> df[1,2]
[1] a
Levels: a b c        # the string also comes back as a factor

> str(df[2,])        # slicing by row still gives a data frame, because the columns have different types
'data.frame':   1 obs. of  2 variables:
 $ col1: num 2
 $ col2: Factor w/ 3 levels "a","b","c": 2
 
A data frame obtained by row slicing can be sliced further:
> df[1,][2]            # gives a data frame
  col2
1    a
> df[1,][,2]           # gives a factor
[1] a
Levels: a b c
> df[1,]$col2          # gives a factor
[1] a
Levels: a b c
> df[1,][[2]]          # gives a factor
[1] a
Levels: a b c
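In a data frame, [[ ]] and [ ] are not the same: single [ ] keeps the data frame, while [[ ]] extracts the column itself, just like $ and [ , y]
> df[[1]]            # [[ ]] returns a vector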
[1] 1 2 3
> str(df[[1]])
 num [1:3] 1 2 3
 
> df[[1]][2]         # index further into the column: the second element of the first column (row 2, column 1)
[1] 2
> str(df[[1]][2])
 num 2
> df[[1,2]]          # [[row, column]] also picks out a single element; here it is a factor
[1] a
Levels: a b c
> str(df[[1,2]])
 Factor w/ 3 levels "a","b","c": 1
 
> df[["col1"]]
[1] 1 2 3
> df[["col2"]]       # this is a factor too
[1] a b c
Levels: a b c
> df[[2]]            # likewise, indexing by position and by name give the same result
[1] a b c
Levels: a b c
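
The experiments above boil down to one rule: single `[ ]` with one index keeps the data frame, while `[[ ]]`, `$` and `[ , y]` pull out the column itself. Here is a compact sketch of that rule. It sets `stringsAsFactors = FALSE` explicitly so the string column stays character on any R version, and the variable names are only for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

sub_df  <- df[1]                   # single [ ]: still a data frame
col_vec <- df[[1]]                 # [[ ]]: the column itself
same    <- df$col1                 # $: the same column, looked up by name
kept    <- df[, 1, drop = FALSE]   # drop = FALSE stops [ , y] from simplifying

is.data.frame(sub_df)      # TRUE
is.numeric(col_vec)        # TRUE
identical(col_vec, same)   # TRUE
is.data.frame(kept)        # TRUE
class(df[, 2])             # "character", because stringsAsFactors = FALSE
```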
That was a lot of playing around and a bit off topic, ahem.
Now for the actual homework: create a data frame and slice it
> o <- 1:4
> p <- c("a","b","c","d")
> q <- 11:14
> r <- c(T,T,F,T)
> frame1 <- data.frame(o,p,q,r,stringsAsFactors = F)
> frame1
  o p  q     r
1 1 a 11  TRUE
2 2 b 12  TRUE
3 3 c 13 FALSE
4 4 d 14  TRUE
> frame2 <- frame1[c(1,3),][,2:4]
> frame2
  p  q     r
1 a 11  TRUE
3 c 13 FALSE
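
The two-step slice used for `frame2` can also be written in a single `[rows, columns]` call, or with column names instead of positions. A small sketch comparing the variants (the names `two_step`, `one_step` and `by_name` are only for illustration):

```r
frame1 <- data.frame(o = 1:4,
                     p = c("a", "b", "c", "d"),
                     q = 11:14,
                     r = c(TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

two_step <- frame1[c(1, 3), ][, 2:4]            # rows first, then columns
one_step <- frame1[c(1, 3), 2:4]                # rows and columns together
by_name  <- frame1[c(1, 3), c("p", "q", "r")]   # columns selected by name

all.equal(two_step, one_step)   # TRUE
all.equal(one_step, by_name)    # TRUE
```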

Next exercise

# Read in sample.csv
> df <- read.csv("sample.csv")
> dim(df)                 # number of rows and columns
[1] 768  12
> colnames(df)            # column names
 [1] "Accession"           "Title"              
 [3] "Sample.Type"         "Taxonomy"           
 [5] "Channels"            "Platform"           
 [7] "Series"              "Supplementary.Types"
 [9] "Supplementary.Links" "SRA.Accession"      
[11] "Contact"             "Release.Date"    
> str(df)
'data.frame':   768 obs. of  12 variables:       # 768 rows, 12 columns
 $ Accession          : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Title              : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
 $ Sample.Type        : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
 $ Taxonomy           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
 $ Channels           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Platform           : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
 $ Series             : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
 $ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
 $ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ SRA.Accession      : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ Contact            : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
 $ Release.Date       : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...

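# Read in SraRunTable.txt
> df1 <- read.table("SraRunTable.txt",header = TRUE, sep="\t", fill= TRUE)
> # header = TRUE treats the first line of the file as column names; fill = TRUE pads rows that have fewer fields than the others with blank fields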
> str(df1)
'data.frame':   768 obs. of  31 variables:
 $ BioSample         : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
 $ Experiment        : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ MBases            : int  16 16 8 8 11 7 18 5 11 15 ...
 $ MBytes            : int  8 8 4 4 5 4 9 3 6 8 ...
 $ Run               : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ SRA_Sample        : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
 $ Sample_Name       : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Assay_Type        : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
 $ AssemblyName      : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
 $ AvgSpotLen        : int  43 43 43 43 43 43 43 43 43 43 ...
 $ BioProject        : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
 $ Center_Name       : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
 $ Consent           : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_filetype: Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_provider: Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
 $ InsertSize        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Instrument        : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibraryLayout     : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySelection  : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySource     : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
 $ LoadDate          : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
 $ Organism          : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
 $ Platform          : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
 $ ReleaseDate       : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
 $ SRA_Study         : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
 $ age               : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
 $ cell_type         : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
 $ marker_genes      : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
 $ source_name       : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
 $ strain            : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
 $ tissue            : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
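
To try `header` and `fill` without the real file, here is a sketch that writes a tiny tab-separated file to a temporary location and reads it back. The file contents and run IDs are made up for illustration, and the last row is deliberately one field short.

```r
# Write a small tab-separated file; the second data row lacks its last field.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Run\tMBases\tstrain",
             "SRR0000001\t16\tFVB",
             "SRR0000002\t8"),
           tmp)

# header = TRUE: the first line supplies the column names.
# fill = TRUE: the short row is padded with a blank field instead of failing.
toy <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)

dim(toy)        # 2 3
colnames(toy)   # "Run" "MBases" "strain"
toy$MBases      # 16 8

unlink(tmp)     # remove the temporary file
```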
 
# Merge
> df2 <- merge(df,df1,by.x="Accession",by.y="Sample_Name")    # by.x and by.y name the key column in each table whose values are matched
> str(df2)
'data.frame':   768 obs. of  42 variables:
 $ Accession          : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Title              : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
 $ Sample.Type        : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
 $ Taxonomy           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
 $ Channels           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Platform.x         : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
 $ Series             : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
 $ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
 $ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ SRA.Accession      : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ Contact            : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
 $ Release.Date       : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
 $ BioSample          : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
 $ Experiment         : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ MBases             : int  16 16 8 8 11 7 18 5 11 15 ...
 $ MBytes             : int  8 8 4 4 5 4 9 3 6 8 ...
 $ Run                : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ SRA_Sample         : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
 $ Assay_Type         : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
 $ AssemblyName       : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
 $ AvgSpotLen         : int  43 43 43 43 43 43 43 43 43 43 ...
 $ BioProject         : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
 $ Center_Name        : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
 $ Consent            : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_filetype : Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_provider : Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
 $ InsertSize         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Instrument         : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibraryLayout      : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySelection   : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySource      : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
 $ LoadDate           : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
 $ Organism           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
 $ Platform.y         : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
 $ ReleaseDate        : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
 $ SRA_Study          : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
 $ age                : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
 $ cell_type          : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
 $ marker_genes       : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
 $ source_name        : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
 $ strain             : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
 $ tissue             : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
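
The merged table has 42 columns because the 12 and 31 input columns share one key column (12 + 31 - 1), and the two `Platform` columns are told apart with the `.x` and `.y` suffixes. The sketch below reproduces both effects on two toy tables; the IDs and values are invented for illustration.

```r
# Two toy tables whose key columns have different names and row orders.
left  <- data.frame(Accession = c("GSM1", "GSM2", "GSM3"),
                    Platform  = "GPL13112",
                    stringsAsFactors = FALSE)
right <- data.frame(Sample_Name = c("GSM3", "GSM1", "GSM2"),
                    Platform    = "ILLUMINA",
                    MBases      = c(8L, 16L, 16L),
                    stringsAsFactors = FALSE)

# Match rows where left$Accession equals right$Sample_Name.
merged <- merge(left, right, by.x = "Accession", by.y = "Sample_Name")

colnames(merged)   # "Accession" "Platform.x" "Platform.y" "MBases"
ncol(merged)       # 4, i.e. 2 + 3 - 1 because the key column appears once
merged$MBases      # 16 16 8, rows sorted by the key
```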

R is a foundational skill and I want to build it solidly, so I do not cover too much in one sitting. That is all for today, more next time~
