1. Open RStudio and report its working directory.
The command is getwd(), which here returns [1] "D:/R_data/R_exercise"
To change the working directory, use the setwd() command.
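As a minimal sketch of how getwd() and setwd() work together, the snippet below switches to another directory and then restores the original one. Using tempdir() as the target is an assumption made only so the example runs on any machine; in practice you would pass your own project path.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Switch to a directory that is guaranteed to exist
# (assumption: tempdir() stands in for your own project folder)
setwd(tempdir())
print(getwd())

# Restore the original working directory
setwd(old_wd)
print(getwd())
```

setwd() also returns the previous directory invisibly, so `old_wd <- setwd(tempdir())` would do the first two steps in one call.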
2. Create 6 vectors based on different atomic types (focusing on character, numeric, and logical values).
c(1,2,3,3)
c("a","b","c","d")
3
TRUE
FALSE
3>5
5>3
> c(1,2,3,3)
[1] 1 2 3 3
> c("a","b","c","d")
[1] "a" "b" "c" "d"
> 3
[1] 3
> TRUE
[1] TRUE
> FALSE
[1] FALSE
> 3>5
[1] FALSE
> 5>3
[1] TRUE
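Each vector above holds exactly one atomic type. As a small illustration of what that implies, the sketch below mixes types inside c() and checks the result with typeof(); the variable names are made up for this example.

```r
# A vector holds a single atomic type, so c() coerces mixed inputs
# to the most general type present: logical < numeric < character
mixed_num <- c(1, TRUE, FALSE)   # logical values become 1 and 0
mixed_chr <- c(1, "a", TRUE)     # everything becomes character

print(mixed_num)          # [1] 1 1 0
print(typeof(mixed_num))  # [1] "double"
print(mixed_chr)          # [1] "1"    "a"    "TRUE"
print(typeof(mixed_chr))  # [1] "character"
```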
3. Use typeof() or class() to determine the data type; put the value to be checked inside the parentheses.
typeof(3)
typeof("a")
typeof(3>5)
class(3)
class("a")
class(3>5)
> typeof(3)
[1] "double"
> typeof("a")
[1] "character"
> typeof(3>5)
[1] "logical"
> class(3)
[1] "numeric"
> class("a")
[1] "character"
> class(3>5)
[1] "logical"
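The output above shows typeof() and class() disagreeing for the number 3 ("double" versus "numeric"). A short sketch of where the two differ: typeof() reports the internal storage type, while class() reports how R treats the object. The variable names are illustrative only.

```r
x_int <- 3L                    # the L suffix creates an integer
x_dbl <- 3                     # plain numbers are stored as double
x_df  <- data.frame(a = 1:2)   # a data frame is stored as a list of columns

print(typeof(x_int)); print(class(x_int))  # "integer", "integer"
print(typeof(x_dbl)); print(class(x_dbl))  # "double",  "numeric"
print(typeof(x_df));  print(class(x_df))   # "list",    "data.frame"
```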
4. The as.* family of functions converts between data types.
as.numeric("a")
as.logical(3)
as.character(3)
as.character(3>5)
as.numeric(3>5)
as.numeric(5>3)
> as.numeric("a")
[1] NA
Warning message:
NAs introduced by coercion
> as.logical(3)
[1] TRUE
> as.character(3)
[1] "3"
> as.character(3>5)
[1] "FALSE"
> as.numeric(3>5)
[1] 0
> as.numeric(5>3)
[1] 1
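As the warning above shows, a conversion that cannot succeed yields NA rather than an error. One possible way to handle this, sketched below with a made-up input vector, is to convert quietly and then look for the NA positions explicitly.

```r
# Hypothetical input: two valid numbers and two values that cannot be converted
inputs <- c("3.5", "10", "a", "")

# Convert quietly, then locate the failures explicitly
nums <- suppressWarnings(as.numeric(inputs))
print(nums)                 # [1]  3.5 10.0   NA   NA
print(inputs[is.na(nums)])  # the values that could not be converted

# as.logical() maps 0 to FALSE and any other number to TRUE
print(as.logical(c(0, 1, -2)))  # [1] FALSE  TRUE  TRUE
```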
5. The is.* family of functions tests for a type and returns TRUE or FALSE.
is.numeric("a")
is.logical(3)
is.character("3")
is.logical(5>3)
> is.numeric("a")
[1] FALSE
> is.logical(3)
[1] FALSE
> is.character("3")
[1] TRUE
> is.logical(5>3)
[1] TRUE
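The is.* functions can also be applied to several objects at once. The sketch below, using a made-up list of values, runs one test per element with vapply().

```r
# Hypothetical collection of values of different types
values <- list(3, "3", 5 > 3, 2L)

# Apply an is.* test to each element; vapply() checks that every
# result is a single logical value
is_num <- vapply(values, is.numeric, logical(1))
is_chr <- vapply(values, is.character, logical(1))

print(is_num)  # [1]  TRUE FALSE FALSE  TRUE
print(is_chr)  # [1] FALSE  TRUE FALSE FALSE
```

Note that is.numeric() is TRUE for both doubles and integers, but FALSE for logical values.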
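6. How vectors are created
(1) Combine values with c(): c(2,8,9,10,9) c("ao","fe","d","b")
(2) For consecutive integers, use the colon operator ":", for example 1:8
(3) Use rep() for repeated values, seq() for regular sequences, and rnorm() for normally distributed random numbers
rep("gene",times=10) # repeat 10 times
seq(from=3, to=21,by=3) # from 3 to 21 in steps of 3
rnorm(n=5,mean=3, sd=5) # 5 random values from a normal distribution with mean 3 and standard deviation 5
(4) Build more complex vectors by combining these functions
paste0(rep("gene", times=10), 1:10)
paste0(rep("RNA", times=10), 1:15)
paste0(rep("DNA", times=15), 1:10) # recycling: the result is as long as the longer vector, and the shorter one is reused from its start
Rule of thumb: first look at the result, then change the input, then read the help documentation.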
> c(2,8,9,10,9)
[1]  2  8  9 10  9
> c("ao","fe","d","b")
[1] "ao" "fe" "d"  "b"
> 1:8
[1] 1 2 3 4 5 6 7 8
> rep("gene",times=10)
 [1] "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene"
> seq(from=3, to=21,by=3)
[1]  3  6  9 12 15 18 21
> rnorm(n=5,mean=3, sd=5)
[1]  4.411656 -5.929244  1.181725  4.919650  3.256978
> paste0(rep("gene", times=10), 1:10)
 [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7"  "gene8"
 [9] "gene9"  "gene10"
> paste0(rep("RNA", times=10), 1:15)
 [1] "RNA1"  "RNA2"  "RNA3"  "RNA4"  "RNA5"  "RNA6"  "RNA7"  "RNA8"  "RNA9"
[10] "RNA10" "RNA11" "RNA12" "RNA13" "RNA14" "RNA15"
> paste0(rep("DNA", times=15), 1:10)
 [1] "DNA1"  "DNA2"  "DNA3"  "DNA4"  "DNA5"  "DNA6"  "DNA7"  "DNA8"  "DNA9"
[10] "DNA10" "DNA1"  "DNA2"  "DNA3"  "DNA4"  "DNA5"
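A few variations on the same functions, as a sketch: rep() with each instead of times, seq() with a fixed number of points, and set.seed() to make rnorm() repeatable. The seed value 42 is arbitrary and chosen only for illustration.

```r
# times repeats the whole vector; each repeats every element in turn
print(rep(c("a", "b"), times = 2))  # [1] "a" "b" "a" "b"
print(rep(c("a", "b"), each = 2))   # [1] "a" "a" "b" "b"

# length.out fixes the number of points instead of the step size
print(seq(from = 0, to = 1, length.out = 5))  # [1] 0.00 0.25 0.50 0.75 1.00

# rnorm() is random; setting the same seed before each call
# makes the draws repeatable
set.seed(42)
draw1 <- rnorm(n = 5, mean = 3, sd = 5)
set.seed(42)
draw2 <- rnorm(n = 5, mean = 3, sd = 5)
print(identical(draw1, draw2))  # [1] TRUE
```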
7. Operations on two vectors
x <- c(1,3,5,6,8)
y <- c(3,2,5)
x+y # recycling: y is reused from its start until it matches the length of x
x == y
paste(x,y,sep = "/")
> x <- c(1,3,5,6,8)
> y <- c(3,2,5)
> x+y
[1]  4  5 10  9 10
Warning message:
In x + y : longer object length is not a multiple of shorter object length
> x == y
[1] FALSE FALSE  TRUE FALSE FALSE
Warning message:
In x == y : longer object length is not a multiple of shorter object length
> paste(x,y,sep = "/")
[1] "1/3" "3/2" "5/5" "6/3" "8/2"
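To make the recycling in the output above explicit, the sketch below extends y by hand with rep_len() and shows that the result matches x + y. The name y_recycled is introduced only for this illustration.

```r
x <- c(1, 3, 5, 6, 8)
y <- c(3, 2, 5)

# What recycling does behind the scenes: extend y to the length of x
y_recycled <- rep_len(y, length(x))
print(y_recycled)      # [1] 3 2 5 3 2
print(x + y_recycled)  # [1]  4  5 10  9 10

# Same values as x + y, but without the length-mismatch warning
print(identical(x + y_recycled, suppressWarnings(x + y)))  # [1] TRUE
```

The warning appears because 5 is not a multiple of 3, so the last pass over y stops partway through; paste() recycles in the same way but does not warn.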
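8. Getting to know data frames, matrices, and lists
- Vector: one-dimensional
- matrix: two-dimensional; several vectors of equal length and the same data type
- data.frame: several vectors (columns) of equal length whose data types may differ
- List: elements may differ in both length and data type
To find out what type a data set actually is, just call class() on it.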
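As a sketch of the four structures listed above, the snippet below builds one small object of each kind and calls class() on it. All object names and contents are made up for illustration.

```r
v   <- c(1, 2, 3)                                     # vector: one dimension
m   <- matrix(1:6, nrow = 2)                          # matrix: 2 x 3, one data type
dat <- data.frame(id = 1:2, name = c("a", "b"))       # columns may differ in type
lst <- list(n = 1:3, s = "text", f = c(TRUE, FALSE))  # elements differ in length and type

print(class(v))    # "numeric"
print(class(m))    # "matrix" "array" in R >= 4.0.0; just "matrix" in older versions
print(class(dat))  # "data.frame"
print(class(lst))  # "list"
print(dim(m))      # [1] 2 3
```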
9. Slice the data frame you created: for example, first take rows 1 and 3, then take columns 4 and 6.
df <- data.frame(gene = paste0("gene",1:3),
sample = paste0("sample",1:3),
exp = c(32,34,45),
p = c(0.001,0.05,0.1),
level = c(2,1,4),
change = c(4,2,1))
dim(df)
nrow(df)
ncol(df)
df[c(1,3),]
df[,c(4,6)]
> df <- data.frame(gene = paste0("gene",1:3),
+ sample = paste0("sample",1:3),
+ exp = c(32,34,45),
+ p = c(0.001,0.05,0.1),
+ level = c(2,1,4),
+ change = c(4,2,1))
> dim(df)
[1] 3 6
> nrow(df)
[1] 3
> ncol(df)
[1] 6
> df[c(1,3),]
   gene  sample exp     p level change
1 gene1 sample1  32 0.001     2      4
3 gene3 sample3  45 0.100     4      1
> df[,c(4,6)]
      p change
1 0.001      4
2 0.050      2
3 0.100      1
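Besides numeric positions, rows and columns can also be selected by name or by a logical condition. The sketch below rebuilds the same data frame so it runs on its own; the result names (by_name, significant, both) are made up for this example.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sample = paste0("sample", 1:3),
                 exp = c(32, 34, 45),
                 p = c(0.001, 0.05, 0.1),
                 level = c(2, 1, 4),
                 change = c(4, 2, 1))

# Columns by name rather than by position
by_name <- df[, c("p", "change")]

# Rows by a logical condition (strictly below 0.05, so only gene1 qualifies)
significant <- df[df$p < 0.05, ]

# Rows and columns in one step
both <- df[c(1, 3), c(4, 6)]

print(by_name)
print(significant)
print(both)
```

Selecting by name tends to be more robust than selecting by position, since it keeps working if columns are reordered.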
10. Use the data() function to load R's built-in rivers dataset and describe it.
data("rivers")
rivers
length(rivers)
unique(rivers)
sort(rivers)
length(unique(rivers))
range(rivers)
which.max(rivers)
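A few more calls that may help describe rivers, as a sketch. rivers is a plain numeric vector (river lengths in miles), so the usual vector summaries apply.

```r
data("rivers")

print(class(rivers))               # a plain numeric vector, not a data frame
print(length(rivers))              # number of rivers recorded
print(summary(rivers))             # min, quartiles, median, mean, max in one call
print(rivers[which.max(rivers)])   # which.max() gives a position; index to get the value
print(sum(duplicated(rivers)))     # how many entries repeat an earlier length
```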
11. Download the RunInfo Table file from https://www.ncbi.nlm.nih.gov/sra?term=SRP133642 and read it into R. Inspect the data frame: how many columns it has, and what type of element each column holds.
df <- read.table(file = "c:/Users/Administrator/Desktop/SraRunTable.txt", sep = "\t", header = T, stringsAsFactors = F)
df
ncol(df)
nrow(df)
colnames(df)
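To answer the "what type is each column" part of the exercise, str() and sapply(..., class) are one option. Since the downloaded file is not available here, the sketch below reads a tiny made-up tab-separated table from a string; its column names and values are hypothetical stand-ins, not the real file contents.

```r
# Hypothetical stand-in for the downloaded file: three tab-separated columns
txt <- "Run\tMBases\tSample_Name\nSRR001\t12\tGSM001\nSRR002\t15\tGSM002"
run_info <- read.table(text = txt, sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE)

str(run_info)                         # compact overview: dimensions, types, first values
col_types <- sapply(run_info, class)  # one class name per column
print(col_types)
```

With the real file, the same two calls can be applied directly to the data frame returned by read.table().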
Download the sample information file sample.csv from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229 and read it into R. Inspect the data frame: how many columns it has, and what type of element each column holds.
df1 <- read.table(file = "c:/Users/Administrator/Desktop/sample.csv", sep = ",", header = T, stringsAsFactors = F)
df1
ncol(df1)
nrow(df1)
colnames(df1)
Join the two tables from the previous two steps (the RunInfo Table file and the sample information file sample.csv) using the merge() function.
m <- merge(df,df1,by.x="Sample_Name", by.y = "Accession")
The output is too long to list here.
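Because the real output is too long to show, here is a sketch of the same merge() call on two miniature tables. The rows are invented for illustration; only the key column names Sample_Name and Accession follow the call above.

```r
# Hypothetical miniature versions of the two tables
run_info <- data.frame(Run = c("SRR001", "SRR002", "SRR003"),
                       Sample_Name = c("GSM001", "GSM002", "GSM003"),
                       MBases = c(12, 15, 9),
                       stringsAsFactors = FALSE)
samples <- data.frame(Accession = c("GSM002", "GSM001"),
                      Title = c("S_B_0049_A2", "S_A_0048_A1"),
                      stringsAsFactors = FALSE)

# Inner join (the default): only keys present in both tables are kept
m_inner <- merge(run_info, samples, by.x = "Sample_Name", by.y = "Accession")

# all.x = TRUE keeps every row of the first table and fills gaps with NA
m_left <- merge(run_info, samples, by.x = "Sample_Name", by.y = "Accession",
                all.x = TRUE)

print(m_inner)
print(m_left)
```

The key column keeps the name given in by.x, so the merged table has Sample_Name but no Accession column. Comparing nrow() before and after the merge is a quick way to spot samples that failed to match.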
12. Statistics and visualization, based on the afternoon session
df <- read.table(file = "c:/Users/Administrator/Desktop/SraRunTable.txt", sep = "\t", header = T, stringsAsFactors = F)
df
ncol(df)
nrow(df)
colnames(df)
df1 <- read.table(file = "c:/Users/Administrator/Desktop/sample.csv", sep = ",", header = T, stringsAsFactors = F)
df1
ncol(df1)
nrow(df1)
colnames(df1)
m <- merge(df,df1,by.x="Sample_Name", by.y = "Accession")
m3 <- df[,"MBases"]
e <- m[c("MBases","Title")]
boxplot(m3) # m3 is the MBases vector defined above (m4 was never defined)
fivenum(m3)
hist(m3)
plot(density(m3))
density(m3)
class(m3)
save(e,file = 'input.Rdata')
rm(list = ls())
options(stringsAsFactors = F)
load(file = 'input.Rdata')
e[,2]
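plate <- unlist(lapply(e[,2],function(x){
  # to debug on a single title: x=e[1,2]
  # keep the third "_"-separated field of the title
  strsplit(x,'_')[[1]][3]
}))
# the same call without unlist() returns a list instead of a character vector
# (named plate_list rather than c, to avoid masking the c() function)
plate_list <- lapply(e[,2],function(x){
  strsplit(x,'_')[[1]][3]
})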
table(plate)
boxplot(e[,1]~plate)
t.test(e[,1]~plate)
e$plate=plate
library(ggplot2)
colnames(e)
ggplot(e,aes(x=plate,y=MBases))+geom_boxplot()
library(ggpubr)
p <- ggboxplot(e, x = "plate", y = "MBases",
color = "plate", palette = "jco",
add = "jitter")
# Add p-value
p + stat_compare_means(method = 't.test')
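The plate extraction in the script above calls strsplit() once per title. Since strsplit() is vectorized, a possible alternative is to split all titles in one call and then pick the third field from each result. The titles below are hypothetical examples in the same underscore-separated style, not values taken from the real table.

```r
# Hypothetical titles in the same underscore-separated style
titles <- c("SS2_15_0048_A3", "SS2_15_0048_A6", "SS2_15_0049_A1")

# strsplit() accepts the whole vector and returns one character vector
# per title; vapply() then picks the third field from each
plate <- vapply(strsplit(titles, "_", fixed = TRUE),
                function(parts) parts[3],
                character(1))

print(plate)         # [1] "0048" "0048" "0049"
print(table(plate))  # number of titles per plate
```

vapply() with character(1) stops with an error if any element does not produce exactly one string, which can catch malformed titles early.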