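1. Common R packages for visualization
- Plotting
  - base
  - ggplot2
  - ggpubr
- Combining plots
  - mfrow in par()
  - grid.arrange
  - cowplot
  - customLayout
  - patchwork
- Exporting
  - the three-step pattern with pdf() and similar devices
  - ggsave
  - eoffice
    - topptx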
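2. Base package: plotting functions
High-level plotting functions
plot() # scatter plots and many other plot types; dispatches to the method that fits the data type
hist() # frequency histogram
boxplot() # box plot
stripchart() # strip chart (one-dimensional scatter plot)
barplot() # bar chart
dotchart() # Cleveland dot plot (dotplot() belongs to the lattice package, not base)
pie() # pie chart
matplot() # plots the columns of one matrix against the columns of another
Low-level plotting functions
lines() # add lines
curve() # add a curve (with add = TRUE)
abline() # add a straight line with a given intercept and slope
points() # add points
segments() # add line segments
arrows() # add arrows
axis() # add an axis
box() # draw a box around the plot
title() # add titles
text() # add text
mtext() # add text in the margins
Graphical parameters
# Parameters are passed inside the plotting call; anything not set keeps its default value.
font = font style
lty = line type
lwd = line width
pch = point symbol
xlab = x-axis label
ylab = y-axis label
xlim = x-axis range
ylim = y-axis range
Parameters can also be set for the whole figure at once; see par().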
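The split between high-level calls, low-level calls and par() can be sketched as follows. This is only an illustration on the built-in iris data; the temporary PDF file and the specific parameter values are assumptions made for the example.

```r
# Draw into a temporary PDF so the sketch also runs without an interactive device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 4)

# par() sets figure-wide parameters and returns the previous values
old_par <- par(mfrow = c(1, 2), font = 2)

# High-level call: starts a new plot; pch, xlab, ylab and xlim are set inside the call
plot(iris$Sepal.Length, iris$Petal.Length,
     pch = 16, xlab = "Sepal.Length", ylab = "Petal.Length",
     xlim = c(4, 8))

# Low-level calls: add to the plot that is already open
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
abline(fit, lty = 2, lwd = 2)
title(main = "Scatter plot with fitted line")
mtext("iris data", side = 3, line = 0.3)

# Second panel, again started by a high-level call
hist(iris$Sepal.Length, xlab = "Sepal.Length", main = "Histogram")
box()

par(old_par)  # restore the previous parameters
dev.off()
```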
3. ggplot2 syntax
1. Basic plotting template
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
ggplot(data=iris)+
geom_point(mapping=aes(x=Sepal.Length,
y=Petal.Length))
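2. Mappings: color, size, transparency, shape
| Attribute | Argument |
|---|---|
| x axis | x |
| y axis | y |
| color | color |
| size | size |
| shape | shape |
| transparency | alpha |
| fill color | fill |
- The arguments shown in bold in the original figure can also be set manually
- A manual setting must be a value that is meaningful for that attribute
  - color: a string such as "blue" or "red"
  - size: in mm
  - shape: a numeric code
    - hollow shapes 0-14: color sets the border
    - solid shapes 15-20: color sets the fill
    - filled shapes 21-24: color sets the border and fill sets the interior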
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
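- Mapping vs. manual setting
# Mapping: color follows the class column
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
# Manual setting: every point is blue
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")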
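The difference shows up in where ggplot2 stores the color. The sketch below builds both plots and inspects the layer; the object names p_map, p_set and p_slip are made up for the illustration, and the third plot shows a common slip rather than anything from the notes above.

```r
library(ggplot2)

# Mapping: color sits inside aes(), so it is driven by the class column
p_map <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Manual setting: color sits outside aes(), so every point is blue
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Common slip: "blue" inside aes() is treated as a one-level variable,
# so the points get a default palette color and a legend entry named "blue"
p_slip <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# ggplot2 stores the aesthetic under its British spelling
mapped_names <- names(p_map$layers[[1]]$mapping)
fixed_colour <- p_set$layers[[1]]$aes_params$colour
```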
3. Faceting
ggplot(data = iris) +
geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
facet_wrap(~ Species)

- Faceting on two variables
# test is a copy of iris with a random grouping column added
test <- iris
test$group <- sample(letters[1:5], 150, replace = TRUE)
ggplot(data = test) +
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  facet_grid(group ~ Species)
4. Geometric objects
- Understanding grouping
ggplot(data = test) +
  geom_smooth(aes(x = Sepal.Length, y = Petal.Length, group = Species))
ggplot(data = test) +
geom_smooth(aes(x = Sepal.Length,
y = Petal.Length,color = Species))

- Geoms can be layered
# Local mappings
ggplot(data = test) +
  geom_smooth(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length))
# Global mapping
ggplot(data = test, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_smooth() +
  geom_point()
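- Mappings
  - Local mapping
    - applies only to the current layer
  - Global mapping
    - applies to all layers
  - When a local and a global mapping conflict, the local one wins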
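A small sketch of the conflict rule, assuming the built-in iris data: color is mapped globally, and one layer overrides it locally with aes(color = NULL). The method and formula arguments are choices made for the example.

```r
library(ggplot2)

# Global mapping: x, y and color apply to every layer
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  # Local mapping: removes color for this layer only,
  # so a single line is fitted to all species together
  geom_smooth(aes(color = NULL), method = "lm", formula = y ~ x)

# The point layer keeps three species colors; the smooth layer has one group
pts <- layer_data(p, 1)
smo <- layer_data(p, 2)
n_point_colours <- length(unique(pts$colour))
n_smooth_groups <- length(unique(smo$group))
```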
5. Statistical transformations
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
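- Use case 1: plot the values in a table directly instead of computing a statistic
# fre is assumed to be a frequency table with columns Var1 and Freq
ggplot(data = fre) +
  geom_bar(mapping = aes(x = Var1, y = Freq), stat = "identity")
- Use case 2: compute prop (proportion) instead of count
# after_stat(prop) replaces the older ..prop.. spelling (ggplot2 3.3.0 and later)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))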
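The notes do not show how fre is created. One way to get a table with exactly those column names is as.data.frame(table(...)); the sketch below assumes it was built from diamonds$cut, which is a guess made for illustration.

```r
library(ggplot2)

# Assumption: fre comes from table(); as.data.frame() names the columns Var1 and Freq
fre <- as.data.frame(table(diamonds$cut))

# stat = "identity" plots Freq as given; geom_col() is shorthand for the same thing
p_identity <- ggplot(fre) + geom_bar(aes(x = Var1, y = Freq), stat = "identity")
p_col <- ggplot(fre) + geom_col(aes(x = Var1, y = Freq))

# stat_count() (the default of geom_bar) computes the same heights from the raw rows
p_count <- ggplot(diamonds) + geom_bar(aes(x = cut))
heights_counted <- layer_data(p_count)$count
```

Both routes give the same bars; the precomputed table is mainly useful when the counts come from elsewhere.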
6. Position adjustments
- Relative positions of points
  - geom_point(): points are drawn at their exact coordinates, so identical values overlap
  - geom_jitter(): adds a small random offset to each point so that overlapping values become visible
- Stacked bar chart
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))
- Side-by-side (dodged) bar chart
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
7. Coordinate systems
- Flip with coord_flip()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()
- Polar coordinates with coord_polar()
bar <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    show.legend = FALSE,
    width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
bar + theme_classic()
bar + theme_dark()
8. Complete plotting template
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>
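One way to fill in every slot of the template, using the diamonds data from the earlier examples. The particular stat, position, coordinate and facet choices are only for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "fill"                        # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>
```

In everyday use most slots keep their defaults, so only the data, the geom and the mappings need to be written out.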
4. ggpubr
library(ggpubr)
ggscatter(iris, x = "Sepal.Length", y = "Petal.Length", color = "Species")
# ggpubr does away with explicit mappings and layers: column names are passed as strings
p <- ggboxplot(iris, x = "Species",
               y = "Sepal.Length",
               color = "Species",
               shape = "Species",
               add = "jitter")
p
my_comparisons <- list(c("setosa", "versicolor"), c("setosa", "virginica"),
                       c("versicolor", "virginica"))
# Pairwise comparisons, plus the overall test labelled at y = 9
p + stat_compare_means(comparisons = my_comparisons) +
  stat_compare_means(label.y = 9)
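5. Saving figures
- ggplot2 family:
ggsave("iris_box_ggpubr.png") # saves the most recently displayed plot
ggsave(p, filename = "iris_box_ggpubr2.png")
- General: the three-step pattern
# 1. Choose the format and file name
pdf("test.pdf")
# 2. Plotting code
# ..........
# 3. Done drawing: close the device
dev.off()
- The handy eoffice package
library(eoffice)
topptx(p, "iris_box_ggpubr.pptx")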
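The three-step pattern, written out with a real plot. This is a sketch that writes into tempdir() so it leaves nothing behind in the working directory; the box plot is just a stand-in for any plotting code. The same pattern applies to png(), jpeg() and tiff().

```r
out_file <- file.path(tempdir(), "test.pdf")

pdf(out_file, width = 6, height = 4)          # step 1: open the device
boxplot(Sepal.Length ~ Species, data = iris)  # step 2: draw
dev.off()                                     # step 3: close the device, which writes the file

saved <- file.exists(out_file)
```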
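6. Combining plots
- The patchwork package
  - Simple syntax, fully compatible with ggplot2
  - Layout proportions are easy to set
  - (1) Plots can be combined directly with p1 + p2, about as simple as it gets
  - (2) Code for complex layouts stays readable
  - (3) Subplots can be tagged (for example A B C D or I II III IV)
  - (4) All subplots can be modified at once
  - (5) The legends of the subplots can be collected in one place, which makes the figure look unified
library(ggplot2)
library(ggpubr)
library(patchwork)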
p1 = ggscatter(iris,x="Sepal.Length",
y="Petal.Length",
color="Species")
p2 <- ggboxplot(iris, x = "Species",
y = "Sepal.Length",
color = "Species",
shape = "Species",
add = "jitter")
p3 = ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
p4 = ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
)
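p1 + p2 + p3 + p4 + plot_annotation(tag_levels = "A")
p1 / p2
- Code runs but no plot appears: the graphics device is occupied
dev.off() # closes the current graphics device
# Run dev.off() repeatedly until it reports "null device", then rerun the plotting code or call dev.new()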
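Instead of calling dev.off() by hand several times, the loop below closes devices until none is left. The two pdf() calls only imitate a session with leftover devices; graphics.off() has the same effect in one call.

```r
# Open two file devices to imitate a session with leftover devices
pdf(tempfile(fileext = ".pdf"))
pdf(tempfile(fileext = ".pdf"))
n_open <- length(dev.list())

# dev.list() returns NULL once every device is closed
while (!is.null(dev.list())) dev.off()
```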
7. Going further
1. tidyr core functions
- Wide and long formats
library(tidyr)
### Original data
test <- data.frame(geneid = paste0("gene", 1:4),
                   sample1 = c(1, 4, 7, 10),
                   sample2 = c(2, 5, 0.8, 11),
                   sample3 = c(0.3, 6, 9, 12))
test
### Wide to long
test_gather <- gather(data = test,
                      key = sample_nm,
                      value = exp,
                      -geneid)
head(test_gather)
### Long to wide
test_re <- spread(data = test_gather,
                  key = sample_nm,
                  value = exp)
head(test_re)
- Splitting and uniting columns
### Original data
test <- data.frame(x = c("a,b", "a,d", "b,c"))
test
### Split
test_separate <- separate(test, x, c("X", "Y"), sep = ",")
test_separate
### Unite
test_re <- unite(test_separate, "x", X, Y, sep = ",")
- Handling NA
### Original data
X <- data.frame(X1 = LETTERS[1:5], X2 = 1:5)
X[2, 2] <- NA
X[4, 1] <- NA
### 1. Drop rows containing NA, optionally checking one column only
drop_na(X)
drop_na(X, X1)
drop_na(X, X2)
### 2. Replace NA
replace_na(X$X2, 0)
### 3. Fill NA with the value from the row above
X
fill(X, X2)
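gather() and spread() still work, but tidyr 1.0.0 and later recommend pivot_longer() and pivot_wider() for the same reshaping. A sketch of the wide/long round trip with the newer functions, using the same gene table (test_long and test_wide are names chosen for the example):

```r
library(tidyr)

test <- data.frame(geneid = paste0("gene", 1:4),
                   sample1 = c(1, 4, 7, 10),
                   sample2 = c(2, 5, 0.8, 11),
                   sample3 = c(0.3, 6, 9, 12))

# Wide to long: every column except geneid becomes a (sample_nm, exp) pair
test_long <- pivot_longer(test, cols = -geneid,
                          names_to = "sample_nm", values_to = "exp")

# Long to wide: the inverse operation
test_wide <- pivot_wider(test_long, names_from = sample_nm, values_from = exp)

dim(test_long)
dim(test_wide)
```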
2. dplyr
1. mutate(): add a new column
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL
mutate(test, new = Sepal.Length * Sepal.Width)
2. select(): choose columns
#### (1) By column number
select(test, 1)
select(test, c(1, 5))
#### (2) By column name
select(test, Sepal.Length)
select(test, Petal.Length, Petal.Width)
vars <- c("Petal.Length", "Petal.Width")
select(test, one_of(vars)) # one_of() is superseded; all_of(vars) is the current spelling
#### (3) A set of useful helpers from tidyselect
select(test, starts_with("Petal"))
select(test, ends_with("Width"))
select(test, contains("etal"))
select(test, matches(".t."))
select(test, everything())
select(test, last_col())
select(test, last_col(offset = 1))
#### (4) Reorder columns with everything()
select(test, Species, everything())
3. filter(): choose rows
filter(test, Species == "setosa")
filter(test, Species == "setosa"&Sepal.Length > 5 )
filter(test, Species %in% c("setosa","versicolor"))
4. arrange(): sort the whole table by one or more columns
arrange(test, Sepal.Length) # ascending by default
arrange(test, desc(Sepal.Length)) # desc() sorts in descending order
arrange(test, desc(Sepal.Width), Sepal.Length)
5. summarise(): summarize
# Collapses the data to summary values; most useful together with group_by()
# Mean and standard deviation of Sepal.Length
summarise(test, mean(Sepal.Length), sd(Sepal.Length))
# Group by Species first, then compute the mean and standard deviation of Sepal.Length per group
group_by(test, Species)
tmp <- summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length))
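Without names, the summary columns are labelled with the full expression, such as `mean(Sepal.Length)`. A sketch of the same grouped summary with named columns; n, mean_sl and sd_sl are names chosen for the example.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Named arguments become the column names of the result; n() counts rows per group
by_species <- summarise(group_by(test, Species),
                        n = n(),
                        mean_sl = mean(Sepal.Length),
                        sd_sl = sd(Sepal.Length))
by_species
```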
6. Two practical techniques
- 1: the pipe operator %>% (Cmd/Ctrl + Shift + M)
library(dplyr)
x1 = filter(iris,Sepal.Width>3)
x2 = select(x1,c("Sepal.Length","Sepal.Width" ))
x3 = arrange(x2,Sepal.Length)
colnames(iris)
iris %>%
filter(Sepal.Width>3) %>%
select(c("Sepal.Length","Sepal.Width" ))%>%
arrange(Sepal.Length)
- 2: count() tallies the unique values of a column
count(test, Species)
## Relational data: joining two tables. Note: avoid introducing factors
options(stringsAsFactors = FALSE)
test1 <- data.frame(name = c('jimmy','nicker','doodle'),
blood_type = c("A","B","O"))
test1
test2 <- data.frame(name = c('doodle','jimmy','nicker','tony'),
group = c("group1","group1","group2","group2"),
vision = c(4.2,4.3,4.9,4.5))
test2
test3 <- data.frame(NAME = c('doodle','jimmy','lucy','nicker'),
weight = c(140,145,110,138))
merge(test1,test2,by="name")
merge(test1,test3,by.x = "name",by.y = "NAME")
### 1. Inner join with inner_join(): keeps the intersection
inner_join(test1, test2, by = "name")
inner_join(test1, test3, by = c("name" = "NAME"))
### 2. Left join with left_join()
left_join(test1, test2, by = 'name')
left_join(test2, test1, by = 'name')
### 3. Full join with full_join()
full_join(test1, test2, by = 'name')
### 4. Semi join with semi_join(): returns all rows of x that have a match in y
semi_join(x = test1, y = test2, by = 'name')
### 5. Anti join with anti_join(): returns all rows of x that have no match in y
anti_join(x = test2, y = test1, by = 'name')
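Counting the rows each join returns is a quick way to see how they differ. The sketch rebuilds the three small tables from above; the vector name rows is made up for the example.

```r
library(dplyr)

test1 <- data.frame(name = c("jimmy", "nicker", "doodle"),
                    blood_type = c("A", "B", "O"))
test2 <- data.frame(name = c("doodle", "jimmy", "nicker", "tony"),
                    group = c("group1", "group1", "group2", "group2"),
                    vision = c(4.2, 4.3, 4.9, 4.5))
test3 <- data.frame(NAME = c("doodle", "jimmy", "lucy", "nicker"),
                    weight = c(140, 145, 110, 138))

rows <- c(
  inner = nrow(inner_join(test1, test2, by = "name")),            # names present in both
  left  = nrow(left_join(test2, test1, by = "name")),             # every row of test2, tony gets NA
  full  = nrow(full_join(test1, test3, by = c("name" = "NAME"))), # union of names
  semi  = nrow(semi_join(test1, test2, by = "name")),             # rows of test1 with a match
  anti  = nrow(anti_join(test2, test1, by = "name"))              # rows of test2 without a match
)
rows
```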
3. stringr
1. String length
library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
length(x) # number of elements in the vector, not characters
str_length(x) # number of characters
2. Splitting and combining strings
str_split(x," ")
x2 = str_split(x," ")[[1]]
str_c(x2,collapse = " ")
str_c(x2,1234,sep = "+")
3. Extracting part of a string
str_sub(x,5,9)
4. Case conversion
str_to_upper(x2)
str_to_lower(x2)
str_to_title(x2)
5. Sorting strings
str_sort(x2)
6. Pattern detection
str_detect(x2,"h")
str_starts(x2,"T")
str_ends(x2,"e")
### Combined with sum() and mean(), this gives the number and proportion of matches
sum(str_detect(x2,"h"))
mean(str_detect(x2,"h"))
7. Extracting the matching strings
str_subset(x2,"h")
8. Counting matches
str_count(x," ")
str_count(x2,"o")
9. Replacing
str_replace(x2,"o","A")
str_replace_all(x2,"o","A")
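Two details are easy to miss in the calls above: str_replace() changes only the first match in each string, and patterns are regular expressions unless wrapped in fixed(). A short sketch on the same sentence; the variable names are illustrative.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

first_only <- str_replace("smooth", "o", "A")      # first match only
every_match <- str_replace_all("smooth", "o", "A") # every match

# "." as a regular expression matches any character;
# fixed(".") matches a literal full stop
n_any_char <- str_count(x, ".")
n_full_stop <- str_count(x, fixed("."))
```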
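8. Conditional statements and loops
I. Conditional statements
### 1. if(){ }
#### (1) if without else: nothing happens when the condition is FALSE
i = -1
if (i<0) print('up')
if (i>0) print('up')
# Make sure you understand the following line
if(!require(tidyr)) install.packages('tidyr')
#### (2) With else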
i =1
if (i>0){
cat('+')
} else {
print("-")
}
ifelse(i>0,"+","-")
x=rnorm(10)
y=ifelse(x>0,"+","-")
y
#### (3) Multiple conditions
i = 0
if (i>0){
print('+')
} else if (i==0) {
print('0')
} else if (i< 0){
print('-')
}
ifelse(i>0,"+",ifelse((i<0),"-","0"))
### 2.switch()
cd = 3
foo <- switch(EXPR = cd,
#EXPR = "aa",
aa=c(3.4,1),
bb=matrix(1:4,2,2),
cc=matrix(c(T,T,F,T,F,F),3,2),
dd="string here",
ee=matrix(c("red","green","blue","yellow")))
foo
- The ifelse() function
  - Three arguments
  - ifelse(x, yes, no)
    - x: a logical vector
    - yes: value returned where x is TRUE
    - no: value returned where x is FALSE
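The practical difference is that if() looks at a single TRUE/FALSE, while ifelse() works element by element. A sketch comparing the two on a short vector, plus switch() with a character EXPR; the values and labels are made up for the example.

```r
x <- c(-1.5, 0, 2.3)

# ifelse() is vectorised: one result per element of x
signs <- ifelse(x > 0, "+", ifelse(x < 0, "-", "0"))

# if() expects a single TRUE/FALSE, so it has to be applied element by element
signs_loop <- character(length(x))
for (k in seq_along(x)) {
  if (x[k] > 0) {
    signs_loop[k] <- "+"
  } else if (x[k] < 0) {
    signs_loop[k] <- "-"
  } else {
    signs_loop[k] <- "0"
  }
}

# switch() with a character EXPR picks the branch by name instead of by position;
# the unnamed last argument is the fallback
label <- switch("bb", aa = "first", bb = "second", "fallback")
```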
II. Loops
### 1. for loop
# Also take a look at next and break
x <- c(5,6,0,3)
s=0
for (i in x){
s=s+i
#if(i == 0) next
#if (i == 0) break
print(c(which(x==i),i,1/i,s))
}
x <- c(5,6,0,3)
s = 0
for (i in 1:length(x)){
s=s+x[[i]]
#if(i == 3) next
#if (i == 3) break
print(c(i,x[[i]],1/i,s))
}
# How can the results be stored?
s = 0
result = list()
for(i in 1:length(x)){
s=s+x[[i]]
result[[i]] = c(i,x[[i]],1/i,s)
}
do.call(cbind,result)
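The commented-out next and break lines in the loops above behave differently: next skips one iteration, break ends the loop. A sketch with both switched on, using the same x; kept and before_zero are names chosen for the example.

```r
x <- c(5, 6, 0, 3)

# next skips the rest of the current iteration; the loop then continues with 3
kept <- c()
for (i in x) {
  if (i == 0) next
  kept <- c(kept, 1 / i)
}

# break leaves the loop entirely, so 3 is never reached
before_zero <- c()
for (i in x) {
  if (i == 0) break
  before_zero <- c(before_zero, i)
}
```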
### 2. while loop
i = 0
while (i < 5){
print(c(i,i^2))
i = i+1
}
### 3. repeat statement
# Note: a break is required
i=0L
s=0L
repeat{
i = i + 1
s = s + i
print(c(i,s))
if(i==50) break
}
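III. Managing long scripts
- 1. Split the work into several scripts; each script saves an Rdata file at the end, and the next script clears the workspace and then loads that file at the start.
- 2. With if(F){...}, the code inside { } is skipped; with if(T){...}, the code inside { } is run. Any code wrapped in { } can be folded.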
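Both habits can be sketched in a few lines. The file name step1.Rdata and the object step1_result are hypothetical, and rm(step1_result) stands in for the usual rm(list = ls()) so that the sketch does not wipe anything else in the session.

```r
rdata_file <- file.path(tempdir(), "step1.Rdata")  # hypothetical file name

# End of script 1: save only what the next step needs
step1_result <- head(iris)
save(step1_result, file = rdata_file)

# Start of script 2: clear the workspace, then load the saved objects
rm(step1_result)
load(rdata_file)

# A slow block wrapped in if (FALSE) is skipped but stays in the script
ran_slow_block <- FALSE
if (FALSE) {
  ran_slow_block <- TRUE
}
```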
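IV. The apply function
apply(X, MARGIN, FUN, ...)
# test is assumed to be all numeric here, for example iris[1:6, 1:4]
apply(test, 2, mean)
apply(test, 1, sum)
# X is the data frame or matrix; MARGIN = 1 works on rows and MARGIN = 2 on columns; FUN is the function
# FUN is applied to every row or column of X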
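A runnable version of the two calls, assuming test holds the first six rows and the four numeric columns of iris. With a non-numeric column such as Species included, apply() would convert everything to character and mean() would return NA.

```r
test <- iris[1:6, 1:4]

col_means <- apply(test, 2, mean)  # one value per column
row_sums <- apply(test, 1, sum)    # one value per row

# colMeans() and rowSums() are built-in shortcuts for these two cases
same_means <- all.equal(unname(col_means), unname(colMeans(test)))
same_sums <- all.equal(unname(row_sums), unname(rowSums(test)))
```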
V. Listing, creating, and deleting files and folders in R
dir()
file.create(...)
file.exists(...)
file.remove(...)
file.rename(from, to)
file.append(file1, file2)
file.copy(from, to, overwrite = recursive, recursive = FALSE,
          copy.mode = TRUE, copy.date = FALSE)
file.symlink(from, to)
file.link(from, to)
dir.create("doudou")
unlink("doudou", recursive = TRUE)
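A short walk through several of these functions. The sketch works inside tempdir() so nothing is left in the working directory; the file names a.txt and b.txt are made up.

```r
work_dir <- file.path(tempdir(), "doudou")
dir.create(work_dir)

f1 <- file.path(work_dir, "a.txt")
f2 <- file.path(work_dir, "b.txt")

file.create(f1)               # create an empty file
file.rename(f1, f2)           # rename it
listing <- dir(work_dir)      # list the folder contents

unlink(work_dir, recursive = TRUE)  # delete the folder and everything in it
```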