Installing and Loading R Packages, and Using the dplyr Package
I. Mirror Settings
1. Basic setup

2. The options() function sets options that apply while R is running
## Check the current mirror settings
options()$repos
# Set the CRAN and Bioconductor mirrors
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) # Tsinghua mirror
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") # USTC mirror
## Check the Bioconductor mirror
options()$BioC_mirror
## R's configuration file, .Rprofile
file.edit('~/.Rprofile')
## Add the two options() lines below to .Rprofile and save; they then run at the start of each new session
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) # Tsinghua mirror
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") # USTC mirror
options()$repos
options()$BioC_mirror
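The following is a minimal sketch of one way to set both mirrors, read them back, and then restore the previous values. It relies on the fact that options() invisibly returns the old settings; the names old_opts, cran_mirror and bioc_mirror are illustrative and not part of the original notes.

```r
# Set both mirrors and keep the previous values so they can be restored.
old_opts <- options(
  repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"),
  BioC_mirror = "https://mirrors.ustc.edu.cn/bioc/"
)

# getOption() reads a single option; it is equivalent to options()$repos.
cran_mirror <- getOption("repos")[["CRAN"]]
bioc_mirror <- getOption("BioC_mirror")
print(cran_mirror)
print(bioc_mirror)

# Restore whatever was set before this snippet ran.
options(old_opts)
```

Restoring the old values is only useful when experimenting; for a permanent setup the two options() calls belong in .Rprofile as described above.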
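II. Installing and Loading Packages
1. Installing R packages
From CRAN: install.packages("package")
From Bioconductor: BiocManager::install("package")
2. Loading R packages
library(package)
require(package)
3. Install and load in three steps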
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/")
install.packages("dplyr")
library(dplyr)
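Re-running install.packages() on every session is slow, so a common pattern is to install only when the package is missing. The sketch below assumes a hypothetical helper named ensure_package (it is not part of base R or dplyr) and demonstrates it on "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a CRAN package only when it is not already
# available, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" is bundled with R, so this call never triggers a download.
ensure_package("stats")
```

Calling ensure_package("dplyr") would follow the same path. As for the two loading functions, library() stops with an error when a package is missing, while require() returns FALSE with a warning.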
III. Using dplyr
1. mutate(): add a new column
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> mutate(test, new = Sepal.Length * Sepal.Width)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 17.85
2 4.9 3.0 1.4 0.2 setosa 14.70
51 7.0 3.2 4.7 1.4 versicolor 22.40
52 6.4 3.2 4.5 1.5 versicolor 20.48
101 6.3 3.3 6.0 2.5 virginica 20.79
102 5.8 2.7 5.1 1.9 virginica 15.66
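Note that mutate() returns a new data frame and leaves test untouched unless the result is assigned. A small sketch, where test_new and ratio are illustrative names not used in the original notes:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# mutate() returns a new data frame; test itself is unchanged
# until the result is assigned.
test_new <- mutate(
  test,
  new   = Sepal.Length * Sepal.Width,  # product of two existing columns
  ratio = new / Petal.Length           # may use a column created just before
)

ncol(test)      # the original still has 5 columns
ncol(test_new)  # the result has 7 columns
```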
2. select(): select columns
(1) Select by column number
> test <- iris[c(1:2,51:52,101:102),]
> select(test,1)
Sepal.Length
1 5.1
2 4.9
51 7.0
52 6.4
101 6.3
102 5.8
> select(test,c(1,5))
Sepal.Length Species
1 5.1 setosa
2 4.9 setosa
51 7.0 versicolor
52 6.4 versicolor
101 6.3 virginica
102 5.8 virginica
> select(test,Sepal.Length)
Sepal.Length
1 5.1
2 4.9
51 7.0
52 6.4
101 6.3
102 5.8
(2) Select by column name
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> select(test, Petal.Length, Petal.Width)
Petal.Length Petal.Width
1 1.4 0.2
2 1.4 0.2
51 4.7 1.4
52 4.5 1.5
101 6.0 2.5
102 5.1 1.9
> vars <- c("Petal.Length", "Petal.Width")
> select(test, one_of(vars))
Petal.Length Petal.Width
1 1.4 0.2
2 1.4 0.2
51 4.7 1.4
52 4.5 1.5
101 6.0 2.5
102 5.1 1.9
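In recent dplyr releases one_of() is superseded by all_of() and any_of(). The sketch below shows all_of() with the same character vector, plus the starts_with() helper as an alternative; by_vector and by_prefix are illustrative names.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
vars <- c("Petal.Length", "Petal.Width")

# all_of() raises an error if any name in vars is missing from the data.
by_vector <- select(test, all_of(vars))

# starts_with() matches on the column-name prefix.
by_prefix <- select(test, starts_with("Petal"))

# Both calls pick the same two columns.
identical(names(by_vector), names(by_prefix))
```

any_of() behaves like all_of() but silently skips names that do not exist, which is closer to what one_of() did.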
3. filter(): filter rows
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> filter(test, Species == "setosa")
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
> filter(test, Species == "setosa" & Sepal.Length > 5)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
> filter(test, Species %in% c("setosa","versicolor"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 7.0 3.2 4.7 1.4 versicolor
4 6.4 3.2 4.5 1.5 versicolor
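Conditions can also be combined in other ways. As a brief sketch: comma-separated conditions in filter() are combined with AND, just like &, while | means OR. The names with_comma, with_amp and either are illustrative.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Conditions separated by commas are combined with AND, the same as &.
with_comma <- filter(test, Species == "setosa", Sepal.Length > 5)
with_amp   <- filter(test, Species == "setosa" & Sepal.Length > 5)

# | means OR: keep rows that satisfy either condition.
either <- filter(test, Species == "setosa" | Sepal.Length > 6.5)

nrow(with_comma)
nrow(either)
```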
4. arrange(): sort the whole table by one or more columns
- See ?desc: desc() transforms a vector into a format that will be sorted in descending order. This is useful within arrange().
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> arrange(test, Sepal.Length) # ascending order by default
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 5.1 3.5 1.4 0.2 setosa
3 5.8 2.7 5.1 1.9 virginica
4 6.3 3.3 6.0 2.5 virginica
5 6.4 3.2 4.5 1.5 versicolor
6 7.0 3.2 4.7 1.4 versicolor
> arrange(test, desc(Sepal.Length)) # use desc() for descending order
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.0 3.2 4.7 1.4 versicolor
2 6.4 3.2 4.5 1.5 versicolor
3 6.3 3.3 6.0 2.5 virginica
4 5.8 2.7 5.1 1.9 virginica
5 5.1 3.5 1.4 0.2 setosa
6 4.9 3.0 1.4 0.2 setosa
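Sorting by several columns works by listing them in priority order. A minimal sketch, assuming we want species in factor-level order and, within each species, the longest sepal first; sorted is an illustrative name.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Sort by Species first (factor level order), then by Sepal.Length
# in descending order within each species.
sorted <- arrange(test, Species, desc(Sepal.Length))

sorted[, c("Species", "Sepal.Length")]
```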
5. summarise(): summarize
(1) Direct summary
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> # Compute the mean and standard deviation of Sepal.Length
> summarise(test, mean(Sepal.Length), sd(Sepal.Length))
mean(Sepal.Length) sd(Sepal.Length)
1 5.916667 0.8084965
(2) Summarize by group with group_by()
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> # Group by Species first, then compute the mean and standard deviation of Sepal.Length for each group
> group_by(test, Species)
# A tibble: 6 x 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 7 3.2 4.7 1.4 versicolor
4 6.4 3.2 4.5 1.5 versicolor
5 6.3 3.3 6 2.5 virginica
6 5.8 2.7 5.1 1.9 virginica
> summarise(group_by(test, Species),mean(Sepal.Length), sd(Sepal.Length))
# A tibble: 3 x 3
Species `mean(Sepal.Length)` `sd(Sepal.Length)`
<fct> <dbl> <dbl>
1 setosa 5 0.141
2 versicolor 6.7 0.424
3 virginica 6.05 0.354
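The default column names such as `mean(Sepal.Length)` are awkward to refer to later. One option, sketched here with illustrative names (by_species, mean_length, sd_length), is to name each summary explicitly; n() adds the group size.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Naming each summary gives clean column names in the result.
by_species <- summarise(
  group_by(test, Species),
  mean_length = mean(Sepal.Length),
  sd_length   = sd(Sepal.Length),
  n           = n()  # number of rows in each group
)

by_species
```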
IV. Two Handy dplyr Skills
1. The pipe operator %>% (Cmd/Ctrl + Shift + M); it becomes available once any tidyverse package is loaded
> ?tidyverse
No documentation for ‘tidyverse’ in specified packages and libraries:
you could try ‘??tidyverse’
> ??tidyverse
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> test %>%
+ group_by(Species) %>%
+ summarise(mean(Sepal.Length), sd(Sepal.Length))
# A tibble: 3 x 3
Species `mean(Sepal.Length)` `sd(Sepal.Length)`
<fct> <dbl> <dbl>
1 setosa 5 0.141
2 versicolor 6.7 0.424
3 virginica 6.05 0.354
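The pipe simply passes the left-hand side as the first argument of the next call, so the piped form and the nested form compute the same thing. A small sketch to check this; nested, piped and mean_length are illustrative names.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Nested form: read from the inside out.
nested <- summarise(group_by(test, Species), mean_length = mean(Sepal.Length))

# Piped form: read from top to bottom.
piped <- test %>%
  group_by(Species) %>%
  summarise(mean_length = mean(Sepal.Length))

# Both forms produce the same values.
identical(nested$mean_length, piped$mean_length)
```

R 4.1.0 and later also provide a native pipe, |>, which covers this simple use without loading any package.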
2. count(): count the unique values in a column
> test <- iris[c(1:2,51:52,101:102),]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
> count(test,Species)
Species n
1 setosa 2
2 versicolor 2
3 virginica 2
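count() can be thought of as shorthand for grouping and then summarising with n(). The sketch below compares the two; counted and manual are illustrative names.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# count() is shorthand for group_by() followed by summarise(n = n()).
counted <- count(test, Species)
manual  <- summarise(group_by(test, Species), n = n())

# The counts agree between the two approaches.
identical(counted$n, manual$n)
```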
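V. Handling Relational Data with dplyr
- That is, joining two tables. Note: avoid introducing factors.
stringsAsFactors:
The default setting for default.stringsAsFactors, which in R < 4.1.0 was used to provide the default values of the stringsAsFactors argument of data.frame and read.table. Since R 4.0.0 that default is FALSE, so strings stay character vectors unless a factor is requested.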
> options(stringsAsFactors = F)
> ?options
> ?data.frame
> test1 <- data.frame(x = c('b','e','f','x'),
+ z = c("A","B","C",'D'),
+ stringsAsFactors = F)
> test1
x z
1 b A
2 e B
3 f C
4 x D
> test2 <- data.frame(x = c('a','b','c','d','e','f'),
+ y = c(1,2,3,4,5,6),
+ stringsAsFactors = F)
> test2
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
1. Inner join, inner_join(): keeps only the keys present in both tables
> inner_join(test1, test2, by = "x")
x z y
1 b A 2
2 e B 5
3 f C 6
2. Left join, left_join()
> left_join(test1, test2, by = 'x')
x z y
1 b A 2
2 e B 5
3 f C 6
4 x D NA
> left_join(test2, test1, by = 'x')
x y z
1 a 1 <NA>
2 b 2 A
3 c 3 <NA>
4 d 4 <NA>
5 e 5 B
6 f 6 C
3. Full join, full_join()
> full_join(test1, test2, by = 'x')
x z y
1 b A 2
2 e B 5
3 f C 6
4 x D NA
5 a <NA> 1
6 c <NA> 3
7 d <NA> 4
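The examples above join on a column that has the same name in both tables. When the key columns are named differently, by accepts a named vector. The tables left_tbl and right_tbl and the column id below are made up for illustration.

```r
library(dplyr)

# Two small illustrative tables whose key columns have different names.
left_tbl  <- data.frame(id = c("b", "e", "x"), z = c("A", "B", "D"),
                        stringsAsFactors = FALSE)
right_tbl <- data.frame(x = c("a", "b", "e"), y = c(1, 2, 5),
                        stringsAsFactors = FALSE)

# c("id" = "x") means: match left_tbl$id against right_tbl$x.
joined <- left_join(left_tbl, right_tbl, by = c("id" = "x"))

joined
```

The result keeps the key name from the left table; the row with id "x" has no match, so its y is NA.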
4. Semi join, semi_join(): returns all rows of table x that have a match in table y
> test1
x z
1 b A
2 e B
3 f C
4 x D
> test2
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> semi_join(x = test1, y = test2, by = 'x')
x z
1 b A
2 e B
3 f C
5. Anti join, anti_join(): returns all rows of table x that have no match in table y
> test1
x z
1 b A
2 e B
3 f C
4 x D
> test2
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> anti_join(x = test2, y = test1, by = 'x')
x y
1 a 1
2 c 3
3 d 4
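Semi join and anti join are complementary: for the same x and y, every row of x lands in exactly one of the two results. A short sketch using the same test1 and test2 as above; matched and unmatched are illustrative names.

```r
library(dplyr)

test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Rows of test1 with a match in test2, and rows of test1 without one.
matched   <- semi_join(test1, test2, by = "x")
unmatched <- anti_join(test1, test2, by = "x")

# Together they account for every row of test1.
nrow(matched) + nrow(unmatched) == nrow(test1)
```

Neither function adds columns from y; they only filter x.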
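6. Simple binding
bind_rows() and bind_cols() play the roles of rbind() and cbind() in base R.
Note that bind_rows() matches columns by name and fills any column missing from one table with NA, so it is cleanest when both tables have the same columns,
while bind_cols() requires the two data frames to have the same number of rows.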
> test1 <- data.frame(x = c(1,2,3,4), y = c(10,20,30,40))
> test1
x y
1 1 10
2 2 20
3 3 30
4 4 40
> test2 <- data.frame(x = c(5,6), y = c(50,60))
> test2
x y
1 5 50
2 6 60
> test3 <- data.frame(z = c(100,200,300,400))
> test3
z
1 100
2 200
3 300
4 400
> bind_rows(test1, test2)
x y
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
> bind_cols(test1, test3)
x y z
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
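To illustrate how bind_rows() treats tables whose columns differ, here is a minimal sketch with two made-up data frames, a and b. The .id argument, shown second, records which input each row came from.

```r
library(dplyr)

# Two illustrative tables that share only the column x.
a <- data.frame(x = c(1, 2), y = c(10, 20))
b <- data.frame(x = c(3, 4), z = c(300, 400))

# Columns are matched by name; values missing from one table become NA.
stacked <- bind_rows(a, b)

# With a named list and .id, a new column records the source of each row.
labelled <- bind_rows(list(first = a, second = b), .id = "source")

stacked
labelled
```

Base rbind() would stop with an error here because the column names do not match, which is one practical difference between the two.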