Only a focused mind can embroider flowers; only a calm mind can weave hemp.
1. Working with strings
- The stringr package
(1) str_length
library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
## [1] "The birch canoe slid on the smooth planks."
str_length(x) # number of characters in the string (letters, digits, symbols, and spaces all count)
## [1] 42
length(x) # number of elements in the vector (here, a single string)
## [1] 1
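The difference between the two functions is easier to see on a vector with more than one element. The following is a minimal sketch; the vector `words` is a hypothetical example, not part of the original notes.

```r
library(stringr)

# Hypothetical vector, used only for illustration
words <- c("birch", "canoe", "")

length(words)      # number of elements in the vector: 3
str_length(words)  # characters in each element: 5 5 0
nchar(words)       # base R counterpart, same result for plain strings: 5 5 0
```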
(2) str_split: split a string on a separator
str_split(x, " ") # the separator is a space; the result is a list
## [[1]]
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth"
## [8] "planks."
x2 = str_split(x, " ")[[1]]; x2 # double brackets extract the element itself (a character vector); single brackets would still return a list
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth"
## [8] "planks."
y = c("jimmy 150","nicker 140","tony 152")
str_split(y, " ") # a list with one element per input string
## [[1]]
## [1] "jimmy" "150"
##
## [[2]]
## [1] "nicker" "140"
##
## [[3]]
## [1] "tony" "152"
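str_split(y, " ", simplify = TRUE) # a matrix; the [1,] and [,1] row/column labels show that the result is a matrix
##      [,1]     [,2]
## [1,] "jimmy"  "150"
## [2,] "nicker" "140"
## [3,] "tony"   "152"
Review: a matrix holds only one data type, whereas a data frame only requires one data type within each column. A data frame built from an unnamed matrix gets the default column names V1, V2, V3.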
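The review point can be checked directly on the matrix returned by `str_split`. This is a sketch under the assumption that the second column should be treated as numbers; the names `m` and `df` are introduced here only for illustration.

```r
library(stringr)

y <- c("jimmy 150", "nicker 140", "tony 152")
m <- str_split(y, " ", simplify = TRUE)   # character matrix: every cell is text

class(m[, 2])                             # "character", even though it looks numeric

# stringsAsFactors = FALSE keeps the columns as character on older R versions too
df <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(df)                              # default names: "V1" "V2"

df$V2 <- as.numeric(df$V2)                # a data frame column can have its own type
sapply(df, class)                         # V1 is "character", V2 is "numeric"
```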
(3) str_sub: extract part of a string by position
str_sub(x, 5, 9) # characters 5 through 9
## [1] "birch"
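`str_sub` also accepts negative positions, which count from the end of the string. A short sketch on the same sentence:

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

str_sub(x, 1, 3)    # first three characters: "The"
str_sub(x, -7, -1)  # last seven characters: "planks."
str_sub(x, -7, -2)  # same span without the final period: "planks"
```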
(4) str_detect
str_detect(x2, "h") # does each element contain "h"?
## [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE
str_starts(x2, "T") # does each element start with "T"?
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
str_ends(x2, "e") # does each element end with "e"?
## [1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
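The logical vectors returned by these functions are mostly useful for subsetting and counting. A minimal sketch, with `x2` rebuilt by hand so the block runs on its own; `has_h` is a name introduced here for illustration.

```r
library(stringr)

x2 <- c("The", "birch", "canoe", "slid", "on", "the", "smooth", "planks.")

has_h <- str_detect(x2, "h")  # logical vector, one value per element

x2[has_h]            # keep only the matching elements: "The" "birch" "the" "smooth"
sum(has_h)           # TRUE counts as 1, so this is the number of matches: 4
str_subset(x2, "h")  # stringr shortcut for x2[str_detect(x2, "h")]
```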
(5) String replacement
x2
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth"
## [8] "planks."
str_replace(x2, "o", "A") # replaces only the first match in each string
## [1] "The" "birch" "canAe" "slid" "An" "the" "smAoth"
## [8] "planks."
str_replace_all(x2, "o", "A") # replaces every match
## [1] "The" "birch" "canAe" "slid" "An" "the" "smAAth"
## [8] "planks."
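One detail worth keeping in mind: stringr treats the pattern as a regular expression by default, so characters such as `.` have a special meaning. The sketch below shows the difference on the last word of the sentence; wrapping the pattern in `fixed()` or escaping it are two common ways to match it literally.

```r
library(stringr)

w <- "planks."

str_replace_all(w, ".", "A")         # "." matches any character: "AAAAAAA"
str_replace_all(w, fixed("."), "")   # literal period only: "planks"
str_replace_all(w, "\\.", "")        # escaped period, same result: "planks"
```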
(6) String removal
x
## [1] "The birch canoe slid on the smooth planks."
str_remove(x, " ") # removes only the first match
## [1] "Thebirch canoe slid on the smooth planks."
str_remove_all(x, " ") # removes every match
## [1] "Thebirchcanoeslidonthesmoothplanks."
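Removal is replacement with an empty string, so the two families of functions can be used interchangeably. A quick check:

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# str_remove() gives the same result as str_replace() with "" as the replacement
identical(str_remove(x, " "), str_replace(x, " ", ""))          # TRUE
identical(str_remove_all(x, " "), str_replace_all(x, " ", ""))  # TRUE
```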
2. Working with data frames
- arrange (sorting): a function in the dplyr package that sorts a data frame by one of its columns
test <- iris[c(1:2,51:52,101:102),]
rownames(test) = NULL # reset the row names to 1, 2, 3, ...; NULL means "nothing"
test
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
arrange(test, Sepal.Length) # ascending order
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          4.9         3.0          1.4         0.2     setosa
## 2          5.1         3.5          1.4         0.2     setosa
## 3          5.8         2.7          5.1         1.9  virginica
## 4          6.3         3.3          6.0         2.5  virginica
## 5          6.4         3.2          4.5         1.5 versicolor
## 6          7.0         3.2          4.7         1.4 versicolor
arrange(test, desc(Sepal.Length)) # descending order
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          7.0         3.2          4.7         1.4 versicolor
## 2          6.4         3.2          4.5         1.5 versicolor
## 3          6.3         3.3          6.0         2.5  virginica
## 4          5.8         2.7          5.1         1.9  virginica
## 5          5.1         3.5          1.4         0.2     setosa
## 6          4.9         3.0          1.4         0.2     setosa
arrange(test, "Sepal.Length") # no error, but no sorting either: the quoted string is treated as a constant value, not as a column name
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
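When the column name really is stored as a string (for example inside a function), one option is the `.data` pronoun that dplyr makes available inside its verbs. The sketch below assumes a hypothetical variable `col` holding the name; the base R line at the end is an equivalent that does not need dplyr.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

col <- "Sepal.Length"   # hypothetical: column name held in a string

# .data[[col]] looks the column up by name, so the rows are actually sorted
sorted <- arrange(test, .data[[col]])
sorted$Sepal.Length     # 4.9 5.1 5.8 6.3 6.4 7.0

# Base R equivalent using order()
sorted_base <- test[order(test[[col]]), ]
```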
- distinct: remove duplicate rows from a data frame based on one column
distinct(test, Species, .keep_all = TRUE) # .keep_all = TRUE keeps the other columns as well; without it, only the Species column is returned
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          7.0         3.2          4.7         1.4 versicolor
## 3          6.3         3.3          6.0         2.5  virginica
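For comparison, here is a sketch of the same call with and without `.keep_all`. In both cases the first row found for each species is the one that is kept; the names `only_species` and `full_rows` are introduced for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

only_species <- distinct(test, Species)                   # one column, three rows
full_rows    <- distinct(test, Species, .keep_all = TRUE) # all five columns, three rows

dim(only_species)  # 3 1
dim(full_rows)     # 3 5
```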
- mutate: add a new column to a data frame
mutate(test, new = Sepal.Length * Sepal.Width)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
## 1          5.1         3.5          1.4         0.2     setosa 17.85
## 2          4.9         3.0          1.4         0.2     setosa 14.70
## 3          7.0         3.2          4.7         1.4 versicolor 22.40
## 4          6.4         3.2          4.5         1.5 versicolor 20.48
## 5          6.3         3.3          6.0         2.5  virginica 20.79
## 6          5.8         2.7          5.1         1.9  virginica 15.66
ncol(test) # without an assignment, test itself does not change!
## [1] 5
test$new = test$Sepal.Length*test$Sepal.Width # this form assigns the new column to test
test
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
## 1          5.1         3.5          1.4         0.2     setosa 17.85
## 2          4.9         3.0          1.4         0.2     setosa 14.70
## 3          7.0         3.2          4.7         1.4 versicolor 22.40
## 4          6.4         3.2          4.5         1.5 versicolor 20.48
## 5          6.3         3.3          6.0         2.5  virginica 20.79
## 6          5.8         2.7          5.1         1.9  virginica 15.66
ncol(test)
## [1] 6
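The same rule applies to `mutate` itself: it returns a new data frame and leaves its input alone, so the result has to be assigned to be kept. A minimal sketch, using a hypothetical name `test2` for the result:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

# Assign the result of mutate() to keep the new column
test2 <- mutate(test, new = Sepal.Length * Sepal.Width)

ncol(test)   # the original is untouched: 5
ncol(test2)  # the assigned copy has the extra column: 6
```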
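3. Chaining steps
select picks columns and filter picks rows; in base R the same is done inside square brackets, with rows to the left of the comma and columns to the right.
1. Repeated assignment, which produces several intermediate variables
x1 = select(iris,-5) # the arguments are written directly inside the parentheses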
class(x1)
## [1] "data.frame"
x2 = as.matrix(x1)
x3 = head(x2,50) # head(x2, 50) keeps the first 50 rows
heatmap(x3)
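The remark at the start of this section, that select and filter correspond to the two sides of the comma in square brackets, can be sketched as follows. The small `test` data frame is rebuilt here so the block runs on its own, and the threshold of 6 is an arbitrary value chosen for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

# Columns: select() versus the right-hand side of the comma
cols_dplyr <- select(test, Sepal.Length, Species)
cols_base  <- test[, c("Sepal.Length", "Species")]

# Rows: filter() versus the left-hand side of the comma
rows_dplyr <- filter(test, Sepal.Length > 6)
rows_base  <- test[test$Sepal.Length > 6, ]

nrow(rows_dplyr)  # 3 rows have Sepal.Length above 6
```

The values match in both pairs; the base R row subset keeps the original row names (3, 4, 5), which is the only visible difference.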
2. Passing arguments with the pipe operator: unless told otherwise, %>% passes the data on its left into the first argument position of the function on its right
iris %>%
  select(-5) %>%
  as.matrix() %>%
  head(50) %>%
  pheatmap::pheatmap()
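When the data should not go into the first position, the `%>%` pipe accepts a `.` placeholder that marks where it belongs. A sketch using `lm`, whose first argument is the formula rather than the data; the model itself is only an illustration.

```r
library(dplyr)   # dplyr re-exports the %>% pipe

# Default behaviour: the left-hand side becomes the first argument
first3 <- iris %>% head(3)          # same as head(iris, 3)

# Placeholder: "." tells the pipe to put the data in the data argument instead
fit <- iris %>% lm(Sepal.Length ~ Sepal.Width, data = .)

nrow(first3)  # 3
class(fit)    # "lm"
```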
Instructor: 小潔, 生信技能樹