Day6-2: How to edit strings and data frames

Only a focused mind can embroider flowers; only a calm mind can weave hemp.

1. Working with strings
  1. The stringr package

(1) str_length

library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
## [1] "The birch canoe slid on the smooth planks."
str_length(x) # number of characters in the string (every letter, digit, symbol, and space counts as one)
## [1] 42
 
length(x) # number of elements (strings) in the vector
 
## [1] 1
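
The difference between the two functions is easier to see on a vector with more than one element. The following is a minimal sketch; the vector `words` is a hypothetical example and not part of the original notes.

```r
library(stringr)

# Hypothetical three-element character vector
words <- c("jimmy", "nicker", "tony")

str_length(words)  # one character count per element: 5 6 4
length(words)      # number of elements in the vector: 3
```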

(2) str_split: split a string by a separator

str_split(x," ") # the separator is a space; the result is a list
 
## [[1]]
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
## [8] "planks."
 
x2 = str_split(x," ")[[1]];x2 # double brackets [[ ]] extract the element itself (a character vector); single brackets [ ] would still return a list
 
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
## [8] "planks."
 
y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ") # a list with one element per input string
 
## [[1]]
## [1] "jimmy" "150"  
## 
## [[2]]
## [1] "nicker" "140"   
## 
## [[3]]
## [1] "tony" "152"
 
str_split(y," ",simplify = TRUE) # a matrix; the [1,] row labels and [,1] column labels show that the result is a matrix
 
##      [,1]     [,2] 
## [1,] "jimmy"  "150"
## [2,] "nicker" "140"
## [3,] "tony"   "152"

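Review: a matrix holds only one data type, whereas a data frame holds only one data type within each column. When a matrix without column names is converted to a data frame, the columns are named V1, V2, V3, and so on.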
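
As a rough illustration of this review point, the sketch below converts the split matrix into a data frame and then gives one column a numeric type. The column names `name` and `score` are assumptions chosen for the example; the original notes do not name the columns.

```r
library(stringr)

y <- c("jimmy 150", "nicker 140", "tony 152")

m <- str_split(y, " ", simplify = TRUE)  # character matrix: one type for all cells
df <- as.data.frame(m)                   # columns receive the default names V1 and V2
default_names <- colnames(df)

colnames(df) <- c("name", "score")       # hypothetical, more descriptive names
df$score <- as.numeric(df$score)         # each data frame column can have its own type
str(df)
```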

(3) str_sub: extract part of a string by position

str_sub(x,5,9) # extract characters 5 through 9
 
## [1] "birch"
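
str_sub also accepts negative positions, which count from the end of the string. A short sketch using the same sentence:

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

str_sub(x, 1, 3)    # first three characters: "The"
str_sub(x, -7, -1)  # last seven characters: "planks."
```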

(4) str_detect: test whether each string contains a pattern

str_detect(x2,"h") # does each element contain "h"?
 
## [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
 
str_starts(x2,"T") # does each element start with "T"?
 
## [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
str_ends(x2,"e") # does each element end with "e"?
 
## [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
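
Because these functions return logical vectors, one common use is to subset or count the matching elements. A minimal sketch, with `x2` written out directly so the block runs on its own:

```r
library(stringr)

x2 <- c("The", "birch", "canoe", "slid", "on", "the", "smooth", "planks.")

has_h <- str_detect(x2, "h")  # logical vector, one value per element
x2[has_h]                     # keep only the matching elements
sum(has_h)                    # TRUE counts as 1, so this is the number of matches: 4
```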

(5) Replacing characters

x2
 
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
## [8] "planks."
 
str_replace(x2,"o","A") # replaces only the first match in each element
 
## [1] "The"     "birch"   "canAe"   "slid"    "An"      "the"     "smAoth" 
## [8] "planks."
 
str_replace_all(x2,"o","A") # replaces every match
 
## [1] "The"     "birch"   "canAe"   "slid"    "An"      "the"     "smAAth" 
## [8] "planks."

(6) Removing characters

x
 
## [1] "The birch canoe slid on the smooth planks."
 
str_remove(x," ") # removes only the first match
 
## [1] "Thebirch canoe slid on the smooth planks."
 
str_remove_all(x," ") # removes every match
 
## [1] "Thebirchcanoeslidonthesmoothplanks."
2. Working with data frames
  1. arrange: sorting. arrange() is a function from the dplyr package that sorts a data frame by a column.
test <- iris[c(1:2,51:52,101:102),]
rownames(test) = NULL # drop the original row names (they reset to 1, 2, 3, ...); NULL means "nothing"
test
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
 
library(dplyr)
 
## 
## Attaching package: 'dplyr'
 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
 
arrange(test, Sepal.Length) # ascending order
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          4.9         3.0          1.4         0.2     setosa
## 2          5.1         3.5          1.4         0.2     setosa
## 3          5.8         2.7          5.1         1.9  virginica
## 4          6.3         3.3          6.0         2.5  virginica
## 5          6.4         3.2          4.5         1.5 versicolor
## 6          7.0         3.2          4.7         1.4 versicolor
 
arrange(test, desc(Sepal.Length)) # descending order
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          7.0         3.2          4.7         1.4 versicolor
## 2          6.4         3.2          4.5         1.5 versicolor
## 3          6.3         3.3          6.0         2.5  virginica
## 4          5.8         2.7          5.1         1.9  virginica
## 5          5.1         3.5          1.4         0.2     setosa
## 6          4.9         3.0          1.4         0.2     setosa
 
arrange(test, "Sepal.Length") # no error, but no sorting either: the quoted name is treated as a constant string, not as a column
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
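
When the column name really is stored as a string, one possible approach is the `.data` pronoun that dplyr provides. This is only a sketch of that option; the variable `col` is a hypothetical name introduced for the example.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

col <- "Sepal.Length"                  # hypothetical variable holding a column name
sorted <- arrange(test, .data[[col]])  # looks the column up by name, then sorts ascending
sorted
```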
  2. distinct: remove duplicate rows based on a column
distinct(test,Species,.keep_all = TRUE) # .keep_all = TRUE keeps the other columns as well; without it, only the Species column is returned
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          7.0         3.2          4.7         1.4 versicolor
## 3          6.3         3.3          6.0         2.5  virginica
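
For comparison, here is a small sketch of the same call without `.keep_all`, which returns only the column used for deduplication. The name `species_only` is made up for the example.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

species_only <- distinct(test, Species)  # one row per species, Species column only
species_only
```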
  3. mutate: add a new column to a data frame
mutate(test, new = Sepal.Length * Sepal.Width)
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
## 1          5.1         3.5          1.4         0.2     setosa 17.85
## 2          4.9         3.0          1.4         0.2     setosa 14.70
## 3          7.0         3.2          4.7         1.4 versicolor 22.40
## 4          6.4         3.2          4.5         1.5 versicolor 20.48
## 5          6.3         3.3          6.0         2.5  virginica 20.79
## 6          5.8         2.7          5.1         1.9  virginica 15.66
 
ncol(test) # without assignment, test itself is unchanged!
 
## [1] 5
 
test$new = test$Sepal.Length*test$Sepal.Width # this form assigns the new column to test
test
 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
## 1          5.1         3.5          1.4         0.2     setosa 17.85
## 2          4.9         3.0          1.4         0.2     setosa 14.70
## 3          7.0         3.2          4.7         1.4 versicolor 22.40
## 4          6.4         3.2          4.5         1.5 versicolor 20.48
## 5          6.3         3.3          6.0         2.5  virginica 20.79
## 6          5.8         2.7          5.1         1.9  virginica 15.66
 
ncol(test)
 
## [1] 6
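
The same effect can be had with mutate() by assigning its result to a variable. A minimal sketch, starting again from the five-column data frame; `test2` is a hypothetical name.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]
rownames(test) <- NULL

test2 <- mutate(test, new = Sepal.Length * Sepal.Width)  # store the result

ncol(test)   # 5: the original data frame is untouched
ncol(test2)  # 6: the stored result has the extra column
```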
3. Chaining steps

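select() picks columns and filter() picks rows; they correspond to the positions to the right and left of the comma inside square brackets.

1. Assign repeatedly, creating several intermediate variables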

x1 = select(iris,-5) # arguments are written directly inside the parentheses
class(x1)
 
## [1] "data.frame"
 
x2 = as.matrix(x1)
x3 = head(x2,50) # head() keeps the first 50 rows
heatmap(x3)

2. Pass arguments through the pipe operator; unless told otherwise, %>% puts the data on its left into the first argument of the function on its right.

iris %>%
  select(-5) %>% 
  as.matrix() %>%
  head(50) %>% 
  pheatmap::pheatmap()
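
When the data should go somewhere other than the first argument, the `%>%` pipe accepts a dot placeholder. The sketch below shows both cases; the linear model is only an illustrative choice and does not come from the original notes.

```r
library(dplyr)  # dplyr re-exports the %>% pipe

# Default: the left-hand side becomes the first argument, same as head(iris, 3)
first_rows <- iris %>% head(3)

# Placeholder: "." marks where the data goes when it is not the first argument
fit <- iris %>% lm(Sepal.Length ~ Sepal.Width, data = .)
class(fit)
```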

Instructor: 小潔 of 生信技能樹
