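Much of my recent work involves string processing, and 小洁 from 生信星球 happened to publish an introduction to the stringr package a while ago. My main task today is to summarize her code and run through it hands-on, so that I can then apply it to my own work.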
1. Basic string operations
1.1 String length
library(stringr)
str_length(c("a","R for bioplanet",NA))
1.2 Joining strings
str_c("x","y")
str_c("x","y","z")
str_c("x","y",sep=",")
str_c("prefix-",c("a","b"),"-suffix")
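str_replace_na(c("a",NA)) # turns NA into the literal string "NA" (it does not drop the element)
str_c(c("x","y","z"),collapse="") # collapse merges a character vector into a single string; note how it differs from sep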
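The difference between sep and collapse, and the way NA interacts with str_c(), is easier to see side by side. The following is a minimal sketch; the vector `v` and the "id-" prefix are made-up illustration values, not part of the original notes.

```r
library(stringr)

# Hypothetical example vector, introduced only for illustration
v <- c("x", NA, "z")

# str_c() propagates NA: the second element of the result is NA
joined_na <- str_c("id-", v)

# Converting NA to the literal string "NA" first keeps every element
joined_ok <- str_c("id-", str_replace_na(v))

# sep is placed between the arguments, element by element ...
pairs <- str_c(c("a", "b"), c("1", "2"), sep = "_")

# ... while collapse then merges the resulting vector into one string
one_string <- str_c(c("a", "b"), c("1", "2"), sep = "_", collapse = "+")

print(joined_na)
print(joined_ok)
print(pairs)
print(one_string)
```

In short, sep works across arguments and collapse works along the result vector, so the two can be combined in one call.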
1.3 str_sub: extracting substrings
x=c("apple","banana","pear")
str_sub(x,1,3)
str_sub(x,-3,-1)
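As a small extension of the calls above, here is a sketch of how negative positions behave and of the assignment form of str_sub(), which modifies part of a string in place. The variable names `fruit_names` and `capitalized` are chosen for this illustration only.

```r
library(stringr)

fruit_names <- c("apple", "banana", "pear")

# Negative positions count backwards from the end of each string
last_two <- str_sub(fruit_names, -2, -1)

# The assignment form replaces the selected range; here the first
# character of each string is swapped for its upper-case version
capitalized <- fruit_names
str_sub(capitalized, 1, 1) <- str_to_upper(str_sub(capitalized, 1, 1))

print(last_two)     # "le" "na" "ar"
print(capitalized)  # "Apple" "Banana" "Pear"
```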
1.4 Case conversion and sorting
str_to_lower("X")
str_to_upper("x")
str_to_title("dedd")
str_sort(x,locale="en")
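2 Regular expressions
The main functions used here are str_view() and str_view_all() (in stringr 1.5.0 and later, str_view() already shows every match and str_view_all() is deprecated).
2.1 Basic matching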
x=c("apple","banana","pear")
str_view(x,"an")
y=c("app.e","banana","pear")
str_view(y,"\\.") # I was not fully clear on why the escape needs two backslashes here
str_view(x,"^a")
str_view(x,"a$")
str_view(x,"^a$")
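The double backslash in the "\\." pattern can be checked directly. R first parses the string literal, where "\\" stands for a single backslash, and only then hands the result to the regex engine, which reads \. as a literal dot. The sketch below assumes nothing beyond stringr itself; the variable names are illustrative.

```r
library(stringr)

y <- c("app.e", "banana", "pear")

# The R literal "\\." is a two-character string: a backslash and a dot
pattern <- "\\."
writeLines(pattern)        # shows what the regex engine receives: \.
n_chars <- nchar(pattern)  # 2

# Escaped dot: matches only a literal "."
escaped_hits <- str_detect(y, pattern)

# Unescaped dot: matches any character, so every string matches
unescaped_hits <- str_detect(y, ".")

# fixed() treats the pattern as plain text, so no escaping is needed
fixed_hits <- str_detect(y, fixed("."))

print(escaped_hits)    # TRUE FALSE FALSE
print(unescaped_hits)  # TRUE TRUE TRUE
print(fixed_hits)      # TRUE FALSE FALSE
```

A single backslash, as in "\.", is rejected by the R parser because \. is not a valid string escape, which is why the doubled form is needed.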
3. Detecting matches
x=c("apple","banana","pear")
str_detect(x,"e") # combine with sum() and mean() to get the number and the proportion of matches
sum(str_detect(x,"^a"))
mean(str_detect(x,"[aeiou]$"))
x[!str_detect(x,"[aeiou]$")]
str_subset(x,"[aeiou]$")
str_count(x,"a")
3.2 Extracting matched content
Here is some example code:
length(sentences)
head(sentences)
colors=c("red","orange","yellow","green","blue","purple")
color_match=str_c(colors,collapse = "|")
has_color=str_subset(sentences,color_match)
more=sentences[str_count(sentences,color_match)>1]
str_view_all(more,color_match)
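The code above only filters and displays the sentences; to actually pull the matched text out, str_extract() and str_extract_all() can be used with the same pattern. The following sketch uses three made-up sentences (`demo`) instead of the built-in `sentences` data so that the result is easy to check by eye.

```r
library(stringr)

colors <- c("red", "orange", "yellow", "green", "blue", "purple")
color_match <- str_c(colors, collapse = "|")

# Hypothetical input, introduced only for illustration
demo <- c("The red car passed a blue bus", "No color here", "A green leaf")

# str_extract() returns the first match per string, or NA if there is none
first_hit <- str_extract(demo, color_match)

# str_extract_all() returns a list with every match per string
all_hits <- str_extract_all(demo, color_match)

print(first_hit)  # "red" NA "green"
print(all_hits)
```

Note that this pattern matches substrings as well, so a word that merely contains "red" would also count; wrapping the alternatives in word boundaries is one way to tighten it.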
3.3 Replacing matches
str_replace(x,"[aeiou]","-")
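nums=c("1 house","2 cars","3 people") # illustrative input: x above contains no digits, so nothing would be replaced
str_replace_all(nums,c("1"="one","2"="two","3"="three"))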
3.4 Splitting
sentences %>% head(5) %>% str_split(" ")
c("name:hadley","country:nz:2","age:35") %>% str_split(":",simplify = TRUE,n=2)
Exploring boundaries
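One way to explore boundaries is stringr's boundary() helper, which lets str_split() break text at word boundaries rather than at a fixed separator. This is a minimal sketch under the assumption that this is the kind of boundary meant here; the sample text is made up for the illustration.

```r
library(stringr)

# Hypothetical sample text, introduced only for illustration
text <- "This is a sentence. This is another one."

# Splitting on a space keeps punctuation attached to the words
words_by_space <- str_split(text, " ")[[1]]

# Splitting on word boundaries returns the words without punctuation
words_by_boundary <- str_split(text, boundary("word"))[[1]]

print(words_by_space)
print(words_by_boundary)
```

boundary() also accepts "character", "line_break", and "sentence", which split the same text at different granularities.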
