Data cleaning is the most tedious and frustrating part of data analysis.

String cleaning

Base R functions

grep, grepl, sub and regexpr are base R functions for matching and replacing strings.

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)
regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

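grep returns the indices of the elements that match pattern, as an integer vector by default; with value = TRUE it returns the matching elements themselves.
grepl returns a logical vector with one value per element, TRUE where pattern matches.
sub returns a character vector of the same length as the input, with the first match of pattern in each element replaced by replacement.
regexpr returns an integer vector of the same length as the input, giving the starting position of the first match of pattern in each element, or -1 if there is no match.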
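
The four return types described above are easier to compare side by side. The following is a minimal sketch using only base R; the fruit vector is borrowed from the stringr examples later in these notes, and the variable names are chosen purely for illustration.

```r
fruit <- c("apple", "banana", "pear", "pinapple")

idx   <- grep("an", fruit)                # indices of matching elements: 2
vals  <- grep("an", fruit, value = TRUE)  # the matching elements: "banana"
hit   <- grepl("an", fruit)               # FALSE TRUE FALSE FALSE
first <- sub("a", "A", fruit)             # only the first "a" in each element is replaced
pos   <- regexpr("p", fruit)              # start of the first "p", or -1 if absent

print(idx)
print(vals)
print(hit)
print(first)            # "Apple" "bAnana" "peAr" "pinApple"
print(as.integer(pos))  # 2 -1 1 1
```

regexpr also attaches a match.length attribute to its result, which is why the positions are printed through as.integer here.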

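The stringr package

stringr provides a set of wrappers that make string manipulation more convenient.

modifier functions

Note that a pattern in stringr is treated as a regular expression (regex) by default. To change how a pattern is interpreted, stringr provides four modifier functions. The ignore_case argument switches case-insensitive matching on or off.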

fixed: Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.

fixed(pattern, ignore_case = FALSE)

coll: Compare strings respecting standard collation rules.

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex: The default. Uses ICU regular expressions.

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE,
  dotall = FALSE, ...)

boundary: Match boundaries between things.

boundary(type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA, ...)
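
To see what the modifiers change in practice, here is a small sketch assuming stringr is installed. The input vector x and the sentence passed to boundary() are made-up illustration data, not taken from the book.

```r
library(stringr)

x <- c("a.b", "A.B", "axb")

as_regex   <- str_detect(x, "a.b")                             # "." matches any character
as_fixed   <- str_detect(x, fixed("a.b"))                      # "." must be a literal dot
no_case    <- str_detect(x, regex("a.b", ignore_case = TRUE))  # regex, ignoring case
word_count <- str_count("one two three", boundary("word"))     # number of words

print(as_regex)    # TRUE FALSE TRUE
print(as_fixed)    # TRUE FALSE FALSE
print(no_case)     # TRUE TRUE TRUE
print(word_count)  # 3
```

Passing a bare string is the same as wrapping it in regex(), so a modifier is only needed when the default interpretation is not what you want.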

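str_detect (grepl)

str_detect() is the stringr equivalent of grepl and returns a logical vector. pattern can also be a vector, in which case it is matched element-wise against string.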

str_detect(string, pattern)
> fruit <- c("apple", "banana", "pear", "pinapple")
> str_detect(fruit, "^a")
[1]  TRUE FALSE FALSE FALSE
> str_detect("aecfg", letters[1:6])
[1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE

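str_split (strsplit)

str_split is the stringr equivalent of base R's strsplit. It takes a character vector and returns a list of the split pieces. If every element should split into the same number of pieces, use str_split_fixed instead, which returns a character matrix.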

str_split(string, pattern, n = Inf, simplify = FALSE)
str_split_fixed(string, pattern, n) # n is the number of pieces to return
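
The difference between the list and matrix return values can be sketched as follows, assuming stringr is installed. The date strings are hypothetical sample data.

```r
library(stringr)

dates <- c("2024-01-15", "2023-12-31")

parts   <- str_split(dates, "-")             # list with one character vector per input
mat     <- str_split_fixed(dates, "-", 3)    # 2 x 3 character matrix
limited <- str_split("a-b-c-d", "-", n = 2)  # at most 2 pieces: "a" and "b-c-d"

print(parts[[1]])    # "2024" "01" "15"
print(dim(mat))      # 2 3
print(limited[[1]])  # "a" "b-c-d"
```

The matrix form is convenient when the pieces are going to become columns of a data frame, since every row is guaranteed to have exactly n entries.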

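str_count

str_count returns the number of matches of pattern in each element, as an integer vector. pattern defaults to the empty string, in which case the number of characters in each element is counted.

str_count(string, pattern = "")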
> str_count(fruit)
[1] 5 6 4 8
> str_count(fruit, c("a", "b", "p", "p"))
[1] 1 1 1 3 # note the vectorisation rule: each pattern is paired with the corresponding string

str_replace (sub)

str_replace is the stringr equivalent of base R's sub: it replaces only the first match within each element of string. str_replace_all replaces every match.

str_replace(string, pattern, replacement)
str_replace_all(string, pattern, replacement)
> str_replace(fruit, "[aeiou]", "-")
[1] "-pple"    "b-nana"   "p-ar"     "p-napple"
> str_replace_all(fruit, "[aeiou]", "-")
[1] "-ppl-"    "b-n-n-"   "p--r"     "p-n-ppl-"

str_replace_na is a special wrapper that converts NA into the string "NA".

str_replace_na(string, replacement = "NA")
> str_replace_na(c(NA, "abc", "def"))
[1] "NA"  "abc" "def"

Notes on "Learning R", Chapter 13, Cleaning data (part 1): string cleaning
