
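On 2020-05-29, the long-awaited dplyr 1.0.0 finally came out (emm, half a month late).

Shortly before the 1.0.0 release, Hadley tweeted that it would be postponed by half a month to the 29th. At the same time, the old yellowish logo was finally swapped for a more eye-catching one, and the new logo really does look good.

Still, I prefer this chalk-drawn version. OK, enough chit-chat; the main thing I remember is that half-month delay.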

My notes on dplyr 1.0.0:

dplyr 1.0: what you need to know
dplyr 1.0.0: key points
dplyr 1.0.0: column operations
dplyr 1.0.0: rowwise
dplyr 1.0.0: select_rename_relocate

Now that dplyr 1.0.0 is out, it is time for me to recommend some related resources.

A few books built around "R for Data Science" that I would like to recommend:

Tidy evaluation (the advanced material): https://tidyeval.tidyverse.org/
"Modern R with the tidyverse": https://b-rodrigues.github.io/modern_R/
"Statistical Inference via Data Science: A ModernDive into R and the Tidyverse": https://moderndive.netlify.com/index.html
"The tidyverse style guide": https://style.tidyverse.org/
"R Data Analysis Guide and Quick Reference" (in Chinese): https://bookdown.org/xiao/RAnalysisBook/
"Data Science and the R Language" (in Chinese): https://bookdown.org/xiangyun/RGraphics/
"R for Data Science", a graduate elective course at Sichuan Normal University (in Chinese): https://bookdown.org/wangminjie/R4DS/

A few dplyr posts and resources I would like to recommend:

Tidyverse learning materials: https://www.stat.cmu.edu/~ryantibs/statcomp/lectures/
Tidyverse Q&A community: https://community.rstudio.com/c/tidyverse
Tidyverse package release announcements: https://www.tidyverse.org/blog/
data.table and dplyr (side-by-side comparison): https://atrebas.github.io/post/2019-03-03-datatable-dplyr/
TidyTuesday (data wrangling + visualization examples): https://github.com/rfordatascience/tidytuesday/blob/master/README.md
TidyTuesday Twitter online Shiny app: https://nsgrantham.shinyapps.io/tidytuesdayrocks/
50 dplyr examples (highly recommended to work through): https://www.listendata.com/2016/08/dplyr-tutorial.html
Hot questions for dplyr (highly recommended): https://www.thetopsites.net/projects/dplyr/ , a collection of questions about handling data with dplyr.
"Mastering Data Processing in 120 Exercises (R tidyverse edition)" by Zhang Jingxin on Zhihu:
- Mastering Data Processing in 120 Exercises, P1-P20 (R tidyverse edition)
- Mastering Data Processing in 120 Exercises, P21-P50 (R tidyverse edition)
- Mastering Data Processing in 120 Exercises, P51-P80 (R tidyverse edition)
- Mastering Data Processing in 120 Exercises, P81-P100 (R tidyverse edition)
- Mastering Data Processing in 120 Exercises, P101-P120 (R tidyverse edition)

References:

Official tidyverse blog posts: reading these is really enough; everything else derives from them.
- 2020-03-09 dplyr 1.0.0 is coming soon: a few words about dplyr 1.0
- 2020-03-20 dplyr 1.0.0: new summarise() features
- 2020-03-27 dplyr 1.0.0: select, rename, relocate
- 2020-04-03 dplyr 1.0.0: working across columns
- 2020-04-10 dplyr 1.0.0: working within rows
- 2020-04-27 dplyr 1.0.0 and vctrs
2020-04-14 Dplyr across: First look at a new Tidyverse function
2020-04-15 The Seven Key Things You Need To Know About dplyr 1.0.0
- Twitter link: https://twitter.com/dr_keithmcnulty/status/1250404270027026432
- 1. Built in tidyselect
- 2. relocate()
- 3. Superpowered summarise()
- 4. colwise using across()
- 5. new rowwise() grammar
- 6. easy modeling inside dataframes
- 7. nest_by()
2020-04-11 dplyr 1.0 code examples: no need to read these; the official examples are enough
The #dplyr hashtag on Twitter
Nick Merlino, 2020-05-27: My Favorite dplyr 1.0.0 Features
Tidyverse Case Study: Anscombe’s quartet
Zhang Jingxin on Zhihu: "[R] A walkthrough of the new features in dplyr 1.0.0"
2020-06-02 dplyr 1.0.0 (a 58-slide presentation), practically a history of the dplyr package (highly recommended).
- Twitter link: https://twitter.com/rdataberlin/status/1268266145909551106
- GitHub R Markdown source: https://github.com/courtiol/Rcourses/tree/master/dplyr_1_0_0

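dplyr 1.0.0 summary

So what is new in dplyr 1.0.0, and what does it make more convenient? Let me go through it step by step.

What are the core functions in dplyr?

select(): column operation; picks columns
rename(): renames columns
mutate(): creates new columns
filter(): row operation; keeps the rows that satisfy a condition
summarise(): computes summaries
arrange(): sorts rows
*_join(): operations between multiple tables
relocate(): a more convenient way to change column positions
slice(): similar to head() but more powerful; its variants can return specific rows, the rows with the largest or smallest values, or a random selection of a given number or proportion of rows
across(): used inside summarise(), mutate() and similar verbs to simplify data processing; it supersedes the earlier *_if(), *_at() and *_all() variants and lets you apply several functions to columns at once.
rowwise(): enables row-by-row analysis, for example computing a statistic across the columns of interest within each row.
c_across(): usually paired with rowwise(); the counterpart of across() for row-wise work
...

Let's go through them one by one.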

select()

By position:
- df %>% select(1, 5, 10)
- df %>% select(1:4)
By name:
- df %>% select(a, e, j)
- df %>% select(c(a, e, j))
- df %>% select(a:d)
By helper function:
- df %>% select(starts_with("x")): columns whose names start with x
- df %>% select(ends_with("s")): columns whose names end with s
- df %>% select(num_range("x", 1:3)): the columns named x1, x2 and x3
- df %>% select(contains("ijk")): columns whose names contain "ijk"
- df %>% select(matches("(.)\\1")): columns whose names match a regular expression
- helpers such as contains() and matches() can also be combined with functions like str_c() to build the pattern
By data type:
- df %>% select(where(is.numeric))
- df %>% select(where(is.factor))
- df %>% select(where(~ is.numeric(.x) && mean(.x, na.rm = TRUE) > 1))
Combining selections with Boolean operators:
- df %>% select(!where(is.factor))
- df %>% select(where(is.numeric) & starts_with("x"))
- df %>% select(starts_with("a") | ends_with("z"))
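To make the selection styles above concrete, here is a minimal sketch on a made-up tibble. The data and the column names (id, x1, x2, label, notes) are assumptions for illustration only; the comments list the columns each call should keep.

```r
library(dplyr)

# Hypothetical toy data; the column names are invented for this example.
df <- tibble(
  id    = 1:3,
  x1    = c(0.5, 1.5, 2.5),
  x2    = c(10, 20, 30),
  label = factor(c("a", "b", "c")),
  notes = c("p", "q", "r")
)

by_position <- df %>% select(1, 3)                  # id, x2
by_name     <- df %>% select(id, label)             # id, label
by_helper   <- df %>% select(num_range("x", 1:2))   # x1, x2
by_type     <- df %>% select(where(is.numeric))     # id, x1, x2

# Boolean combinations of selections
combined <- df %>% select(where(is.numeric) & starts_with("x"))  # x1, x2
negated  <- df %>% select(!where(is.factor))                     # id, x1, x2, notes
```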

rename()

Directly:
- df1 %>% rename(b = 2): b is the new column name and 2 refers to the second column
With a function:
- df2 %>% rename_with(toupper)
- df2 %>% rename_with(toupper, !col1)
- df2 %>% rename_with(toupper, starts_with("x"))
- df2 %>% rename_with(toupper, where(is.numeric))
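A small sketch of how these renaming calls behave, assuming a made-up df2 with one integer column, one numeric column and one character column (the names are hypothetical):

```r
library(dplyr)

# Hypothetical data: col1 is integer, x_a is numeric, x_b is character.
df2 <- tibble(col1 = 1:2, x_a = c(1.5, 2.5), x_b = c("u", "v"))

renamed_pos <- df2 %>% rename(first = 1)                         # first, x_a, x_b
upper_all   <- df2 %>% rename_with(toupper)                      # COL1, X_A, X_B
upper_not1  <- df2 %>% rename_with(toupper, !col1)               # col1, X_A, X_B
upper_x     <- df2 %>% rename_with(toupper, starts_with("x"))    # col1, X_A, X_B
upper_num   <- df2 %>% rename_with(toupper, where(is.numeric))   # COL1, X_A, x_b
```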

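mutate()

Makes it easy to add columns, and a newly created column can be used right away to create further columns.
- df %>% mutate(new_col = col1 + col2, new_col1 = new_col/2)
The .keep argument:
- .keep = "all": keep every column, the same behaviour as before dplyr 1.0.0
- .keep = "used": keep only the columns used to compute the new columns, plus the new columns
- .keep = "unused": keep only the columns not used to compute the new columns, plus the new columns
- .keep = "none": keep only the new columns, equivalent to transmute()
The .before argument places the new columns before a given column
The .after argument places the new columns after a given column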
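The effect of .keep, .before and .after is easiest to see from the resulting column order. The following is a minimal sketch on a made-up three-column tibble (the names a, b, c and s are assumptions):

```r
library(dplyr)

# Hypothetical data: a and b feed the new column s; c is never used.
df <- tibble(a = 1:3, b = c(2, 4, 6), c = c("x", "y", "z"))

keep_used   <- df %>% mutate(s = a + b, .keep = "used")    # a, b, s
keep_unused <- df %>% mutate(s = a + b, .keep = "unused")  # c, s
keep_none   <- df %>% mutate(s = a + b, .keep = "none")    # s
put_before  <- df %>% mutate(s = a + b, .before = a)       # s, a, b, c
put_after   <- df %>% mutate(s = a + b, .after = a)        # a, s, b, c
```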

filter

Keeps the rows that satisfy a condition built with Boolean operators

df %>% filter(col1 > 1 & col2 == "A")
df %>% filter(col1 == 1 | col1 == 2)
df %>% filter(col2 %in% c("A", "B"))
The between() function can be used for range conditions
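A short sketch of these filters on made-up data (col1 and col2 are hypothetical columns). Note that "either value" conditions need | or %in%, since a single value can never equal 1 and 2 at the same time; between() is inclusive at both ends.

```r
library(dplyr)

# Hypothetical data for illustration.
df <- tibble(col1 = c(1, 2, 3, 4), col2 = c("A", "B", "A", "C"))

both    <- df %>% filter(col1 > 1 & col2 == "A")     # col1 = 3
either  <- df %>% filter(col1 == 1 | col1 == 2)      # col1 = 1, 2
member  <- df %>% filter(col2 %in% c("A", "B"))      # col1 = 1, 2, 3
in_range <- df %>% filter(between(col1, 2, 3))       # col1 = 2, 3
```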

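summarise()

The summary function. It is usually combined with group_by(), across(), statistical functions, and user-defined functions. In the new version a summary may return several values per group, so an extra column (q below) can be created alongside it to label each row, which makes the result easier to read.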

mtcars %>%
  group_by(carb) %>%
  summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75)),
            q = c(0.25, 0.50, 0.75))
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 18 x 3
# Groups:   carb [6]
    carb disp_q     q
   <dbl>  <dbl> <dbl>
 1     1   78.8  0.25
 2     1  108    0.5 
 3     1  173.   0.75
 4     2  120.   0.25
 5     2  144.   0.5 
 6     2  314.   0.75
 7     3  276.   0.25
 8     3  276.   0.5 
 9     3  276.   0.75
10     4  168.   0.25
11     4  350.   0.5 
12     4  420    0.75
13     6  145    0.25
14     6  145    0.5 
15     6  145    0.75
16     8  301    0.25
17     8  301    0.5 
18     8  301    0.75


mtcars %>%
  group_by(carb) %>%
  summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75)),
            q = c(0.25, 0.50, 0.75)) %>%
  slice_head()
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 6 x 3
# Groups:   carb [6]
   carb disp_q     q
  <dbl>  <dbl> <dbl>
1     1   78.8  0.25
2     2  120.   0.25
3     3  276.   0.25
4     4  168.   0.25
5     6  145    0.25
6     8  301    0.25


mtcars %>%
  group_by(carb) %>%
  summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75))) %>%
  slice_head()
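The message that summarise() prints in these examples mentions a `.groups` argument. As a small sketch of what overriding it looks like, the same quantile summary can be written with `.groups = "drop"` so that the result is no longer grouped. The variable names probs and res are my own, and it is my understanding that newer dplyr releases (1.1.0 and later) suggest reframe() for multi-row summaries, so treat this as an illustration of the 1.0.0 behaviour rather than a recommendation.

```r
library(dplyr)

# Quantile probabilities reused for both the values and their labels.
probs <- c(0.25, 0.50, 0.75)

res <- mtcars %>%
  group_by(carb) %>%
  summarise(disp_q = quantile(disp, probs),  # three rows per group
            q = probs,                       # label for each quantile
            .groups = "drop")                # return an ungrouped tibble

is_grouped_df(res)
```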

> library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
√ ggplot2 3.3.0.9000     √ purrr   0.3.3     
√ tibble  3.0.1          √ dplyr   1.0.0     
√ tidyr   1.0.2          √ stringr 1.4.0     
√ readr   1.3.1          √ forcats 0.4.0     
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Warning message:
package ‘tibble’ was built under R version 3.6.3 
> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> mtcars %>%
    group_by(card) %>%
    summarise(disp_q  = quantile(disp),
              q = c(0.25, 0.50, 0.75))
Error: Must group by variables found in `.data`.
* Column `card` is not found.
Run `rlang::last_error()` to see where the error occurred.
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp),
              q = c(0.25, 0.50, 0.75))
Error: Problem with `summarise()` input `q`.
x Input `q` must be size 5 or 1, not 3.
i Input `q` is `c(0.25, 0.5, 0.75)`.
i An earlier column had size 5.
i The error occured in group 1: carb = 1.
Run `rlang::last_error()` to see where the error occurred.
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75)),
              q = c(0.25, 0.50, 0.75))
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 18 x 3
# Groups:   carb [6]
    carb disp_q     q
   <dbl>  <dbl> <dbl>
 1     1   78.8  0.25
 2     1  108    0.5 
 3     1  173.   0.75
 4     2  120.   0.25
 5     2  144.   0.5 
 6     2  314.   0.75
 7     3  276.   0.25
 8     3  276.   0.5 
 9     3  276.   0.75
10     4  168.   0.25
11     4  350.   0.5 
12     4  420    0.75
13     6  145    0.25
14     6  145    0.5 
15     6  145    0.75
16     8  301    0.25
17     8  301    0.5 
18     8  301    0.75
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75))) %>%
    slice(5)
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 0 x 2
# Groups:   carb [0]
# ... with 2 variables: carb <dbl>, disp_q <dbl>
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75))) %>%
    slice()
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 18 x 2
# Groups:   carb [6]
    carb disp_q
   <dbl>  <dbl>
 1     1   78.8
 2     1  108  
 3     1  173. 
 4     2  120. 
 5     2  144. 
 6     2  314. 
 7     3  276. 
 8     3  276. 
 9     3  276. 
10     4  168. 
11     4  350. 
12     4  420  
13     6  145  
14     6  145  
15     6  145  
16     8  301  
17     8  301  
18     8  301  
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75))) %>%
    slice_head()
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 6 x 2
# Groups:   carb [6]
   carb disp_q
  <dbl>  <dbl>
1     1   78.8
2     2  120. 
3     3  276. 
4     4  168. 
5     6  145  
6     8  301  
> mtcars %>%
    group_by(carb) %>%
    summarise(disp_q  = quantile(disp, c(0.25, 0.50, 0.75)),
              q = c(0.25, 0.50, 0.75)) %>%
    slice_head()
`summarise()` regrouping output by 'carb' (override with `.groups` argument)
# A tibble: 6 x 3
# Groups:   carb [6]
   carb disp_q     q
  <dbl>  <dbl> <dbl>
1     1   78.8  0.25
2     2  120.   0.25
3     3  276.   0.25
4     4  168.   0.25
5     6  145    0.25
6     8  301    0.25

arrange

df %>% arrange(col1, col2): ascending order by default
df %>% arrange(desc(col1)): desc() sorts in descending order
df %>% arrange(col1 - col2): sorts by the value of an expression
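A minimal sketch of the three sorting styles, using made-up values (col1 and col2 are hypothetical):

```r
library(dplyr)

# Hypothetical data for illustration.
df <- tibble(col1 = c(2, 1, 2, 1), col2 = c(1, 5, 0, 3))

asc     <- df %>% arrange(col1, col2)    # col1 ascending, ties broken by col2
by_desc <- df %>% arrange(desc(col1))    # col1 descending
by_expr <- df %>% arrange(col1 - col2)   # sorted by the difference col1 - col2
```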

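*_join()

inner_join(): inner join; by specifies the key columns shared by the two tables
left_join(): left join; keeps all observations in x
full_join(): full join; keeps all observations in x and y
right_join(): right join; keeps all observations in y
semi_join(x, y): keeps all observations in x that have a match in y
anti_join(x, y): drops all observations in x that have a match in y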
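The differences between the joins show up most clearly in which keys survive. Here is a small sketch with two made-up tables, x and y, that share keys 1 and 2 (all names and values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical tables: keys 1 and 2 are shared, 3 is only in x, 4 only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

inner <- inner_join(x, y, by = "key")   # keys 1, 2
left  <- left_join(x, y, by = "key")    # keys 1, 2, 3 (val_y is NA for 3)
right <- right_join(x, y, by = "key")   # keys 1, 2, 4 (val_x is NA for 4)
full  <- full_join(x, y, by = "key")    # keys 1, 2, 3, 4

# Filtering joins keep only the columns of x.
semi <- semi_join(x, y, by = "key")     # rows of x with a match: keys 1, 2
anti <- anti_join(x, y, by = "key")     # rows of x without a match: key 3
```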

relocate()

df3 %>% relocate(y, z): moves columns y and z to the front
df3 %>% relocate(where(is.character)): moves all character columns to the front
df3 %>% relocate(w, .after = y): moves column w to just after column y
df3 %>% relocate(w, .before = y): moves column w to just before column y
df3 %>% relocate(w, .after = last_col()): moves column w to the end
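To check the resulting column orders, here is a sketch with a made-up df3 whose columns w, x, y, z alternate between numeric and character (the types are an assumption; the source does not define df3):

```r
library(dplyr)

# Hypothetical df3: w and y are numeric, x and z are character.
df3 <- tibble(w = 1, x = "a", y = 2, z = "b")

front    <- df3 %>% relocate(y, z)                     # y, z, w, x
chr_1st  <- df3 %>% relocate(where(is.character))      # x, z, w, y
after_y  <- df3 %>% relocate(w, .after = y)            # x, y, w, z
before_y <- df3 %>% relocate(w, .before = y)           # x, w, y, z
to_end   <- df3 %>% relocate(w, .after = last_col())   # x, y, z, w
```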

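slice()

top_n(), sample_n() and sample_frac() have been superseded by the new slice_*() helpers.

slice_head(): returns only the first row by default; on grouped data, the first row of each group
- df %>% slice_head(prop = 0.1)
- df %>% slice_head(n = 10)
slice_tail(): returns only the last row by default; the other arguments are the same as for slice_head()
slice_sample(): returns one randomly chosen row by default
slice_min(): returns the rows with the smallest values of a variable
slice_max(): returns the rows with the largest values of a variable
slice(): selects rows by position

slice_head() and slice_sample() take the new arguments n = and prop =: n is the number of rows and prop is the proportion of rows. With these, slice_sample() corresponds to sample_n() and sample_frac().

top_n() is superseded by slice_min() and slice_max().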
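A minimal sketch of the slice_*() helpers on a made-up five-row tibble (the columns g and v are assumptions). slice_sample() is random, so only its row count is noted.

```r
library(dplyr)

# Hypothetical data: a grouping column g and a value column v.
df <- tibble(g = c("a", "a", "a", "b", "b"), v = c(3, 1, 2, 5, 4))

first_two  <- df %>% slice_head(n = 2)                   # v = 3, 1
first_each <- df %>% group_by(g) %>% slice_head(n = 1)   # v = 3 (group a), 5 (group b)
last_one   <- df %>% slice_tail(n = 1)                   # v = 4
smallest   <- df %>% slice_min(v, n = 1)                 # v = 1
largest2   <- df %>% slice_max(v, n = 2)                 # the rows with v = 5 and v = 4
top_share  <- df %>% slice_head(prop = 0.4)              # 40% of 5 rows = 2 rows
random3    <- df %>% slice_sample(n = 3)                 # 3 rows chosen at random
```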

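across

across(.cols = everything(), .fns = NULL, ..., .names = NULL)

The first argument, .cols, selects the columns to operate on (as in select()); columns can be chosen by position, name, or data type.
The second argument, .fns, is the function to apply to each column; it can be a purrr-style formula such as ~ .x/2

Why should we use across() more?

across() makes it easy to apply several operations to columns at once
across() reduces the number of functions dplyr needs to provide, which makes dplyr easier to use and to understand
across() combines the features of the earlier _if and _at variants, so columns can be picked by position, name, and data type
across() does not need vars(); the _at() functions were the only place in dplyr where variable names had to be quoted by hand.

Note: across() cannot be used with select() or rename(), because they already use the selection syntax; to change column names with a function, use rename_with().

This is the most important function in this release. All the *_if(), *_at() and *_all() variants have been superseded by across(), which makes it more convenient to apply the same operation to many columns.

How to convert code based on the _at, _if and _all variants to across()

Drop the _at, _if or _all suffix
Wrap the column selection in across()
- for the _if variants, wrap the predicate in where()
- for the _at() variants, just remove vars()
- for the _all() variants, use everything()

across() with other functions

across() with mutate()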
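The conversion rules above can be checked side by side. In this sketch the data and the helper double_it() are made up for illustration; each old scoped call is paired with its across() rewrite, and the two should give the same result.

```r
library(dplyr)

# Hypothetical helper and data, used only for this comparison.
double_it <- function(v) v * 2
df <- tibble(g = c("a", "b"), x = c(1, 2), y = c(3, 4))

# _if: wrap the predicate in where()
old_if <- df %>% mutate_if(is.numeric, double_it)
new_if <- df %>% mutate(across(where(is.numeric), double_it))

# _at: drop vars() and select the columns directly
old_at <- df %>% mutate_at(vars(x, y), double_it)
new_at <- df %>% mutate(across(c(x, y), double_it))

# _all: use everything()
num_only <- df %>% select(x, y)
old_all  <- num_only %>% mutate_all(double_it)
new_all  <- num_only %>% mutate(across(everything(), double_it))
```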

df %>% mutate_if(is.numeric, log)
df %>% mutate(across(where(is.numeric), log))

rescale01 <- function(x){
  rng <- range(x, na.rm = T)
  (x - rng[1])/(rng[2] - rng[1])
}

df <- tibble(x = 1:4, y = rnorm(4))

df %>%
  mutate(across(where(is.numeric), rescale01))
## # A tibble: 4 x 2
##       x     y
##   <dbl> <dbl>
## 1 0     0    
## 2 0.333 0.291
## 3 0.667 0.207
## 4 1     1

across(where()) with summarise()

# Count the distinct values in each character column
starwars %>%
  summarise(across(where(is.character), ~length(unique(.x))))

# Take the mean of each numeric column
starwars %>%
  group_by(homeworld) %>%
  filter(n() > 1) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = T)))

across(everything()) replaces mutate_all()
across() with count()

starwars %>%
  count(across(contains("color")), sort = TRUE)

across() with distinct()

starwars %>%
  distinct(across(contains("color")))

across() with filter()

# Keep the rows that have no missing values (NA) in any column
starwars %>%
  filter(across(everything(), ~ !is.na(.x)))

Applying several operations to columns at once with across()

min_max <- list(
  min = ~min(.x, na.rm = T),
  max = ~max(.x, na.rm = T)
)

starwars %>%
  summarise(across(where(is.numeric), min_max))


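# How do we control the names of the output columns?
# Pass a glue-style specification to .names
# {fn} stands for the function name and {col} for the column name
# (the across() documentation spells these {.fn} and {.col})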
starwars %>%
  summarise(across(where(is.numeric), min_max, .names = "{fn}.{col}"))
## # A tibble: 1 x 6
##   min.height max.height min.mass max.mass min.birth_year max.birth_year
##        <int>      <int>    <dbl>    <dbl>          <dbl>          <dbl>
## 1         66        264       15     1358              8            896

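# If we want the columns produced by the same function to sit together, we need to split the functions into separate across() calls
# The result looks odd: the second across() also picks up the min.* columns just created, which produces the extra max.min.* columns.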
starwars %>%
  summarise(across(where(is.numeric), ~min(.x, na.rm = T), .names = "min.{col}"),
            across(where(is.numeric), ~max(.x, na.rm = T), .names = "max.{col}"))
## # A tibble: 1 x 9
##   min.height min.mass min.birth_year max.height max.mass max.birth_year
##        <int>    <dbl>          <dbl>      <int>    <dbl>          <dbl>
## 1         66       15              8        264     1358            896
## # ... with 3 more variables: max.min.height <int>, max.min.mass <dbl>,
## #   max.min.birth_year <dbl>

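In short, this is a very important function, but a few situations need care:

When across() is used inside summarise(), it also picks up columns computed earlier in the same call, such as the result of n(), and overwrites them.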

df <- data.frame(x = c(1, 2, 3), y = c(1, 4, 9))
df %>%
  summarise(n = n(), across(where(is.numeric), sd))
##    n x        y
## 1 NA 1 4.041452

# Here n comes out as NA: n is itself numeric, so the across() that follows computes its sd, and the sd of a single value (3) is NA. To fix this, put the n() summary after the across() call
df %>%
  summarise(across(where(is.numeric), sd), n = n())
##   x        y n
## 1 1 4.041452 3

# Another option is to exclude n inside across() by adding the condition !n
df %>%
  summarise(n = n(), across(where(is.numeric) & !n, sd))
##   n x        y
## 1 3 1 4.041452

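rowwise()

dplyr usually operates on columns, and row-wise processing has been comparatively awkward. rowwise() handles the data row by row and is often paired with c_across().

This section lists three common use cases:

Row-wise computation (for example, the mean of x, y and z)
Calling the same function with different arguments
Working with list-columns

These problems can of course be solved with for loops and the like, but a pipeline makes them more convenient. There is a classic quote on this: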

Of course, someone has to write loops. It doesn’t have to be you. — Jenny Bryan

rowwise() groups the data by row. Like group_by(), it does not change the contents of the data; it only sets up the grouping:

df <- tibble(x = 1:2, y = 3:4, z = 5:6)
df %>% rowwise()
# Note the extra marker in the output below: Rowwise
## # A tibble: 2 x 3
## # Rowwise: 
##       x     y     z
##   <int> <int> <int>
## 1     1     3     5
## 2     2     4     6

# Without rowwise(): the mean of all the values in x, y and z together
df %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 2 x 4
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     3     5   3.5
## 2     2     4     6   3.5

# The mean of each column
df %>% mutate(across(everything(), ~mean(.x, na.rm = T)))
## # A tibble: 2 x 3
##       x     y     z
##   <dbl> <dbl> <dbl>
## 1   1.5   3.5   5.5
## 2   1.5   3.5   5.5

# With rowwise(): the mean of each row
df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 2 x 4
## # Rowwise: 
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     3     5     3
## 2     2     4     6     4

rowwise() with summarise()

df <- tibble(name = c("Mara", "Hadley"), x = 1:2, y = 3:4, z = 5:6)

# The result contains only the computed values
df %>% 
  rowwise() %>% 
  summarise(m = mean(c(x, y, z)))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 1
##       m
##   <dbl>
## 1     3
## 2     4


# To keep an identifier for each row in the summarise() output, use `rowwise(name)`, which preserves the `name` column
df %>% 
  rowwise(name) %>% 
  summarise(m = mean(c(x, y, z)))
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
## # A tibble: 2 x 2
## # Groups:   name [2]
##   name       m
##   <chr>  <dbl>
## 1 Mara       3
## 2 Hadley     4


df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)
df
## # A tibble: 6 x 5
##      id     w     x     y     z
##   <int> <int> <int> <int> <int>
## 1     1    10    20    30    40
## 2     2    11    21    31    41
## 3     3    12    22    32    42
## 4     4    13    23    33    43
## 5     5    14    24    34    44
## 6     6    15    25    35    45
# Use `rowwise()` to group the data by row
rf <- df %>% rowwise(id)

rf %>% mutate(total = sum(c(w, x, y, z)))
## # A tibble: 6 x 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120
rf %>% summarise(total = sum(c(w, x, y, z)))
## `summarise()` regrouping output by 'id' (override with `.groups` argument)
## # A tibble: 6 x 2
## # Groups:   id [6]
##      id total
##   <int> <int>
## 1     1   100
## 2     2   104
## 3     3   108
## 4     4   112
## 5     5   116
## 6     6   120

c_across

Usually paired with rowwise(); the counterpart of across() for row-wise work

rf <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45) %>% rowwise(id)

rf %>% mutate(total = sum(c_across(w:z)))
## # A tibble: 6 x 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120
rf %>% mutate(total = sum(c_across(where(is.numeric))))
## # A tibble: 6 x 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120

Combining rowwise(), c_across() and across()

ungroup() removes the grouping; here it removes the row-wise grouping

rf %>% 
  mutate(total = sum(c_across(w:z))) %>% 
  ungroup() %>% 
  mutate(across(w:z, ~ . / total))
## # A tibble: 6 x 6
##      id     w     x     y     z total
##   <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1     1 0.1   0.2   0.3   0.4     100
## 2     2 0.106 0.202 0.298 0.394   104
## 3     3 0.111 0.204 0.296 0.389   108
## 4     4 0.116 0.205 0.295 0.384   112
## 5     5 0.121 0.207 0.293 0.379   116
## 6     6 0.125 0.208 0.292 0.375   120

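Row-wise summary functions: rowSums() and rowMeans()

The built-in row-wise functions are faster because they operate on the data frame as a whole, rather than splitting it into rows, computing each one, and joining the results back together.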

df %>% mutate(total = rowSums(across(where(is.numeric))))
## # A tibble: 6 x 6
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <dbl>
## 1     1    10    20    30    40   101
## 2     2    11    21    31    41   106
## 3     3    12    22    32    42   111
## 4     4    13    23    33    43   116
## 5     5    14    24    34    44   121
## 6     6    15    25    35    45   126

df %>% mutate(mean = rowMeans(across(where(is.numeric))))
## # A tibble: 6 x 6
##      id     w     x     y     z  mean
##   <int> <int> <int> <int> <int> <dbl>
## 1     1    10    20    30    40  20.2
## 2     2    11    21    31    41  21.2
## 3     3    12    22    32    42  22.2
## 4     4    13    23    33    43  23.2
## 5     5    14    24    34    44  24.2
## 6     6    15    25    35    45  25.2

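Repeated function calls: passing arguments row by row

rowwise() is not limited to functions that return a length-1 vector; it can be used with any function as long as the result is a list. This means rowwise() and mutate() provide an elegant way to call a function many times with different arguments and store the output next to the input.

Always wrap the call in list(): write list(runif(n, min, max)), not runif(n, min, max)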

df <- tribble(
  ~ n, ~ min, ~ max,
    1,     0,     1,
    2,    10,   100,
    3,   100,  1000,
)

df %>% 
  rowwise() %>% 
  mutate(data = list(runif(n, min, max)))
## # A tibble: 3 x 4
## # Rowwise: 
##       n   min   max data     
##   <dbl> <dbl> <dbl> <list>   
## 1     1     0     1 <dbl [1]>
## 2     2    10   100 <dbl [2]>
## 3     3   100  1000 <dbl [3]>

All pairwise combinations: expand.grid() (or tidyr::expand_grid())

# This gives 3 * 3 = 9 combinations
df <- expand.grid(mean = c(-1, 0, 1), sd = c(1, 10, 100))

df %>% 
  rowwise() %>% 
  mutate(data = list(rnorm(10, mean, sd)))

Varying the function as well: combining with do.call()

df <- tribble(
   ~rng,     ~params,
   "runif",  list(n = 10), 
   "rnorm",  list(n = 20),
   "rpois",  list(n = 10, lambda = 5),
) %>%
  rowwise()

df %>% 
  mutate(data = list(do.call(rng, params)))
## # A tibble: 3 x 3
## # Rowwise: 
##   rng   params           data      
##   <chr> <list>           <list>    
## 1 runif <named list [1]> <dbl [10]>
## 2 rnorm <named list [1]> <dbl [20]>
## 3 rpois <named list [2]> <int [10]>

Most importantly, it can be used for modelling

nest_by() groups the data and stores each group as an element of a list-column

by_cyl <- mtcars %>% nest_by(cyl)
by_cyl
## # A tibble: 3 x 2
## # Rowwise:  cyl
##     cyl                data
##   <dbl> <list<tbl_df[,10]>>
## 1     4           [11 x 10]
## 2     6            [7 x 10]
## 3     8           [14 x 10]

Fitting a linear model for each row

mods <- by_cyl %>% mutate(mod = list(lm(mpg ~ wt, data = data)))
mods
## # A tibble: 3 x 3
## # Rowwise:  cyl
##     cyl                data mod   
##   <dbl> <list<tbl_df[,10]>> <list>
## 1     4           [11 x 10] <lm>  
## 2     6            [7 x 10] <lm>  
## 3     8           [14 x 10] <lm>
mods <- mods %>% mutate(pred = list(predict(mod, data)))
mods
## # A tibble: 3 x 4
## # Rowwise:  cyl
##     cyl                data mod    pred      
##   <dbl> <list<tbl_df[,10]>> <list> <list>    
## 1     4           [11 x 10] <lm>   <dbl [11]>
## 2     6            [7 x 10] <lm>   <dbl [7]> 
## 3     8           [14 x 10] <lm>   <dbl [14]>

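Introduction to dplyr

With this release the dplyr reference documentation received an important update. It is organized into the following parts, which makes it easier to study systematically (most of the examples in this post come from it).

The "Introduction to dplyr" vignette is the best place to learn the main features of dplyr, bar none. The documentation covers:

Equivalents between base R and dplyr operations

Column-wise operations
Compatibility
Introduction to dplyr
Grouped data
Programming with dplyr
Row-wise operations
Two-table operations: the *_join() family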
