R Basics (11): Summarising Data with summarise

This section introduces three important dplyr functions: count, summarise, and group_by.

count

count tallies the number of observations for each value of a variable. With sort = TRUE, the largest groups come first.

library(tidyverse)
msleep %>%
  count(order, sort = TRUE)
##    order               n
##    <chr>           <int>
##  1 Rodentia           22
##  2 Carnivora          12
##  3 Primates           12
##  4 Artiodactyla        6
##  5 Soricomorpha        5
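
To make clear what count is doing, the sketch below writes the same tally out by hand with group_by, summarise, and arrange. The names by_count and by_hand are only illustrative and do not come from the original post.

```r
library(tidyverse)

# Tally observations per order with count(), largest groups first
by_count <- msleep %>%
  count(order, sort = TRUE)

# The same tally written out explicitly
by_hand <- msleep %>%
  group_by(order) %>%
  summarise(n = n()) %>%   # n() counts the rows in each group
  arrange(desc(n))         # sort = TRUE corresponds to this step

# Compare the two results
all.equal(by_count, by_hand)
```

count is therefore mostly a shorthand; the longer form is useful when other summaries are needed alongside the tally.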

Several variables can also be passed to a single count() call, which tallies each combination of their values.

msleep %>%
  count(order, vore, sort = TRUE)
##    order          vore        n
##    <chr>          <chr>   <int>
##  1 Rodentia       herbi      16
##  2 Carnivora      carni      12
##  3 Primates       omni       10
##  4 Artiodactyla   herbi       5

summarize

The summarise function in dplyr (summarize is an equivalent spelling) collapses a data frame into summary statistics using intuitive, readable code.

msleep %>%
  summarise(n = n(), average = mean(sleep_total), maximum = max(sleep_total))
## # A tibble: 1 x 3
##       n average maximum
##   <int>   <dbl>   <dbl>
## 1    83    10.4    19.9

group_by() computes the summary separately for each group

msleep %>%
  group_by(vore) %>%
  summarise(n = n(), average = mean(sleep_total), maximum = max(sleep_total))
## # A tibble: 5 x 4
##   vore        n average maximum
##   <chr>   <int>   <dbl>   <dbl>
## 1 carni      19   10.4     19.4
## 2 herbi      32    9.51    16.6
## 3 insecti     5   14.9     19.9
## 4 omni       20   10.9     18.0
## 5 <NA>        7   10.2     13.7

summarise() works with almost any aggregate function and also allows additional arithmetic on the results:

  • n() - gives the number of observations
  • n_distinct(var) - gives the number of unique values of var
  • sum(var), max(var), min(var), ...
  • mean(var), median(var), sd(var), IQR(var)
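
As a minimal sketch of the functions listed above, the following combines several of them in one summarise() call. The output column names (n_orders, median_sleep, sd_sleep, range_sleep) and the object name sleep_stats are made up for illustration.

```r
library(tidyverse)

sleep_stats <- msleep %>%
  group_by(vore) %>%
  summarise(
    n            = n(),                  # rows in each diet group
    n_orders     = n_distinct(order),    # distinct taxonomic orders in the group
    median_sleep = median(sleep_total),
    sd_sleep     = sd(sleep_total),
    # arithmetic on aggregates is allowed inside summarise()
    range_sleep  = max(sleep_total) - min(sleep_total)
  )

sleep_stats
```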

Divide the average sleep_total (in hours) by 24 to get the fraction of a day spent asleep

msleep %>%
  group_by(vore) %>%
  summarise(avg_sleep_day = mean(sleep_total)/24)
## # A tibble: 5 x 2
##   vore    avg_sleep_day
##   <chr>           <dbl>
## 1 carni           0.432
## 2 herbi           0.396
## 3 insecti         0.622
## 4 omni            0.455
## 5 <NA>            0.424

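summarise_all() takes a function as its argument and applies it to every column; the example computes the mean of each column. Non-numeric columns such as name and genus cannot be averaged, so they come back as NA.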

msleep %>%
  group_by(vore) %>%
  summarise_all(mean, na.rm=TRUE)
## # A tibble: 5 x 11
##   vore     name genus order conservation sleep_total sleep_rem sleep_cycle
##   <chr>   <dbl> <dbl> <dbl>        <dbl>       <dbl>     <dbl>       <dbl>
## 1 carni      NA    NA    NA           NA       10.4       2.29       0.373
## 2 herbi      NA    NA    NA           NA        9.51      1.37       0.418
## 3 insecti    NA    NA    NA           NA       14.9       3.52       0.161

Add 5 to the mean of each column

msleep %>%
  group_by(vore) %>%
  summarise_all(~mean(., na.rm = TRUE) + 5)
##   vore     name genus order conservation sleep_total sleep_rem sleep_cycle
##   <chr>   <dbl> <dbl> <dbl>        <dbl>       <dbl>     <dbl>       <dbl>
## 1 carni      NA    NA    NA           NA        15.4      7.29        5.37
## 2 herbi      NA    NA    NA           NA        14.5      6.37        5.42
## 3 insecti    NA    NA    NA           NA        19.9      8.52        5.16

summarise_if()

Compute the mean of all numeric columns

msleep %>%
  group_by(vore) %>%
  summarise_if(is.numeric, mean, na.rm=TRUE)
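
In dplyr 1.0.0 and later, the scoped verbs (summarise_all, summarise_if, summarise_at) are superseded by across(). Assuming such a version is installed, the summarise_if() call above could be sketched like this; the name numeric_means is only illustrative.

```r
library(tidyverse)

# Mean of every numeric column per diet group, written with across()
numeric_means <- msleep %>%
  group_by(vore) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

numeric_means
```

The older scoped verbs still work, so this is a matter of style rather than a required change.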

rename_if() renames the columns that satisfy a condition; here every numeric column gets the prefix "avg_"

msleep %>%
  group_by(vore) %>%
  summarise_if(is.numeric, mean, na.rm=TRUE) %>%
  rename_if(is.numeric, ~paste0("avg_", .))
##   vore    avg_sleep_total avg_sleep_rem avg_sleep_cycle avg_awake
##   <chr>             <dbl>         <dbl>           <dbl>     <dbl>
## 1 carni             10.4           2.29           0.373     13.6
## 2 herbi              9.51          1.37           0.418     14.5
## 3 insecti           14.9           3.52           0.161      9.06

summarise_at()

The code below returns the mean of every column whose name contains "sleep" and renames each of them to "avg_<var>"

msleep %>%
  group_by(vore) %>%
  summarise_at(vars(contains("sleep")), mean, na.rm=TRUE) %>%
  rename_at(vars(contains("sleep")), ~paste0("avg_", .))
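
With dplyr 1.0.0 or later, the summarising and renaming steps can be combined through the .names argument of across(). This is a sketch under that version assumption, and the name avg_sleep is illustrative.

```r
library(tidyverse)

# Average every column whose name contains "sleep" and prefix it with "avg_"
avg_sleep <- msleep %>%
  group_by(vore) %>%
  summarise(across(contains("sleep"),
                   ~ mean(.x, na.rm = TRUE),
                   .names = "avg_{.col}"))   # {.col} is the original column name

names(avg_sleep)
```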

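top_n()

Keep the 5 rows with the highest values. When no weighting variable is given, top_n() ranks by the last column, here average.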

msleep %>%
  group_by(order) %>%
  summarise(average = mean(sleep_total)) %>%
  top_n(5)

Keep the 5 rows with the lowest values

msleep %>%
  group_by(order) %>%
  summarise(average = mean(sleep_total)) %>%
  top_n(-5)

The example keeps the rows with the 5 highest values of average_sleep

msleep %>%
  group_by(order) %>%
  summarise(average_sleep = mean(sleep_total), max_sleep = max(sleep_total)) %>%
  top_n(5, average_sleep)
##   order           average_sleep max_sleep
##   <chr>                   <dbl>     <dbl>
## 1 Afrosoricida             15.6      15.6
## 2 Chiroptera               19.8      19.9
## 3 Cingulata                17.8      18.1
## 4 Didelphimorphia          18.7      19.4
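
top_n() is superseded in dplyr 1.0.0 and later by slice_max() and slice_min(). Assuming such a version, the same selection might be written as below; the name top_sleepers is illustrative.

```r
library(tidyverse)

# Keep the orders with the 5 highest average sleep times
top_sleepers <- msleep %>%
  group_by(order) %>%
  summarise(average_sleep = mean(sleep_total),
            max_sleep     = max(sleep_total)) %>%
  slice_max(average_sleep, n = 5)   # slice_min() would keep the lowest instead

top_sleepers
```

One visible difference is that slice_max() returns the rows sorted from highest to lowest, whereas top_n() leaves them in their original order.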

sample_frac() randomly selects a fraction of the rows (here 10%)

msleep %>% sample_frac(.1)
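
Because the rows are drawn at random, each run returns a different sample. A minimal sketch of making the draw reproducible with set.seed() follows; the seed value 42 and the object names are arbitrary choices for illustration.

```r
library(tidyverse)

set.seed(42)                                # arbitrary seed, fixes the random draw
first_draw <- msleep %>% sample_frac(0.1)

set.seed(42)                                # same seed again
second_draw <- msleep %>% sample_frac(0.1)

# With the same seed, both draws contain the same rows
identical(first_draw, second_draw)

# In dplyr 1.0.0 and later, slice_sample() is the newer way to write this
newer_style <- msleep %>% slice_sample(prop = 0.1)
```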

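If you enjoyed this post, you are welcome to follow my WeChat official account, R语言数据分析指南 (R Data Analysis Guide), which regularly shares classic data visualisation examples and some bioinformatics knowledge. I hope it is helpful.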
