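This section continues the series with across(), an important function in the tidyverse. It requires dplyr 1.0.0 or later.
across() makes it easy to apply the same operation to several columns at once.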
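across() has two main arguments:
- The first argument, .cols, selects the columns to operate on
- The second argument, .fns, is a function or a list of functions to apply to each selected column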
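To make the two arguments concrete, here is a minimal sketch on a made-up tibble. The name `toy` and its columns are hypothetical and are not part of the data used in this post. Both arguments are spelled out by name here, although they are usually passed by position.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
toy <- tibble(a = c(1.26, 2.51), b = c(3.14, 4.49), id = c("x", "y"))

# .cols picks columns a and b; .fns = round is applied to each of them.
# The id column is not selected, so it is left untouched.
rounded <- toy %>% mutate(across(.cols = c(a, b), .fns = round))
rounded
```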
Create the data
library(dplyr)
gdf <- tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23, v3 = 1:4)
gdf
      g    v1    v2    v3
  <dbl> <int> <int> <int>
1     1    10    20     1
2     1    11    21     2
3     2    12    22     3
4     3    13    23     4
Add 1 to each of the columns v1 to v3
gdf %>% mutate(across(v1:v3, ~ .x + 1))
      g    v1    v2    v3
  <dbl> <dbl> <dbl> <dbl>
1     1    11    21     2
2     1    12    22     3
3     2    13    23     4
4     3    14    24     5
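The `~ .x + 1` shorthand is a formula-style lambda in which `.x` stands for the current column. The sketch below compares it with an ordinary anonymous function. It also shows why v1 to v3 change from <int> to <dbl> in the output above: the literal `1` is a double, so adding `1L` instead should keep the columns as integers.

```r
library(dplyr)

gdf <- tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23, v3 = 1:4)

# Formula lambda and anonymous function are two spellings of the same operation
by_formula  <- gdf %>% mutate(across(v1:v3, ~ .x + 1))
by_function <- gdf %>% mutate(across(v1:v3, function(x) x + 1))
all.equal(by_formula, by_function)

# Adding the integer literal 1L keeps the integer type
keep_int <- gdf %>% mutate(across(v1:v3, ~ .x + 1L))
sapply(keep_int, class)
```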
Round the first two columns
iris %>%
  as_tibble() %>%
  mutate(across(c(1, 2), round))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1            5           4          1.4         0.2 setosa
2            5           3          1.4         0.2 setosa
3            5           3          1.3         0.2 setosa
The same result can be written in two other ways
iris %>%
  as_tibble() %>%
  mutate(across(1:Sepal.Width, round))

iris %>%
  as_tibble() %>%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))
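As a quick sanity check, the three selections (by position, by range, and by predicate with exclusions) can be compared directly. This is only a sketch; the variable names are illustrative.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Three ways of selecting Sepal.Length and Sepal.Width
by_position  <- iris_tbl %>% mutate(across(c(1, 2), round))
by_range     <- iris_tbl %>% mutate(across(1:Sepal.Width, round))
by_predicate <- iris_tbl %>%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))

# Both comparisons should report TRUE
all.equal(by_position, by_range)
all.equal(by_position, by_predicate)
```

Selecting by position is the shortest form, but it silently changes meaning if the column order changes, so the name-based and predicate-based forms are usually more robust.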
Compute the mean of each numeric column
iris %>% summarise(across(where(is.numeric), mean))
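Predicate functions such as is.numeric should be wrapped in where() when selecting columns; passing the bare predicate triggers a deprecation warning in recent tidyselect versions. The sketch below uses where() and checks the result against base R's colMeans() as an independent reference.

```r
library(dplyr)

# One mean per numeric column; Species is a factor and is skipped
col_means <- iris %>% summarise(across(where(is.numeric), mean))
col_means

# Cross-check against base R on the four numeric columns
all.equal(unlist(col_means), colMeans(iris[1:4]))
```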
Compute the sum of each row
iris %>%
  as_tibble() %>%
  rowwise() %>%
  mutate(total = sum(c_across(where(is.numeric))))
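For a plain row sum, rowwise() is not strictly needed: across() returns a data frame, which can be handed to base R's rowSums(). The sketch below compares that vectorised form with a row-by-row version built on c_across(); the column name `total` is just an illustrative choice.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Vectorised: across() returns a data frame, rowSums() adds it up row by row
vectorised <- iris_tbl %>%
  mutate(total = rowSums(across(where(is.numeric))))

# Row by row: c_across() combines the selected values of the current row
row_by_row <- iris_tbl %>%
  rowwise() %>%
  mutate(total = sum(c_across(where(is.numeric)))) %>%
  ungroup()

all.equal(as.numeric(vectorised$total), as.numeric(row_by_row$total))
```

On large data the vectorised form is typically much faster, because rowwise() evaluates the expression once per row.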
Compute the mean within each group
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
  Species    Sepal.Length Sepal.Width
  <fct>             <dbl>       <dbl>
1 setosa             5.01        3.43
2 versicolor         5.94        2.77
3 virginica          6.59        2.97
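iris has no missing values, so `na.rm = TRUE` makes no visible difference above. The sketch below uses a small made-up tibble (`scores` and its columns are hypothetical) to show what the lambda form buys: it lets extra arguments such as `na.rm` be passed to the summary function.

```r
library(dplyr)

# Hypothetical data with one missing value in s1
scores <- tibble(
  group = c("a", "a", "b"),
  s1    = c(1, NA, 3),
  s2    = c(2, 4, 6)
)

# Passing the bare function: the NA propagates into the group mean
with_na <- scores %>%
  group_by(group) %>%
  summarise(across(starts_with("s"), mean))

# Passing a lambda: na.rm = TRUE drops the NA before averaging
without_na <- scores %>%
  group_by(group) %>%
  summarise(across(starts_with("s"), ~ mean(.x, na.rm = TRUE)))

with_na
without_na
```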
Compute the sum within each group
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~ sum(.x, na.rm = TRUE)))
  Species    Sepal.Length Sepal.Width
  <fct>             <dbl>       <dbl>
1 setosa             250.        171.
2 versicolor         297.        138.
3 virginica          329.        149.
Compute the mean and standard deviation within each group by passing a named list of functions
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))
  Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
  <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
1 setosa                  5.01           0.352             3.43          0.379
2 versicolor              5.94           0.516             2.77          0.314
3 virginica               6.59           0.636             2.97          0.322
Use the .names argument to control the output names
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
  Species    mean_Sepal.Length mean_Sepal.Width
  <fct>                  <dbl>            <dbl>
1 setosa                  5.01             3.43
2 versicolor              5.94             2.77
3 virginica               6.59             2.97
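The .names template can also refer to the function name through `{.fn}`, which is useful when a named list of functions is supplied. The sketch below puts the function name first, reversing the default `{.col}_{.fn}` pattern.

```r
library(dplyr)

renamed <- iris %>%
  group_by(Species) %>%
  summarise(across(
    starts_with("Sepal"),
    list(mean = mean, sd = sd),
    .names = "{.fn}_{.col}"   # {.fn} is the list name, {.col} the column name
  ))

names(renamed)
```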
Keep only the rows that have no missing values
starwars %>% filter(across(everything(), ~ !is.na(.x)))
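The call above works in dplyr 1.0.0. From dplyr 1.0.4 onward there is also if_all() (and its counterpart if_any()), and newer releases warn that across() inside filter() is deprecated. A sketch of the equivalent filter, assuming dplyr 1.0.4 or later is installed:

```r
library(dplyr)

# if_all() keeps a row only when the predicate is TRUE for every selected column
complete_rows <- starwars %>%
  filter(if_all(everything(), ~ !is.na(.x)))

nrow(starwars)
nrow(complete_rows)
```

Replacing if_all() with if_any() would instead keep rows where at least one column is not missing.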
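When across() is used inside mutate(), all of its transformations are applied at once, so every column below is divided by the original value of y (4), including z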
df <- tibble(x = 2, y = 4, z = 8)
df %>% mutate(across(everything(), ~ .x / y))
      x     y     z
  <dbl> <dbl> <dbl>
1   0.5     1     2
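To see why this matters, compare it with writing the three assignments out by hand. A plain mutate() evaluates its arguments one after another, so by the time z is computed y has already been overwritten with 1. The variable names `at_once` and `one_by_one` are only illustrative.

```r
library(dplyr)

df <- tibble(x = 2, y = 4, z = 8)

# across(): every column is divided by the original y (4)
at_once <- df %>% mutate(across(everything(), ~ .x / y))

# Separate assignments: y becomes 1 first, so z is divided by the new y
one_by_one <- df %>% mutate(x = x / y, y = y / y, z = z / y)

at_once$z     # 2
one_by_one$z  # 8
```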
Count the distinct values in each character column
starwars %>%
  summarise(across(where(is.character), ~ length(unique(.x))))
   name hair_color skin_color eye_color   sex gender homeworld species
  <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
1    87         13         31        15     5      3        49      38
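dplyr also provides n_distinct(), which can be passed directly as the function. The sketch below checks that it gives the same counts as `length(unique(.x))`; both count NA as one distinct value by default.

```r
library(dplyr)

via_unique <- starwars %>%
  summarise(across(where(is.character), ~ length(unique(.x))))

# n_distinct() is passed as a bare function, no lambda needed
via_n_distinct <- starwars %>%
  summarise(across(where(is.character), n_distinct))

all.equal(via_unique, via_n_distinct)
```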
Compute the minimum and maximum of each numeric column
min_max <- list(
  min = ~ min(.x, na.rm = TRUE),
  max = ~ max(.x, na.rm = TRUE)
)
iris %>% summarise(across(where(is.numeric), min_max))
  Sepal.Length_min Sepal.Length_max Sepal.Width_min Sepal.Width_max Petal.Length_min Petal.Length_max Petal.Width_min Petal.Width_max
1              4.3              7.9               2             4.4                1              6.9             0.1             2.5