In data analysis we sometimes need to convert an existing numeric column into a percentage column.
Case study 1:
Suppose we have the following table, which gives the absolute counts of the four bases A, T, G and C:
Base Num
A 1000
T 2000
G 4000
C 5000
We want to add a column holding the percentage of each base, which we can then use to draw a pie chart. The expected result is:
Base Num Percentage
A 1000 8.3%
T 2000 16.7%
G 4000 33.3%
C 5000 41.7%
How do we do this? The mutate() function from dplyr is all we need:
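library(tidyverse)   # provides mutate() (from dplyr) and the %>% pipe
df <- read.table("~/ATGC.txt", header = TRUE)   # columns: Base, Num
df %>%
  # Num / sum(Num) is each base's share; round to 1 decimal to match the table above
  mutate(Percentage = paste0(round(Num / sum(Num) * 100, 1), "%"))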
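If you would rather avoid the tidyverse dependency, the same calculation can be sketched in base R. This is a swapped-in alternative, not the method above: the data frame is built inline from the table (instead of being read from ~/ATGC.txt), and the extra column Prop is a hypothetical addition that keeps the numeric share, while Percentage holds only the formatted label.

```r
# Sketch (base R, no tidyverse): ATGC counts from the table above, built inline
# instead of reading ~/ATGC.txt so the example is self-contained.
df <- data.frame(
  Base = c("A", "T", "G", "C"),
  Num  = c(1000, 2000, 4000, 5000)
)

# prop.table() on a plain vector is simply Num / sum(Num).
df$Prop <- prop.table(df$Num)

# Hypothetical label column: one decimal place followed by a literal "%".
df$Percentage <- sprintf("%.1f%%", 100 * df$Prop)

print(df)
```

Keeping Prop numeric matters for the pie chart: plotting functions need numbers, while the Percentage strings are only useful as labels.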
Case study 2:
What if there is a grouping variable and we want the percentages computed separately within each group?
First, let's construct a dataset:
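# sample() is random: call set.seed() first if you need reproducible numbers
gender <- rep(c("male", "female"), each = 3)
weight <- c(sample(120:180, 3), sample(80:100, 3))   # 3 male weights, then 3 female
df2 <- data.frame(gender, weight)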
The resulting df2 looks like this:
> df2
gender weight
1 male 168
2 male 125
3 male 133
4 female 99
5 female 88
6 female 80
We group the data with group_by() before calling mutate():
df2 %>%
  group_by(gender) %>%   # sum(weight) below is now computed per gender
  mutate(Percentage = paste0(round(weight / sum(weight) * 100, 2), "%"))
The final result:
# A tibble: 6 x 3
# Groups: gender [2]
gender weight Percentage
<fct> <int> <chr>
1 male 168 39.44%
2 male 125 29.34%
3 male 133 31.22%
4 female 99 37.08%
5 female 88 32.96%
6 female 80 29.96%
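The grouped version also has a base-R counterpart, again a swapped-in alternative rather than the dplyr approach: ave() repeats each group's total on every row of that group, so dividing by it gives within-group shares. The sketch below uses the weights printed above as fixed data, so its output is deterministic, and the helper name group_total is hypothetical. It uses sprintf() instead of round() + paste0(); sprintf() always prints two decimals, whereas paste0(round(x, 2), "%") drops trailing zeros (30.50 becomes "30.5%").

```r
# Sketch: base-R equivalent of group_by() + mutate(), using the df2 values shown above
df2 <- data.frame(
  gender = rep(c("male", "female"), each = 3),
  weight = c(168, 125, 133, 99, 88, 80)
)

# ave(x, g, FUN = sum) returns, for every row, the sum of x within that row's group
group_total <- ave(df2$weight, df2$gender, FUN = sum)

# Within-group share, formatted with two decimals and a literal "%"
df2$Percentage <- sprintf("%.2f%%", 100 * df2$weight / group_total)

# Sanity check: shares add up to 100 within each gender
print(tapply(100 * df2$weight / group_total, df2$gender, sum))
print(df2)
```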
Done!