stat_summary is flexible enough that, used well, it saves a lot of extra code.
Once the main ggplot() call defines the aesthetic mapping, you can draw the summary directly with stat_summary:
ggplot(., aes(x = weight, y = species.coverage, fill = weight)) +
  # geom_boxplot(outlier.size = 1) +
  # Draw the bars; each bar's height is the computed mean
  stat_summary(fun = "mean", size = 2, geom = "bar", position = position_dodge(0.75)) +
  # Add a confidence interval for each bar, computed by "mean_cl_boot",
  # a bootstrap that does not assume the values are normally distributed
  stat_summary(fun.data = "mean_cl_boot", geom = "errorbar", width = .15, position = position_dodge(0.75))
These summary functions come from the Hmisc package (ggplot2's mean_cl_boot is a wrapper around Hmisc's smean.cl.boot):
smean.cl.normal computes 3 summary variables: the sample mean and lower and upper Gaussian confidence limits based on the t-distribution.
smean.sd computes the mean and standard deviation.
smean.sdl computes the mean plus or minus a constant times the standard deviation.
smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality.
These functions all delete NAs automatically.
smedian.hilow computes the sample median and a selected pair of outer quantiles having equal tail areas.
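To make the bootstrap idea concrete, here is a minimal base R sketch of a percentile bootstrap interval for the mean. The helper name boot_mean_ci and the simulated data are assumptions made for illustration; this is not Hmisc's own code, only the same general technique.

```r
# Hypothetical helper (not part of Hmisc): percentile bootstrap CI for the mean.
boot_mean_ci <- function(x, conf.int = 0.95, B = 1000) {
  x <- x[!is.na(x)]  # drop NAs before resampling
  # Resample with replacement B times and record the mean of each resample
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- 1 - conf.int
  # The CI limits are the matching quantiles of the bootstrap means
  limits <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(Mean = mean(x), Lower = limits[1], Upper = limits[2])
}

set.seed(42)
skewed <- rexp(200, rate = 1 / 10)  # simulated, clearly non-normal data
ci <- boot_mean_ci(skewed)
ci
```

Because the limits come from the resampled means themselves, no normality assumption is needed, which is why this kind of interval suits skewed data.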
This produces the bar chart, the bootstrap, and the confidence interval in one step, which is much simpler than computing these quantities yourself beforehand.
Without stat_summary, you would need group_by and summarise to compute the CI by hand, which is more work:
library(tidyverse)  # dplyr, tidyr and ggplot2

# Simulate two groups and reshape to long format
df <- data.frame(A = rnorm(2000, mean = 15, sd = 18),
                 B = rnorm(2000, mean = 25, sd = 17)) %>%
  pivot_longer(cols = c(A, B), names_to = "group", values_to = "time") %>%
  mutate(time = ifelse(time < 2, abs(time) + rnorm(1, 15, 7), time))

# Mean and normal-approximation 95% CI per group
my_cis <- df %>%
  group_by(group) %>%
  summarize(M = mean(time),
            lwr = M - sd(time) / sqrt(length(time)) * 1.96,
            upr = M + sd(time) / sqrt(length(time)) * 1.96)

# Raw points plus the precomputed mean and interval
df %>%
  ggplot(aes(x = group)) +
  geom_jitter(aes(y = time), width = .1, alpha = .2, color = "pink") +
  geom_errorbar(aes(ymin = lwr, ymax = upr), data = my_cis, width = .13, color = "gray25") +
  geom_point(aes(y = M), data = my_cis, shape = 18, size = 2)
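For comparison, the same normal-approximation interval can be handed to stat_summary through fun.data, so no separate summary table is needed. This is a sketch under assumptions: the helper mean_ci_normal_approx and the simulated demo data are made up for illustration, and fun = mean requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical helper: mean with a normal-approximation 95% interval,
# returned in the y / ymin / ymax form that fun.data expects.
mean_ci_normal_approx <- function(y, z = 1.96) {
  m  <- mean(y)
  se <- sd(y) / sqrt(length(y))
  data.frame(y = m, ymin = m - z * se, ymax = m + z * se)
}

# Simulated data, assumed for this example only
set.seed(1)
demo <- data.frame(
  group = rep(c("A", "B"), each = 200),
  time  = c(rnorm(200, 15, 18), rnorm(200, 25, 17))
)

p <- ggplot(demo, aes(x = group, y = time)) +
  geom_jitter(width = .1, alpha = .2, color = "pink") +
  stat_summary(fun.data = mean_ci_normal_approx, geom = "errorbar",
               width = .13, color = "gray25") +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 2)
p
```

stat_summary calls the helper once per x group, so the grouping that group_by handled in the manual version happens implicitly.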
Of course, you can also retrieve these CI values from ggplot's stat_summary: the data that stat_summary computes can be accessed with the ggplot_build(g) function.
First, store the ggplot call in an object:
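g <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_jitter(width = 0.5) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = mean, geom = "point", color = "red") +
  # fun.args must be a list of arguments passed on to mean_cl_boot
  stat_summary(fun.data = mean_cl_boot, fun.args = list(conf.int = 0.9999), geom = "errorbar", width = 0.4)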
Then use
ggplot_build(g)$data[[3]]
to get the values computed by mean_cl_boot (the errorbar layer is the third layer of the plot):
  x group     y     ymin     ymax PANEL xmin xmax colour size linetype width alpha
1 1     1 1.462 1.386000 1.543501     1  0.8  1.2  black  0.5        1   0.4    NA
2 2     2 4.260 4.024899 4.462202     1  1.8  2.2  black  0.5        1   0.4    NA
3 3     3 5.552 5.337199 5.798202     1  2.8  3.2  black  0.5        1   0.4    NA
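The built layer data can be reduced to a small summary table. The sketch below is an illustration under assumptions: it uses ggplot2's own mean_se in place of mean_cl_boot so that it needs no Hmisc and gives the same result on every run, and the names g2 and summary_tbl are made up for the example.

```r
library(ggplot2)

# mean_se() ships with ggplot2 and is deterministic; it stands in for
# mean_cl_boot here (an assumption of this example).
g2 <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_jitter(width = 0.2) +
  stat_summary(fun.data = mean_se, geom = "pointrange")

# Layers are numbered in the order they were added, so the summary is layer 2.
built <- ggplot_build(g2)$data[[2]]

# x holds the position of each discrete level; map it back to the species names.
summary_tbl <- data.frame(
  Species = levels(iris$Species)[as.integer(built$x)],
  mean    = as.numeric(built$y),
  lower   = as.numeric(built$ymin),
  upper   = as.numeric(built$ymax)
)
summary_tbl
```

Swapping mean_se for mean_cl_boot gives bootstrap limits in the same columns, although those limits vary slightly between runs unless a seed is set.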
References:
r - Getting the values computed by stat_summary with mean_cl_boot - Stack Overflow (Chinese mirror)
r - What do ggplot's stat_summary errorbars mean? - Cross Validated (stackexchange.com)
smean.sd: Compute Summary Statistics on a Vector in Hmisc: Harrell Miscellaneous (rdrr.io)
Using a custom function to annotate bar charts and box plots with the mean, median, sample size, and other labels
The custom function:
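# Assumes a data frame df with an mpg column already exists
# (for example, mtcars with cyl converted to a factor).
# Returns the label text and the y position at which to draw it.
get_box_stats <- function(y, upper_limit = max(df$mpg) * 1.15) {
  return(data.frame(
    y = 0.95 * upper_limit,
    label = paste(
      "Count =", length(y), "\n",
      "Mean =", round(mean(y), 2), "\n",
      "Median =", round(median(y), 2), "\n"
    )
  ))
}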
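To see what such a function hands back to stat_summary, it helps to call it on a plain vector first. The sketch below uses a hypothetical variant, box_stats_label, that takes the upper limit as an explicit argument instead of reading a global df; the mtcars dataset is an assumption, chosen because it has an mpg column.

```r
# Hypothetical variant of the function above: the upper limit is passed in
# explicitly, so it does not depend on a global data frame.
box_stats_label <- function(y, upper_limit) {
  data.frame(
    y = 0.95 * upper_limit,  # vertical position of the label
    label = paste(
      "Count =", length(y), "\n",
      "Mean =", round(mean(y), 2), "\n",
      "Median =", round(median(y), 2), "\n"
    )
  )
}

# mtcars is assumed here purely for illustration
stats <- box_stats_label(mtcars$mpg, upper_limit = max(mtcars$mpg) * 1.15)
str(stats)                       # one row with columns y and label
cat(as.character(stats$label))   # the multi-line text that geom = "text" draws
```

stat_summary calls the function once per group and passes the result to the text geom, which reads the y and label columns as aesthetics.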
Then pass this function to stat_summary:
# cyl must be a factor (or character) column for scale_fill_manual to apply
ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  stat_summary(fun.data = get_box_stats, geom = "text", hjust = 0.5, vjust = 0.9) +
  theme_classic()