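#The broom package takes the messy output of built-in R functions, such as lm, nls, or t.test, and turns it into tidy data frames.
#In other words, it reshapes messy output that is not a data frame into a data frame.
#broom is meant to be used together with dplyr.
#It provides three functions: tidy, augment, and glance.
#Example 1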
```
lmfit <- lm(mpg ~ wt, mtcars)
lmfit
summary(lmfit)
library(broom)
tidy(lmfit)
```
#tidy returns a data frame; the row names have become a column named term.
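# A minimal sketch of the point above, comparing the coefficient matrix from summary() with the
# tidy() output. The names coef_matrix and td are illustrative and not part of the original notes.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# summary() stores the coefficients in a matrix; the term names live in its row names
coef_matrix <- coef(summary(lmfit))
rownames(coef_matrix)

# tidy() returns a data frame in which the same names are an ordinary column
td <- tidy(lmfit)
td$term

# the two sets of names should agree
identical(rownames(coef_matrix), td$term)
```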
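# Instead of viewing the coefficients, you might be interested in the fitted values and residuals for each of the original points in the regression.
# For this, use augment, which augments the original data with information from the model.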
augment(lmfit)
#Each added column name begins with a dot (.) to avoid overwriting any of the original columns.
#Several summary statistics are computed for the regression as a whole; the glance function returns them.
glance(lmfit)
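# Because all three functions return data frames, their output can go straight into dplyr verbs.
# A small sketch, assuming dplyr is installed; worst_fit and abs_resid are illustrative names.

```r
library(broom)
library(dplyr)

lmfit <- lm(mpg ~ wt, mtcars)

# Attach fitted values and residuals to the data, then keep the three
# observations with the largest absolute residual.
worst_fit <- augment(lmfit) %>%
  mutate(abs_resid = abs(.resid)) %>%
  arrange(desc(abs_resid)) %>%
  select(mpg, wt, .fitted, .resid) %>%
  head(3)

worst_fit
```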
#Example 2
```
#Generalized linear and non-linear models
glmfit <- glm(am ~ wt, mtcars, family="binomial")
tidy(glmfit)
augment(glmfit)
glance(glmfit)
#These functions work just as well for non-linear models
nlsfit <- nls(mpg ~ k / wt + b, mtcars, start=list(k=1, b=0))
tidy(nlsfit)
augment(nlsfit, mtcars)
glance(nlsfit)
#The tidy function can also be applied to htest objects,
#such as those output by popular built-in functions like
#t.test, cor.test, and wilcox.test.
tt <- t.test(wt ~ am, mtcars)
tidy(tt)
wt <- wilcox.test(wt ~ am, mtcars)
tidy(wt)
glance(tt)
glance(wt)
#augment method is defined only for chi-squared tests
chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))
tidy(chit)
augment(chit)
```
# All functions
# The output of the tidy, augment and glance functions is always a data frame.
# The output never has rownames. This ensures that you can combine it with other tidy outputs without
# fear of losing information (since rownames in R cannot contain duplicates).
# Some column names are kept consistent, so that they can be combined across different models and so
# that you know what to expect (in contrast to asking “is it pval or PValue?” every time). The examples
# below are not all the possible column names, nor will all tidy output contain all or even any of these
# columns.
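# The consistent column names are what make it possible to stack output from different models.
# A sketch under the assumption that dplyr is available; the two models and the label column
# "model" are chosen here for illustration only.

```r
library(broom)
library(dplyr)

fit_wt    <- lm(mpg ~ wt, mtcars)
fit_wt_hp <- lm(mpg ~ wt + hp, mtcars)

# Stack the coefficient tables; .id records which model each row came from
coefs <- bind_rows(
  wt_only = tidy(fit_wt),
  wt_hp   = tidy(fit_wt_hp),
  .id = "model"
)

# Compare the estimate for wt across the two models
filter(coefs, term == "wt")
```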
# tidy functions
# Each row in a tidy output typically represents some well-defined concept, such as one term in a
# regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident.
# The one thing each row cannot represent is a point in the initial data (for that, use the augment method).
# Common column names include:
# term: the term in a regression or model that is being estimated.
# p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to
# be consistent with functions in R’s built-in stats package.
# statistic: a test statistic, usually the one used to compute the p-value. Combining these across
# many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.
# estimate: the estimated value, such as a regression coefficient.
# conf.low: the low end of a confidence interval on the estimate.
# conf.high: the high end of a confidence interval on the estimate.
# df: degrees of freedom.
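# For linear models, the confidence interval columns are not returned by default. A short sketch
# using the conf.int and conf.level arguments of tidy for an lm fit; the name ci is illustrative.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# Request confidence intervals alongside the estimates
ci <- tidy(lmfit, conf.int = TRUE, conf.level = 0.95)

ci[, c("term", "estimate", "conf.low", "conf.high")]
```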
# augment functions
# augment(model, data) adds columns to the original data.
# If the data argument is missing, augment attempts to reconstruct the data from the model (note that
# this may not always be possible, and usually won’t contain columns not used in the model).
# Each row in an augment output matches the corresponding row in the original data.
# If the original data contained rownames, augment turns them into a column called .rownames.
# Newly added column names begin with . to avoid overwriting columns in the original data.
# Common column names include:
# .fitted: the predicted values, on the same scale as the data.
# .resid: the residuals, i.e. the actual y values minus the fitted values.
# .cluster: cluster assignments
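# The properties listed above can be checked directly on the linear model from Example 1.
# This is only a sketch; aug is an illustrative name.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# Pass the original data explicitly so that all of its columns are kept
aug <- augment(lmfit, data = mtcars)

# One row per original observation
nrow(aug) == nrow(mtcars)

# .resid is the observed response minus the fitted value
all.equal(unname(aug$.resid), unname(aug$mpg - aug$.fitted))

# Columns that augment added to the original data
setdiff(names(aug), names(mtcars))
```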
# glance functions
# glance always returns a one-row data frame.
# The only exception is that glance(NULL) returns an empty data frame.
# We avoid including arguments that were given to the modeling function. For example, a glm glance
# output does not need to contain a field for family, since that is decided by the user calling glm rather
# than the modeling function itself.
# Common column names include:
# r.squared: the fraction of variance explained by the model.
# adj.r.squared: R^2 adjusted based on the degrees of freedom.
# sigma: the square root of the estimated variance of the residuals.
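# As a rough check, the glance columns for a linear model can be compared with the values that
# summary() reports. The names g and s are illustrative.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

g <- glance(lmfit)
s <- summary(lmfit)

# glance returns a single row for the whole model
nrow(g)

# The summary statistics should match those stored in summary(lmfit)
all.equal(unname(g$r.squared), unname(s$r.squared))
all.equal(unname(g$adj.r.squared), unname(s$adj.r.squared))
all.equal(unname(g$sigma), unname(s$sigma))
```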