Learning R Packages: broom

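#The broom package takes the messy output of built-in R functions, such as lm, nls, or t.test, and turns it into tidy data frames.

#In other words, it converts messy output that is not a data frame into a data frame.

#broom works well in combination with dplyr.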

#It provides three main functions: tidy, augment, and glance.
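
# As a minimal sketch of how broom and dplyr fit together, the block below fits one regression per

# cylinder group and stacks the tidy outputs. It assumes dplyr 0.8.1 or later (for group_modify);

# the names per_cyl and slopes are illustrative only.

```r
library(broom)
library(dplyr)

# Fit mpg ~ wt separately for each number of cylinders and stack the
# coefficient tables; group_modify() prepends the grouping column cyl.
per_cyl <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tidy(lm(mpg ~ wt, data = .x))) %>%
  ungroup()

# Keep only the slopes: one row per cylinder group.
slopes <- per_cyl %>%
  filter(term == "wt") %>%
  select(cyl, estimate, std.error, p.value)
slopes
```

# Because every tidy output is a plain data frame, ordinary dplyr verbs are enough to compare the groups.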

#Example 1

```

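library(broom)

# Fit a simple linear regression of fuel economy on weight
lmfit <- lm(mpg ~ wt, mtcars)

# The printed model and its summary are not data frames
lmfit
summary(lmfit)

# tidy() returns the coefficient table as a data frame
tidy(lmfit)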

```

#tidy returns a data frame; the row names have become a column named term.
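
# The difference from the summary output can be sketched as follows. The names coefs and td are

# illustrative only.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# summary() stores the term names as row names of a matrix.
coefs <- coef(summary(lmfit))
rownames(coefs)

# tidy() stores the same names in an ordinary column called term,
# so rows can be selected with normal data frame operations.
td <- tidy(lmfit)
td$term
td[td$term == "wt", c("term", "estimate", "p.value")]
```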

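# Instead of the coefficients, you may be interested in the fitted value and residual for each original point in the regression.

# For that, use augment, which augments the original data with information from the model.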

augment(lmfit)

#The added columns are prefixed with a dot (.) to avoid overwriting the original columns.
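
# A small check of the dot-prefix convention, offered as a sketch rather than a full tour of the

# augment output. The name aug is illustrative only.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)
aug <- augment(lmfit)

# Columns added by augment() all start with a dot (including .rownames).
grep("^\\.", names(aug), value = TRUE)

# The residual is the observed response minus the fitted value.
all.equal(unname(aug$.resid), unname(aug$mpg - aug$.fitted))
```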

#Several summary statistics are computed for the regression as a whole; the glance function returns them.

glance(lmfit)
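
# To see where these whole-model statistics come from, the sketch below compares two glance columns

# with the corresponding elements of summary(). The names gl and s are illustrative only.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)
gl <- glance(lmfit)

# One row for the whole model.
nrow(gl)

# The same numbers are available from summary(), but as list elements
# rather than as columns of a data frame.
s <- summary(lmfit)
all.equal(unname(gl$r.squared), s$r.squared)
all.equal(unname(gl$sigma), s$sigma)
```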

#Example 2

```

#Generalized linear and non-linear models

glmfit <- glm(am ~ wt, mtcars, family="binomial")

tidy(glmfit)

augment(glmfit)

glance(glmfit)

#These functions work just as well for non-linear models

nlsfit <- nls(mpg ~ k / wt + b, mtcars, start=list(k=1, b=0))

tidy(nlsfit)

augment(nlsfit, mtcars)

glance(nlsfit)

#The tidy function can also be applied to htest objects,

#such as those output by popular built-in functions like

#t.test, cor.test, and wilcox.test.

tt <- t.test(wt ~ am, mtcars)

tidy(tt)

# Named wtest so the result is not confused with the wt column of mtcars
wtest <- wilcox.test(wt ~ am, mtcars)

tidy(wtest)

glance(tt)

glance(wtest)

#For htest objects, the augment method is defined only for chi-squared tests,

#where it returns one row per cell of the contingency table

chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))

tidy(chit)

augment(chit)

```
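
# As a sketch of what augment gives for the chi-squared test above: each row describes one cell of the

# Sex by Class table. Only the .observed and .expected columns are used here, since the names of the

# residual columns have differed between broom versions. The names tab and aug are illustrative only.

```r
library(broom)

tab  <- xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic))
chit <- chisq.test(tab)

# One row per cell of the 2 x 4 table, with observed and expected counts.
aug <- augment(chit)
aug[, c("Sex", "Class", ".observed", ".expected")]

# The observed counts add up to the total number of people in the table.
sum(aug$.observed) == sum(Titanic)
```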

# All functions

# The output of the tidy, augment and glance functions is always a data frame.

# The output never has rownames. This ensures that you can combine it with other tidy outputs without

# fear of losing information (since rownames in R cannot contain duplicates).

# Some column names are kept consistent, so that they can be combined across different models and so

# that you know what to expect (in contrast to asking “is it pval or PValue?” every time). The examples

# below are not all the possible column names, nor will all tidy output contain all or even any of these

# columns.
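
# The benefit of consistent column names can be sketched by stacking the tidy outputs of two different

# tests. This assumes dplyr is installed; wtest and tests are illustrative names, and exact = FALSE is

# set only to avoid the warning about ties in wt.

```r
library(broom)
library(dplyr)

tt    <- t.test(wt ~ am, mtcars)
wtest <- wilcox.test(wt ~ am, mtcars, exact = FALSE)

# Both outputs have statistic and p.value columns, so they stack cleanly;
# columns that exist for only one test are filled with NA.
tests <- bind_rows(tidy(tt), tidy(wtest))
tests %>% select(method, statistic, p.value)
```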

# tidy functions

# Each row in a tidy output typically represents some well-defined concept, such as one term in a

# regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident.

# The one thing each row cannot represent is a point in the initial data (for that, use the augment method).

# Common column names include:

# term: the term in a regression or model that is being estimated.

# p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to be consistent with functions in R’s built-in stats package.

# statistic: a test statistic, usually the one used to compute the p-value. Combining these across many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.

# estimate: the estimated value, such as a regression coefficient.

# conf.low: the low end of a confidence interval on the estimate.

# conf.high: the high end of a confidence interval on the estimate.

# df: degrees of freedom.
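
# Several of these columns can be seen together in the following sketch, which asks tidy for

# confidence intervals on a linear model and checks them against confint(). The names td and ci are

# illustrative only.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# conf.int = TRUE adds conf.low and conf.high (95% level by default).
td <- tidy(lmfit, conf.int = TRUE)
names(td)

# The bounds agree with base R's confint().
ci <- confint(lmfit)
all.equal(unname(td$conf.low),  unname(ci[, 1]))
all.equal(unname(td$conf.high), unname(ci[, 2]))
```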

# augment functions

# augment(model, data) adds columns to the original data.

# If the data argument is missing, augment attempts to reconstruct the data from the model (note that

# this may not always be possible, and usually won’t contain columns not used in the model).

# Each row in an augment output matches the corresponding row in the original data.

# If the original data contained rownames, augment turns them into a column called .rownames.

# Newly added column names begin with . to avoid overwriting columns in the original data.

# Common column names include:

# .fitted: the predicted values, on the same scale as the data.

# .resid: the residuals, that is, the actual y values minus the fitted values.

# .cluster: cluster assignments.
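
# The effect of the data argument can be sketched with the linear model from Example 1. The names

# aug_model and aug_full are illustrative only.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# Without data, only the variables used in the model are recovered.
aug_model <- augment(lmfit)
"cyl" %in% names(aug_model)   # FALSE

# With data, every original column is kept next to the new dot columns,
# and the row names are moved into .rownames.
aug_full <- augment(lmfit, data = mtcars)
"cyl" %in% names(aug_full)    # TRUE
head(aug_full$.rownames)
```

# Passing the data explicitly is the safer choice when later steps need columns that were not part of the model formula.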

# glance functions

# glance always returns a one-row data frame.

# The only exception is that glance(NULL) returns an empty data frame.

# We avoid including arguments that were given to the modeling function. For example, a glm glance

# output does not need to contain a field for family, since that is decided by the user calling glm rather

# than the modeling function itself.

# Common column names include:

# r.squared: the fraction of variance explained by the model.

# adj.r.squared: R^2 adjusted based on the degrees of freedom.

# sigma: the square root of the estimated variance of the residuals.
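
# Because glance returns one row per model, several models can be compared by stacking those rows.

# The sketch below assumes dplyr is installed; the model names wt_only and wt_and_hp are illustrative

# only.

```r
library(broom)
library(dplyr)

models <- list(
  wt_only   = lm(mpg ~ wt, mtcars),
  wt_and_hp = lm(mpg ~ wt + hp, mtcars)
)

# One glance row per model; .id records which model each row came from.
comparison <- bind_rows(lapply(models, glance), .id = "model")
comparison %>% select(model, r.squared, adj.r.squared, sigma)
```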
