Ensemble learning combines the predictions of several machine learning models in order to improve performance. Its two main approaches are bagging and boosting. Random forests, covered previously, are the typical example of bagging; boosting, represented here by gradient boosting machines (GBM), is the other approach. The base learners used in ensemble learning are usually decision trees.

1. Differences between bagging and boosting

(1) bagging

Builds many independent, de-correlated base learners; each individual base learner is itself a strong learner;
At prediction time, the predictions of all models are combined; every model has the same weight, so all are treated equally.
bagging suits base learners with low bias and high variance, where the goal is to reduce variance.

(2) boosting

Builds a sequence of weak base learners. The first base learner performs only slightly better than random guessing; each later model builds on the previous one by adjusting its parameters (concentrating on the samples the previous model predicted worst), so that accuracy improves step by step.
At prediction time, the predictions of all models are combined, weighted by each base learner's performance (roughly, better-performing learners get more say, as in AdaBoost).
boosting suits base learners with high bias and low variance, where the goal is to reduce bias.
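
The contrast between the two schemes can be sketched in base R. This is only an illustration on simulated data, not the method used later in this post: the helpers fit_stump() and predict_stump() are hypothetical names for a one-split regression tree written here for the example. Bagging averages independent stumps with equal weights, while boosting fits each new stump to the residuals of the current ensemble.

```r
# Hypothetical helper: fit a one-split regression tree ("stump") on a single
# numeric predictor by trying every midpoint between adjacent x values.
fit_stump <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (xs[-1] + xs[-length(xs)]) / 2
  sse <- vapply(cuts, function(s) {
    left <- y[x <= s]
    right <- y[x > s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  s <- cuts[which.min(sse)]
  list(split = s, left = mean(y[x <= s]), right = mean(y[x > s]))
}

# Hypothetical helper: predict with a fitted stump.
predict_stump <- function(stump, x) {
  ifelse(x <= stump$split, stump$left, stump$right)
}

# Simulated data (an assumption made for this sketch)
set.seed(123)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
B <- 100

# Bagging: independent stumps on bootstrap samples, averaged with equal weights
bag_pred <- rowMeans(sapply(seq_len(B), function(b) {
  idx <- sample(n, replace = TRUE)
  predict_stump(fit_stump(x[idx], y[idx]), x)
}))

# Boosting: each stump is fit to the residuals left by the current ensemble
lr <- 0.1
boost_pred <- rep(mean(y), n)
for (m in seq_len(B)) {
  stump <- fit_stump(x, y - boost_pred)
  boost_pred <- boost_pred + lr * predict_stump(stump, x)
}

# Training error of each ensemble
mse_bag <- mean((y - bag_pred)^2)
mse_boost <- mean((y - boost_pred)^2)
c(bagging = mse_bag, boosting = mse_boost)
```

Because a stump is a high-bias learner, averaging many of them barely changes the bias, whereas the sequential residual fitting in boosting reduces it. This is why bagging is normally paired with strong (deep) trees and boosting with weak ones.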

2. GBM (gradient boosting machines): a brief overview

2.1 Gradient descent

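The core of boosting is to build a sequence of base learners, each one aimed at improving on the previous model. Improvement is measured with a loss function, and the size of each improvement step is set by the learning rate;
AdaBoost is one of the popular early forms of boosting (it is usually described as minimizing an exponential loss for classification). The basic boosting procedure for regression, in which each new tree is fit to the residuals of the current ensemble, uses SSE as its loss function, i.e., as the performance criterion. GBM is more general: it can use loss functions other than SSE;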

The name gradient boosting machine comes from the fact that this procedure can be generalized to loss functions other than SSE.
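
The link between residuals and gradients can be checked numerically. The sketch below is only an illustration with made-up numbers; num_grad(), sse_loss() and abs_loss() are hypothetical helper names. It shows that for squared-error loss the negative gradient with respect to the current prediction is exactly the residual, while for absolute-error loss it is the sign of the residual, which is what a gradient boosting step would fit instead.

```r
# Central-difference approximation of d loss / d f (hypothetical helper)
num_grad <- function(loss, y, f, eps = 1e-6) {
  (loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)
}

# Two per-observation losses; the 1/2 factor on the squared error is a
# common convention that makes the gradient equal to -(y - f)
sse_loss <- function(y, f) 0.5 * (y - f)^2
abs_loss <- function(y, f) abs(y - f)

y <- c(3, -1, 2.5)   # observed values (made up)
f <- c(2, 0, 1)      # current ensemble predictions (made up)

neg_grad_sse <- -num_grad(sse_loss, y, f)
neg_grad_abs <- -num_grad(abs_loss, y, f)

# Negative gradient of the squared-error loss equals the residual y - f
all.equal(neg_grad_sse, y - f, tolerance = 1e-6)
# Negative gradient of the absolute-error loss equals sign(y - f)
all.equal(neg_grad_abs, sign(y - f), tolerance = 1e-6)
```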

2.2 GBM hyperparameters

GBM hyperparameters fall into two groups: those related to boosting, and those of the decision trees themselves.

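(1) boosting hyperparameters

Number of trees (n.trees in gbm): how many base learners to build. Boosting usually needs a large number of trees (often thousands) for later models to keep improving on earlier ones. Too many trees, however, can lead to overfitting (bagging does not have this problem).
Learning rate (shrinkage in gbm): ranges from 0 to 1, with typical values between 0.001 and 0.3. Too large a learning rate can make the model step past the optimum and overfit; too small a value can require a very large number of trees, which increases computation time and resources.
Because each learning rate has its own optimal number of trees, the number of trees does not need to be tuned separately: set it to a large value and read the best number off the cross-validation error.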
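
The interplay between learning rate and number of trees can be sketched in base R. This is only an illustration on simulated data, with a simple holdout set standing in for cross-validation; fit_stump() and predict_stump() are hypothetical helpers for a one-split tree, not part of gbm. For each learning rate the loop boosts up to max_trees stumps and records the number of trees with the lowest validation error.

```r
# Hypothetical helper: one-split regression tree on a single numeric predictor
fit_stump <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (xs[-1] + xs[-length(xs)]) / 2
  sse <- vapply(cuts, function(s) {
    left <- y[x <= s]
    right <- y[x > s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  s <- cuts[which.min(sse)]
  list(split = s, left = mean(y[x <= s]), right = mean(y[x > s]))
}
predict_stump <- function(stump, x) {
  ifelse(x <= stump$split, stump$left, stump$right)
}

# Simulated data (an assumption): first half for fitting, second for validation
set.seed(42)
n <- 400
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
train <- seq_len(n) <= 200

max_trees <- 300
results <- data.frame(
  learning_rate = c(0.5, 0.1, 0.02),
  best_trees = NA,
  valid_rmse = NA
)

for (i in seq_len(nrow(results))) {
  lr <- results$learning_rate[i]
  pred <- rep(mean(y[train]), n)        # start from the training mean
  valid_mse <- numeric(max_trees)
  for (m in seq_len(max_trees)) {
    # fit the next stump to the current training residuals
    stump <- fit_stump(x[train], y[train] - pred[train])
    pred <- pred + lr * predict_stump(stump, x)
    valid_mse[m] <- mean((y[!train] - pred[!train])^2)
  }
  results$best_trees[i] <- which.min(valid_mse)
  results$valid_rmse[i] <- sqrt(min(valid_mse))
}
results
```

In a run like this, the smallest learning rate tends to have its best number of trees at or near max_trees, a sign that the cap is too low for that rate, while the largest rate peaks early. The same pattern is worth checking in the grid search results later in this post.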

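(2) tree hyperparameters

Tree depth (interaction.depth in gbm): values of 3 to 8 are generally recommended;
Minimum number of observations in terminal nodes (n.minobsinnode in gbm): values of 5 to 15 are generally recommended.

Hyperparameter tuning strategy

First find the best learning rate;
Then tune the tree-specific hyperparameters.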

3. Hands-on code

Example data: predicting house prices

ames <- AmesHousing::make_ames()
dim(ames)
## [1] 2930   81

set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7, 
                       strata = "Sale_Price")
ames_train  <- training(split)
# [1] 2049   81
ames_test   <- testing(split)
# [1] 881  81

R package used for modeling

library(gbm)

step1: a rough first exploration

set.seed(123) # for reproducibility
ames_gbm1 <- gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian", # SSE loss function
  n.trees = 5000,
  shrinkage = 0.1,
  interaction.depth = 3,
  n.minobsinnode = 10,
  cv.folds = 10)

# find index for number trees with minimum CV error
best <- which.min(ames_gbm1$cv.error)
# [1] 1119

# get MSE and compute RMSE
sqrt(ames_gbm1$cv.error[best])
## [1] 22402.07

# plot error curve
gbm.perf(ames_gbm1, method = "cv")

As the plot shows, the cross-validation error reaches its minimum at 1119 trees and has plateaued by that point.

step2: tuning the learning rate

# create grid search
hyper_grid <- expand.grid(
  learning_rate = c(0.3, 0.1, 0.05, 0.01, 0.005),
  RMSE = NA,
  trees = NA,
  time = NA
)
# execute grid search
for(i in seq_len(nrow(hyper_grid))) {
  # fit gbm
  set.seed(123) # for reproducibility
  train_time <- system.time({
    m <- gbm(
      formula = Sale_Price ~ .,
      data = ames_train,
      distribution = "gaussian",
      n.trees = 5000,
      shrinkage = hyper_grid$learning_rate[i],
      interaction.depth = 3,
      n.minobsinnode = 10,
      cv.folds = 10
    )
  })
  # add RMSE, best number of trees, and training time to results
  hyper_grid$RMSE[i] <- sqrt(min(m$cv.error))
  hyper_grid$trees[i] <- which.min(m$cv.error)
  hyper_grid$time[i] <- train_time[["elapsed"]]
}

dplyr::arrange(hyper_grid, RMSE)
#   learning_rate     RMSE trees  time
# 1         0.050 21807.96  1565 66.83
# 2         0.010 22102.34  4986 66.73
# 3         0.100 22402.07  1119 67.84
# 4         0.005 23054.68  4995 66.04
# 5         0.300 24411.95   269 64.84

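As shown above, the best learning rate is 0.05. Note that for learning rates 0.01 and 0.005 the best number of trees (4986 and 4995) sits right at the 5000-tree cap, so those models may not have finished converging.

step3: tuning the tree hyperparameters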

# search grid
hyper_grid <- expand.grid(
  n.trees = 5000,
  shrinkage = 0.05,
  interaction.depth = c(3, 5, 7),
  n.minobsinnode = c(5, 10, 15)
)

# create model fit function
model_fit <- function(n.trees, shrinkage, interaction.depth, n.minobsinnode) {
  set.seed(123)
  m <- gbm(
    formula = Sale_Price ~ .,
    data = ames_train,
    distribution = "gaussian",
    n.trees = n.trees,
    shrinkage = shrinkage,
    interaction.depth = interaction.depth,
    n.minobsinnode = n.minobsinnode,
    cv.folds = 10
  )
  # compute RMSE
  sqrt(min(m$cv.error))
}

# perform search grid with functional programming
hyper_grid$rmse <- purrr::pmap_dbl(
  hyper_grid,
  ~ model_fit(
    n.trees = ..1,
    shrinkage = ..2,
    interaction.depth = ..3,
    n.minobsinnode = ..4
  )
)
# results
dplyr::arrange(hyper_grid, rmse)
#   n.trees shrinkage interaction.depth n.minobsinnode     rmse
# 1    5000      0.05                 5             10 21793.28
# 2    5000      0.05                 3             10 21807.96
# 3    5000      0.05                 5              5 21976.76
# 4    5000      0.05                 3              5 22104.49
# 5    5000      0.05                 5             15 22156.30
# 6    5000      0.05                 3             15 22170.16
# 7    5000      0.05                 7             10 22268.51
# 8    5000      0.05                 7              5 22316.37
# 9    5000      0.05                 7             15 22595.51

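As shown above, the best combination of tree hyperparameters is (1) interaction.depth = 5 and (2) n.minobsinnode = 10.
However, the RMSE of the best combination (21793.28) is only slightly lower than that of the step2 settings (interaction.depth = 3, n.minobsinnode = 10; 21807.96), so in this example the tree hyperparameters have relatively little effect on GBM performance.

step4: fit the final model and evaluate it on the test set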

ame_gbm <- gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian",
  n.trees = 5000,
  shrinkage = 0.05,
  interaction.depth = 5,
  n.minobsinnode = 10,
  cv.folds = 10)
(best <- which.min(ame_gbm$cv.error))
# [1] 1305
sqrt(ame_gbm$cv.error[best])
# [1] 22475.02

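# predict on the test set with the CV-optimal number of trees
# (recent gbm versions pick this automatically when n.trees is omitted)
pred <- predict(ame_gbm, ames_test, n.trees = best)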
ModelMetrics::rmse(pred, ames_test$Sale_Price)
# [1] 20010.21
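
For reference, the RMSE reported above is the square root of the mean squared difference between predicted and observed values. The sketch below computes it by hand on three made-up prices (not taken from the Ames data); rmse_manual() is a hypothetical helper, not part of ModelMetrics. It also shows that the order of the two arguments does not change the result.

```r
# Hypothetical helper: root mean squared error computed by hand
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up values for illustration only
actual    <- c(200000, 150000, 320000)
predicted <- c(210000, 140000, 300000)

rmse_manual(actual, predicted)
# [1] 14142.14

# RMSE is symmetric in its two arguments
rmse_manual(predicted, actual) == rmse_manual(actual, predicted)
# [1] TRUE
```

The same relationship explains the sqrt() calls in the earlier steps: with distribution = "gaussian", the cv.error values are squared-error quantities, so their square root is on the scale of Sale_Price.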

Assessing feature importance

vip::vip(ame_gbm)

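Because GBM is based on gradient descent, it can get stuck in a local minimum when the loss surface is not bowl-shaped. Stochastic gradient descent, which fits each model on a random sample of the training data, improves the chance of reaching the global minimum. In addition, XGBoost adds regularization options that help guard against the overfitting that boosting is prone to. Their usage is not covered here.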
