1. Ensemble algorithms

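1.1 Ensemble algorithms build multiple models on the data and combine the predictions of all of them. Examples include random forest, gradient boosted decision trees (GBDT) and XGBoost.
1.2 The model obtained by combining multiple models is called an ensemble estimator, and each model that makes it up is called a base estimator. Generally speaking, there are three families of ensemble methods: bagging, boosting and stacking.
1.3 The core idea of bagging is to build multiple mutually independent estimators and average their predictions (or take a majority vote). The representative bagging model is the random forest.
1.4 The core idea of boosting is to combine weak estimators sequentially, with each new estimator concentrating on the samples that earlier ones found hard to predict, so that together they form a strong estimator. Representative boosting models are AdaBoost and gradient boosted trees.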
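To make the bagging idea concrete, here is a minimal hand-rolled sketch. It assumes the wine dataset from sklearn (loaded later in this post) and an arbitrary choice of 25 trees. Each tree is trained on its own bootstrap sample of the training set, and the ensemble predicts by majority vote. This is a simplification, not how sklearn implements RandomForestClassifier: a real random forest also samples a random subset of features at each split (max_features), and it averages the trees' predicted probabilities rather than counting votes.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
all_preds = []
for _ in range(n_estimators):
    # Bootstrap sample: draw len(Xtrain) row indices with replacement
    idx = rng.integers(0, len(Xtrain), size=len(Xtrain))
    tree = DecisionTreeClassifier(random_state=0).fit(Xtrain[idx], Ytrain[idx])
    all_preds.append(tree.predict(Xtest))

# Majority vote: for each test sample, keep the label predicted by the most trees
all_preds = np.array(all_preds)  # shape (n_estimators, n_test_samples)
bagged_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])
bagging_acc = (bagged_pred == Ytest).mean()
print(bagging_acc)
```

The trees are independent of each other, so the loop could run in parallel. This is the property that lets sklearn's forests use n_jobs.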

The ensemble module in sklearn: sklearn.ensemble
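As a quick orientation, the snippet below checks that a few of the estimators in sklearn.ensemble are available and groups them by the three families above. The grouping is this post's own, not an official sklearn taxonomy. StackingClassifier and StackingRegressor assume sklearn 0.22 or later.

```python
import sklearn.ensemble as ensemble

# A few of the estimators exposed by sklearn.ensemble, grouped by family
families = {
    "bagging": ["BaggingClassifier", "RandomForestClassifier", "RandomForestRegressor"],
    "boosting": ["AdaBoostClassifier", "GradientBoostingClassifier", "GradientBoostingRegressor"],
    "stacking": ["StackingClassifier", "StackingRegressor"],
}

# Keep only the names that actually exist in the installed sklearn version
available = {family: [name for name in names if hasattr(ensemble, name)]
             for family, names in families.items()}
for family, names in available.items():
    print(family, names)
```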

2. RandomForestClassifier: random forest classification

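Random forest is a highly representative bagging ensemble algorithm, and all of its base estimators are decision trees. A forest made of classification trees is called a random forest classifier, and a forest made of regression trees is called a random forest regressor.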
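You can check this on fitted models, because the fitted trees are stored in the estimators_ attribute. The sketch below assumes the bundled wine (classification) and diabetes (regression) datasets and an arbitrary 10 trees per forest.

```python
from sklearn.datasets import load_diabetes, load_wine
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

Xc, yc = load_wine(return_X_y=True)
Xr, yr = load_diabetes(return_X_y=True)

rfc = RandomForestClassifier(n_estimators=10, random_state=0).fit(Xc, yc)
rfr = RandomForestRegressor(n_estimators=10, random_state=0).fit(Xr, yr)

# estimators_ holds the fitted base estimators: one decision tree per n_estimators
print(type(rfc.estimators_[0]).__name__, len(rfc.estimators_))  # DecisionTreeClassifier 10
print(type(rfr.estimators_[0]).__name__, len(rfr.estimators_))  # DecisionTreeRegressor 10
```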

2.1 Important parameters (n_estimators, random_state, bootstrap and oob_score)

1. n_estimators

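This is the number of trees in the forest, i.e. the number of base estimators. A larger n_estimators usually gives a better model, but the gain levels off beyond a certain point while training time and memory keep growing with the number of trees. In sklearn the default is 100 (it was 10 before version 0.22).
Comparing a random forest with a single decision tree
1. Import packages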

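#1. Import packages
#Jupyter magic: render plots inline in the notebook (only valid in Jupyter/IPython)
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine #wine dataset used below
from sklearn.tree import DecisionTreeClassifier #decision tree
from sklearn.ensemble import RandomForestClassifier #random forest from the ensemble module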

2. Load the dataset

#2. Load the dataset
wine = load_wine()
wine.data #feature matrix: 178 samples x 13 features
wine.target #class labels 0, 1, 2
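Before modelling, it helps to confirm the size of the data and how balanced the classes are. The sketch below uses numpy's bincount to count the samples in each class.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
print(wine.data.shape)           # (178, 13): 178 samples, 13 features
print(np.bincount(wine.target))  # [59 71 48]: samples per class
```

The classes are only mildly imbalanced, so accuracy, which is what score reports, is a reasonable metric for the comparisons below.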

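3. The basic sklearn modeling workflow
1. Instantiate the model.
2. Train the instantiated model on the training set through the fit interface.
3. Pass the test set to the trained model through other interfaces (for example score, or predict to get the predicted labels for Xtest) to obtain the results we want.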

from sklearn.model_selection import train_test_split
#No random_state here, so the split (and the scores below) change from run to run
Xtrain,Xtest,Ytrain,Ytest = train_test_split(wine.data,wine.target,test_size=0.3)

#Review: the basic sklearn modeling workflow
clf = DecisionTreeClassifier(random_state=0)
rfc = RandomForestClassifier(random_state=0)

clf = clf.fit(Xtrain,Ytrain)
rfc = rfc.fit(Xtrain,Ytrain)

score_c = clf.score(Xtest,Ytest) #accuracy on the test set
score_r = rfc.score(Xtest,Ytest)

print('Single Tree:{}'.format(score_c)
     ,'Random Forest:{}'.format(score_r)) #format() inserts each score into the {}

4. Plot random forest vs. decision tree under one run of 10-fold cross-validation

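#4. Plot random forest vs. decision tree under one run of 10-fold cross-validation
#Cross-validation: split the dataset into n folds, use each fold in turn as the test set and the remaining n-1 folds as the training set, and train the model n times to observe how stable it is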

from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

rfc = RandomForestClassifier(n_estimators=25)
rfc_s = cross_val_score(rfc,wine.data,wine.target,cv=10)
clf = DecisionTreeClassifier()
clf_s = cross_val_score(clf,wine.data,wine.target,cv=10)

plt.plot(range(1,11),rfc_s,label = "RandomForest")
plt.plot(range(1,11),clf_s,label = "Decision Tree")
plt.legend()
plt.show()

The plot shows that the random forest's accuracy is higher than the decision tree's on most folds.
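To see what cross_val_score does under the hood, the cross-validation described in the comment above can be written out by hand. For a classifier, cv=10 means StratifiedKFold with 10 folds and no shuffling, so the manual loop below should reproduce cross_val_score's fold scores exactly. The tree's random_state=0 is an assumption added so that both runs build identical trees.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Manual 10-fold CV: each fold is the test set once, the other 9 folds are the training set
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    manual_scores.append(clf.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(np.allclose(manual_scores, auto_scores))  # True
```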

#A more compact way to run the cross-validation comparison above
#Loop over both models: RandomForest first, then DecisionTree
for label, model in [("RandomForest", RandomForestClassifier(n_estimators=25)),
                     ("DecisionTree", DecisionTreeClassifier())]:
    score = cross_val_score(model,wine.data,wine.target,cv=10)
    print("{}: {}".format(label, score.mean())) #mean accuracy over the 10 folds
    plt.plot(range(1,11),score,label = label)
plt.legend()
plt.show()

5. Plot random forest vs. decision tree over ten runs of 10-fold cross-validation

# 5. Plot random forest vs. decision tree over ten runs of 10-fold cross-validation (each point is the mean accuracy of one run)
rfc_l = []
clf_l = []
for i in range(10):
    rfc = RandomForestClassifier(n_estimators=25)
    rfc_s = cross_val_score(rfc,wine.data,wine.target,cv=10).mean()
    rfc_l.append(rfc_s)
    clf = DecisionTreeClassifier()
    clf_s = cross_val_score(clf,wine.data,wine.target,cv=10).mean()
    clf_l.append(clf_s)

plt.plot(range(1,11),rfc_l,label = "Random Forest")
plt.plot(range(1,11),clf_l,label = "Decision Tree")
plt.legend()
plt.show()

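Across the ten runs, the random forest's curve typically stays above the single tree's and moves with it: the higher a single decision tree's accuracy, the higher the random forest's accuracy tends to be.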
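One way to see why the forest's accuracy depends on the single tree's is a classic back-of-the-envelope calculation. It assumes the trees are fully independent, which trees in a real forest are not, and that each has the same error rate eps. A majority vote of 25 trees is wrong only when 13 or more trees are wrong, and that probability follows a binomial distribution. The helper ensemble_error below is written just for this illustration.

```python
from math import comb

def ensemble_error(n_estimators, eps):
    """Error of a majority vote over n_estimators independent classifiers,
    each with error rate eps: the vote is wrong when more than half are wrong."""
    k_min = n_estimators // 2 + 1
    return sum(comb(n_estimators, k) * eps**k * (1 - eps)**(n_estimators - k)
               for k in range(k_min, n_estimators + 1))

for eps in (0.2, 0.4, 0.5, 0.6):
    print(eps, round(ensemble_error(25, eps), 6))
```

Under these assumptions, the ensemble beats a single tree only when eps < 0.5. At eps = 0.5 it is no better than a coin flip, and above 0.5 voting makes things worse. This is why the base estimators must at least beat random guessing.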

6. Learning curve of n_estimators

#6. Learning curve of n_estimators
superpa = []
for i in range(200):
    rfc = RandomForestClassifier(n_estimators=i+1,n_jobs=-1) #200 random forests in total, with n_estimators = 1, 2, ..., 200; n_jobs=-1 uses all CPU cores
    rfc_s = cross_val_score(rfc,wine.data,wine.target,cv=10).mean()
    superpa.append(rfc_s)
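print(max(superpa),superpa.index(max(superpa))+1) #best mean accuracy, and the n_estimators that achieved it
plt.figure(figsize=[20,5])
plt.plot(range(1,201),superpa)
plt.show()
# list.index(obj) returns the position of obj in the list. That position is i, and n_estimators = i+1, hence the +1 above. In the author's run the index was 68, so the highest accuracy came at n_estimators=69.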

Once n_estimators reaches a certain value, the accuracy no longer rises toward 1; it just fluctuates within a range.
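Refitting 200 forests with 10-fold cross-validation is slow. A cheaper variant grows a single forest incrementally with warm_start=True and reads the out-of-bag accuracy (oob_score_, covered in the bootstrap section below) after each step. This swaps in a different technique: OOB accuracy instead of cross-validation. Its numbers are therefore not directly comparable to the curve above. The step size of 5 and the starting size of 25 are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# warm_start=True: each fit() keeps the existing trees and only grows the extra ones
rfc = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
oob_curve = []
for n in range(25, 201, 5):
    rfc.set_params(n_estimators=n)
    rfc.fit(X, y)
    oob_curve.append((n, rfc.oob_score_))  # OOB accuracy of the forest with n trees

best_n, best_score = max(oob_curve, key=lambda pair: pair[1])
print(best_n, best_score)
```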

2. random_state

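In a decision tree, random_state controls how that single tree is generated; in a random forest, random_state controls how the whole forest is generated. With a fixed random_state, the forest always produces the same set of trees. The trees within the forest still differ from one another, because each tree receives its own random seed derived from the forest's random_state.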

rfc = RandomForestClassifier(n_estimators=25,random_state=2)
rfc = rfc.fit(Xtrain,Ytrain)
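#One of the important attributes of a random forest: estimators_, the list of fitted trees in the forest
rfc.estimators_[0].random_state #random_state assigned to tree 0
#1872583848 (output from the author's run)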

#Print the random_state of every tree in the forest with a loop
for i in range(len(rfc.estimators_)):
    print(rfc.estimators_[i].random_state)
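The two claims in the paragraph above can be checked directly: the same forest random_state always yields the same tree seeds and the same predictions, while the seeds of individual trees differ from one another. The helper fit_forest is written only for this illustration, and the wine data and random_state=2 mirror the code above.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

def fit_forest(random_state):
    """Fit a 25-tree forest and return its per-tree seeds and its predictions."""
    rfc = RandomForestClassifier(n_estimators=25, random_state=random_state).fit(X, y)
    return [tree.random_state for tree in rfc.estimators_], rfc.predict(X)

seeds_a, pred_a = fit_forest(2)
seeds_b, pred_b = fit_forest(2)
print(seeds_a == seeds_b)      # same forest random_state -> same tree seeds
print(len(set(seeds_a)))       # number of distinct seeds inside one forest
print((pred_a == pred_b).all())
```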

3. bootstrap & oob_score

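Bagging creates different training sets through random sampling with replacement, and bootstrap is the parameter that controls this sampling. bootstrap defaults to True, meaning that sampling with replacement is used.
With this sampling, each tree is expected to leave out about 37% of the training data, which is never used to build that tree. These samples are called out-of-bag data (oob). So when using a random forest, we do not have to split the data into training and test sets: we can evaluate the model on its out-of-bag data instead. Set oob_score=True when instantiating, and read the result from the oob_score_ attribute after fitting.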
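The 37% comes from a short calculation. In one bootstrap sample, n rows are drawn with replacement from n rows. A given row is missed by every draw with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The sketch below computes this for n = 178 (the size of the wine dataset) and checks it with a simulation of 1000 bootstrap samples.

```python
import math
import numpy as np

n = 178  # number of samples in the wine dataset

# Probability that a given sample is never drawn in n draws with replacement
p_oob = (1 - 1 / n) ** n
print(round(p_oob, 4), round(math.exp(-1), 4))  # both close to 0.37

# Simulation: fraction of samples left out of each bootstrap sample
rng = np.random.default_rng(0)
fractions = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)  # one bootstrap sample (row indices)
    fractions.append(1 - len(np.unique(idx)) / n)
print(np.mean(fractions))
```

The 37% is left out of each individual tree, not out of the forest as a whole. Any given sample is almost certainly used by some of the other trees, and that is what makes OOB evaluation possible.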

#No train/test split needed: evaluate the model on its out-of-bag data
rfc = RandomForestClassifier(n_estimators=25,oob_score=True) #oob_score defaults to False; bootstrap defaults to True
rfc = rfc.fit(wine.data,wine.target) #train on all of the data
#Important attribute oob_score_
rfc.oob_score_ #accuracy of the model on the out-of-bag data
#0.9606741573033708 (author's run; without random_state the value varies)
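Two related details are worth checking. First, oob_score only makes sense together with bootstrap=True: without bootstrap sampling no sample is left out, and sklearn raises a ValueError at fit time. Second, besides the single number oob_score_, the fitted forest exposes oob_decision_function_. For each training sample, it holds the class probabilities computed only from the trees that did not see that sample. The choice of 100 trees is an assumption, made so that every sample is out-of-bag for at least a few trees.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# OOB evaluation needs bootstrap sampling; without it there are no out-of-bag samples
try:
    RandomForestClassifier(n_estimators=25, bootstrap=False, oob_score=True).fit(X, y)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

rfc = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
# One row per training sample, one column per class
print(rfc.oob_decision_function_.shape)  # (178, 3)
print(rfc.oob_score_)
```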

2.2 Important attributes and interfaces

Important attributes: .estimators_, .oob_score_ and .feature_importances_

Interfaces: apply, fit, predict, score and predict_proba

rfc = RandomForestClassifier(n_estimators=25)
rfc = rfc.fit(Xtrain, Ytrain) #fit is used on the training set
rfc.score(Xtest,Ytest) #accuracy on the test set
rfc.feature_importances_ #importance value of each feature
rfc.apply(Xtest) #index of the leaf each test sample lands in, for every tree: shape (n_samples, n_estimators)
rfc.predict(Xtest) #predicted labels for the test set
rfc.predict_proba(Xtest) #probability of each test sample belonging to each class
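These interfaces are connected. sklearn's random forest classifier averages the predicted class probabilities of its trees to get predict_proba, and predict returns the class with the highest averaged probability. The sketch below rebuilds both by hand from estimators_ and compares them with the built-in results. The fixed random_state values are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=0)
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(Xtrain, Ytrain)

# Average the class probabilities predicted by each individual tree
manual_proba = np.mean([tree.predict_proba(Xtest) for tree in rfc.estimators_], axis=0)
# Pick the class with the highest averaged probability
manual_pred = rfc.classes_[np.argmax(manual_proba, axis=1)]

print(np.allclose(manual_proba, rfc.predict_proba(Xtest)))  # True
print((manual_pred == rfc.predict(Xtest)).all())            # True
print(rfc.apply(Xtest).shape)            # (number of test samples, 25)
print(rfc.feature_importances_.sum())    # importances are normalized to sum to 1
```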

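Parameters are set at instantiation and define the model's constraints; attributes are information the model learns from the training set. The fit interface is used on the training set, and the remaining interfaces are mostly used on the test set.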
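The distinction is visible in code. Parameters can be read as soon as the model is instantiated. Learned attributes, whose names end in an underscore by sklearn convention, only exist after fit, and calling predict on an unfitted model raises NotFittedError. This sketch fits on the whole wine dataset purely for brevity.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

X, y = load_wine(return_X_y=True)
rfc = RandomForestClassifier(n_estimators=25, random_state=0)

# Parameters are fixed at instantiation and can be read right away
print(rfc.get_params()["n_estimators"])  # 25

# Learned attributes (names ending in "_") only exist after fit
before_fit_has_estimators = hasattr(rfc, "estimators_")
try:
    rfc.predict(X)
    raised_not_fitted = False
except NotFittedError:
    raised_not_fitted = True
print(before_fit_has_estimators, raised_not_fitted)

rfc.fit(X, y)
print(len(rfc.estimators_), rfc.feature_importances_.shape)  # 25 (13,)
```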

Machine Learning: 04. Random Forests: RandomForestClassifier

