
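I. Dataset

Kaggle Titanic dataset, train.csv

II. Model Selection

The Titanic dataset poses a binary classification problem. This article tunes a random forest model on it.

III. Data Preprocessing

The Titanic dataset must be preprocessed before it can be used for modeling: drop the columns Name, Ticket and Cabin, encode the columns Sex and Embarked, fill the missing values in Age with the column mean, drop any rows that still contain missing values, and separate the features from the labels.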

IV. Tuning Workflow

1) Build a baseline model and observe how it performs on the dataset
2) Tune n_estimators
3) Tune max_depth
4) Tune min_samples_leaf
5) Tune min_samples_split
6) Tune max_features
7) Tune criterion
8) Settle on the best parameter combination

V. Step-by-Step Tuning

1) Import libraries

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

2) Inspect the dataset

data=pd.read_csv("C:\\Users\\DRF\\Desktop\\tatanic\\datasets\\train.csv",index_col=0)
data.head()
data.info()

The data has missing values and other issues, so it must be preprocessed before modeling.

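3) Data preprocessing

# Fill missing Age values with the column mean (ravel() flattens the (n,1) result back to a column)
data['Age']=SimpleImputer(missing_values=np.nan,strategy='mean').fit_transform(data[['Age']]).ravel()
# Drop columns that are not used as features
data.drop(['Name','Cabin','Ticket'],axis=1,inplace=True)
# Encode Sex as 1 (male) / 0 (female)
data['Sex']=(data['Sex']=='male').astype('int32')
# Drop rows that still contain missing values
data=data.dropna()
# Encode Embarked as integer codes, in order of first appearance
labels=data.loc[:,'Embarked'].unique().tolist()
data['Embarked']=data['Embarked'].apply(lambda x:labels.index(x))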
x=data.iloc[:,data.columns!='Survived']
y=data.iloc[:,data.columns=='Survived']
y=y.values.ravel()

The Titanic dataset is now preprocessed, with features x and labels y separated, and is ready for modeling.
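To make the encoding steps easier to check, here is a minimal sketch that applies the same ideas to a four-row toy table. The table `toy` and its values are made up for illustration, and `fillna` with the column mean stands in for `SimpleImputer(strategy='mean')`.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not rows from train.csv)
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", "S", "Q"],
})

# Mean imputation: the missing Age becomes the mean of the observed ages
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())

# Binary encoding: male -> 1, female -> 0
toy["Sex"] = (toy["Sex"] == "male").astype("int32")

# Integer codes for Embarked, assigned in order of first appearance
labels = toy["Embarked"].unique().tolist()
toy["Embarked"] = toy["Embarked"].apply(lambda v: labels.index(v))

print(labels)
print(toy)
```

The integer codes for Embarked depend only on which port appears first in the data, so they identify the category but carry no other meaning.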

4) Build a baseline model and observe how it performs on the dataset

rfc=RandomForestClassifier(n_estimators=100,random_state=90)
score_pre=cross_val_score(rfc,x,y,cv=10).mean()
score_pre

score_pre is 0.809920837589377

5) Tune n_estimators

scorel=[]
for i in range(1,201,10):
    rfc=RandomForestClassifier(n_estimators=i,random_state=90)
    score=cross_val_score(rfc,x,y,cv=10).mean()
    scorel.append(score)
print(max(scorel),((scorel.index(max(scorel))*10))+1)
plt.figure(figsize=[20,5])
plt.plot(range(1,201,10),scorel)
plt.show()

Result:

The printed score and the learning curve show that accuracy peaks at n_estimators=71, reaching 0.8121807967313586, slightly higher than the baseline of 0.809920837589377.
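The expression `scorel.index(max(scorel))*10+1` in the loop above turns the position of the best score back into the n_estimators value that produced it. A small sketch of that mapping, using made-up scores (the list `scores` is hypothetical, not measured on the Titanic data):

```python
# Candidate values tried by range(1, 201, 10): 1, 11, 21, ..., 191
candidates = list(range(1, 201, 10))

# Hypothetical scores for illustration only; the peak is placed at index 7
scores = [0.70 + 0.001 * (10 - abs(i - 7)) for i in range(len(candidates))]

best_index = scores.index(max(scores))

# Two equivalent ways to recover the parameter value from the index
best_by_formula = best_index * 10 + 1
best_by_lookup = candidates[best_index]

print(best_index, best_by_formula, best_by_lookup)
```

Looking the value up in the candidate list avoids having to keep the formula in sync with the range step, which is what the narrower search below does with `[*range(65,75)]`.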

Next, narrow the range and check n_estimators from 65 to 74 (range(65,75) excludes 75).

scorel=[]
for i in range(65,75):
    rfc=RandomForestClassifier(n_estimators=i,random_state=90)
    score=cross_val_score(rfc,x,y,cv=10).mean()
    scorel.append(score)
print(max(scorel),[*range(65,75)][scorel.index(max(scorel))])
plt.figure(figsize=[20,5])
plt.plot(range(65,75),scorel)
plt.show()

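Result:

After narrowing the range, the best value is again n_estimators=71, with an accuracy of 0.8121807967313586. We fix n_estimators at 71 and move on to grid search, tuning the remaining parameters one at a time to see how moving along the complexity vs. generalization-error curve can improve the model's accuracy.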
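One way to look at the complexity vs. generalization-error relationship directly is to compare training and cross-validation accuracy as a complexity parameter grows. The sketch below does this for max_depth with scikit-learn's `validation_curve`. It runs on synthetic data from `make_classification` (an assumption, so that it is self-contained); `X_demo`, `y_demo` and the depth values are illustrative choices, not part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in data (assumption); swap in x, y from the preprocessing step
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

depths = [1, 2, 4, 8, 16]
train_scores, cv_scores = validation_curve(
    RandomForestClassifier(n_estimators=30, random_state=0),
    X_demo,
    y_demo,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

# Each row holds the per-fold scores for one depth; average across folds
for depth, train_mean, cv_mean in zip(
    depths, train_scores.mean(axis=1), cv_scores.mean(axis=1)
):
    print(depth, round(train_mean, 4), round(cv_mean, 4))
```

A widening gap between the training score and the cross-validation score as depth grows is the usual sign that the model has moved past the point of lowest generalization error.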

6) Tune max_depth

grid_param={'max_depth':[*np.arange(1,7)]}

rfc=RandomForestClassifier(n_estimators=71,random_state=90)
GS=GridSearchCV(rfc,grid_param,cv=10)
GS.fit(x,y)
GS.best_params_
GS.best_score_

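Result:

The grid search returns max_depth=5 as the best value, with a best accuracy of 0.8346456692913385.

Limiting max_depth to 5 lowers the model's complexity and raises its accuracy, which suggests the unrestricted model sits on the right-hand side of the complexity vs. generalization-error curve, that is, to the right of the point of lowest generalization error. We settle on max_depth=5 at this step.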
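`best_params_` and `best_score_` report only the winner. To see how close the other candidates were, the fitted search object also exposes `cv_results_`. A minimal sketch on synthetic data (the data and the three candidate depths are assumptions made so the example is self-contained):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); swap in x, y from the preprocessing step
X_demo, y_demo = make_classification(
    n_samples=200, n_features=7, n_informative=4, random_state=0
)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    {"max_depth": [1, 3, 5]},
    cv=5,
)
search.fit(X_demo, y_demo)

# One entry per candidate: its parameters, mean CV accuracy, and spread across folds
for params, mean, std in zip(
    search.cv_results_["params"],
    search.cv_results_["mean_test_score"],
    search.cv_results_["std_test_score"],
):
    print(params, round(mean, 4), round(std, 4))

print(search.best_params_, search.best_score_)
```

When the gap between two candidates is smaller than the fold-to-fold standard deviation, the difference may not be meaningful.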

7) Tune min_samples_leaf

grid_param={'min_samples_leaf':[*np.arange(1,11,1)]}

rfc=RandomForestClassifier(n_estimators=71,random_state=90)
GS=GridSearchCV(rfc,grid_param,cv=10)
GS.fit(x,y)
GS.best_params_
GS.best_score_

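Result:

With min_samples_leaf=3 the best accuracy is 0.8335208098987626, lower than the 0.8346456692913385 obtained with max_depth=5. We read this as the model having moved to the left of the point of lowest generalization error, so we leave min_samples_leaf untuned.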

8) Tune min_samples_split

grid_param={'min_samples_split':[*np.arange(2,22,1)]}

rfc=RandomForestClassifier(n_estimators=71,random_state=90)
GS=GridSearchCV(rfc,grid_param,cv=10)
GS.fit(x,y)
GS.best_params_
GS.best_score_

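Result:

With min_samples_split=17 the best accuracy is 0.8357705286839145, higher than the previous 0.8346456692913385. Accuracy improved as model complexity decreased, which again places the model to the right of the point of lowest generalization error. We settle on min_samples_split=17. Note that this search was run without max_depth, and the following steps carry forward min_samples_split=17 rather than max_depth=5.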

9) Tune max_features

grid_param={'max_features':[*np.arange(2,7,1)]}

rfc=RandomForestClassifier(n_estimators=71,min_samples_split=17,random_state=90)
GS=GridSearchCV(rfc,grid_param,cv=10)
GS.fit(x,y)
GS.best_params_
GS.best_score_

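Result:

The grid search returns max_features=2, and the accuracy is the same as before. The default max_features is the square root of the number of features; with 7 features that is about 2.65, truncated to 2, so the search reproduces the default and accuracy does not change. We settle on max_features=2.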
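The arithmetic behind that default can be checked in a couple of lines. This sketch assumes the seven feature columns left after preprocessing (Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) and mirrors the square-root rule with truncation to an integer:

```python
import math

# Feature columns remaining after preprocessing (Survived is the label)
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
n_features = len(features)

# Square-root rule: take sqrt(n_features), truncate, and use at least 1
default_max_features = max(1, int(math.sqrt(n_features)))

print(n_features, round(math.sqrt(n_features), 2), default_max_features)
```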

10) Tune criterion

grid_param={'criterion':['gini','entropy']}

rfc=RandomForestClassifier(n_estimators=71,min_samples_split=17,max_features=2,random_state=90)
GS=GridSearchCV(rfc,grid_param,cv=10)
GS.fit(x,y)
GS.best_params_
GS.best_score_

Result:

The grid search returns criterion='gini', and the accuracy is the same as before. gini is also the default criterion, so accuracy does not change.

VI. Tuning Complete: Summary of the Best Parameter Combination

RandomForestClassifier(n_estimators=71
                      ,min_samples_split=17
                      ,max_features=2
                      ,criterion='gini'
                      ,random_state=90)

Accuracy before tuning: 0.809920837589377 (80.99%)
Accuracy after tuning: 0.835770528683914 (83.58%)
Improvement in accuracy: 0.025849691094537 (+2.58 percentage points)
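The workflow above tunes one parameter at a time, so each search sees the other parameters at whatever values they had at that step. A possible alternative, not used in this article, is to search several parameters jointly so that their interactions are evaluated together. A minimal sketch on synthetic data (the data, the name `joint`, and the small grid are all assumptions chosen to keep the example quick):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); swap in x, y from the preprocessing step
X_demo, y_demo = make_classification(
    n_samples=200, n_features=7, n_informative=4, random_state=0
)

# Every combination is cross-validated: 2 * 2 * 2 = 8 candidates
joint_grid = {
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
    "max_features": [2, 3],
}

joint = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    joint_grid,
    cv=3,
)
joint.fit(X_demo, y_demo)

print(joint.best_params_, round(joint.best_score_, 4))
```

The cost grows multiplicatively with each added parameter, which is the main reason to fall back on one-at-a-time tuning for larger grids.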

·································································································································································
Complete code:

#Import libraries
from sklearn.ensemble import RandomForestClassifier #random forest ensemble model
from sklearn.model_selection import cross_val_score #cross-validation
from sklearn.model_selection import GridSearchCV    #grid search
from sklearn.impute import SimpleImputer            #SimpleImputer for filling missing values
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#Inspect the dataset
data=pd.read_csv("C:\\Users\\DRF\\Desktop\\tatanic\\datasets\\train.csv",index_col=0) #load the dataset
data.head()
data.info()

#Data preprocessing
data['Age']=SimpleImputer(missing_values=np.nan,strategy='mean').fit_transform(data[['Age']]).ravel() #fill missing Age with the mean
data.drop(['Name','Cabin','Ticket'],axis=1,inplace=True) #drop unused columns
data['Sex']=(data['Sex']=='male').astype('int32') #male -> 1, female -> 0
data=data.dropna() #drop rows that still contain missing values
labels=data.loc[:,'Embarked'].unique().tolist()
data['Embarked']=data['Embarked'].apply(lambda x:labels.index(x)) #integer codes for Embarked
x=data.iloc[:,data.columns!='Survived'] #features
y=data.iloc[:,data.columns=='Survived'] #labels
y=y.values.ravel()

#Baseline model: observe how the model performs on the dataset
rfc=RandomForestClassifier(n_estimators=100,random_state=90)      #instantiate
score_pre=cross_val_score(rfc,x,y,cv=10).mean() #cross-validation
score_pre

#Tune n_estimators
scorel=[]
for i in range(1,201,10):
    rfc=RandomForestClassifier(n_estimators=i,random_state=90) #score n_estimators = 1, 11, ..., 191 in turn
    score=cross_val_score(rfc,x,y,cv=10).mean()
    scorel.append(score)
print(max(scorel),((scorel.index(max(scorel))*10))+1)
plt.figure(figsize=[20,5]) #plot the learning curve
plt.plot(range(1,201,10),scorel)
plt.show()

scorel=[]
for i in range(65,75):
    rfc=RandomForestClassifier(n_estimators=i,random_state=90) #score n_estimators = 65, ..., 74 in turn
    score=cross_val_score(rfc,x,y,cv=10).mean()
    scorel.append(score)
print(max(scorel),[*range(65,75)][scorel.index(max(scorel))])
plt.figure(figsize=[20,5]) #plot the learning curve
plt.plot(range(65,75),scorel)
plt.show()

#Tune max_depth: grid search for the best value
grid_param={'max_depth':[*np.arange(1,7)]} #parameter and range to search
rfc=RandomForestClassifier(n_estimators=71,random_state=90) #instantiate
GS=GridSearchCV(rfc,grid_param,cv=10) #grid search
GS.fit(x,y)       #fit the model
GS.best_params_   #best parameter
GS.best_score_    #best score

#Tune min_samples_leaf: grid search for the best value
grid_param={'min_samples_leaf':[*np.arange(1,11,1)]} #parameter and range to search
rfc=RandomForestClassifier(n_estimators=71,random_state=90) #instantiate
GS=GridSearchCV(rfc,grid_param,cv=10) #grid search
GS.fit(x,y)       #fit the model
GS.best_params_   #best parameter
GS.best_score_    #best score

#Tune min_samples_split: grid search for the best value
grid_param={'min_samples_split':[*np.arange(2,22,1)]} #parameter and range to search
rfc=RandomForestClassifier(n_estimators=71,random_state=90) #instantiate
GS=GridSearchCV(rfc,grid_param,cv=10) #grid search
GS.fit(x,y)       #fit the model
GS.best_params_   #best parameter
GS.best_score_    #best score

#Tune max_features: grid search for the best value
grid_param={'max_features':[*np.arange(2,7,1)]} #parameter and range to search
rfc=RandomForestClassifier(n_estimators=71,min_samples_split=17,random_state=90) #instantiate
GS=GridSearchCV(rfc,grid_param,cv=10) #grid search
GS.fit(x,y)      #fit the model
GS.best_params_  #best parameter
GS.best_score_   #best score

#Tune criterion: grid search for the best value
grid_param={'criterion':['gini','entropy']} #parameter and values to search
rfc=RandomForestClassifier(n_estimators=71,min_samples_split=17,max_features=2,random_state=90) #instantiate
GS=GridSearchCV(rfc,grid_param,cv=10) #grid search
GS.fit(x,y)      #fit the model
GS.best_params_  #best parameter
GS.best_score_   #best score

#Best parameter combination
RandomForestClassifier(n_estimators=71
                      ,min_samples_split=17
                      ,max_features=2
                      ,criterion='gini'
                      ,random_state=90)

That is everything I wanted to share about my approach to tuning a random forest on the Titanic dataset.

Model Tuning: Applying Random Forest Hyperparameter Tuning to the Titanic Dataset
