Hands-On Data Operations 02


Today I looked at data preprocessing. Whether you preprocess or not can make a big difference to a model's score, so it is best to compare results with and without it before deciding. Preprocessing generally involves three steps:

1. Import the relevant preprocessing module and initialize a scaler.

2. Fit the scaler to the data to be processed (normally the feature matrix; fit on the training set only).

3. Transform both the training and test data with the fitted scaler.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)

X_test_scaled = scaler.transform(X_test)

The fit and transform steps can be combined into one call: X_scaled_d = scaler.fit_transform(X)




Another common scaler standardizes each feature to zero mean and unit variance:

## preprocessing using zero mean and unit variance scaling

from sklearn.preprocessing import StandardScaler
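A minimal sketch of the same three steps with StandardScaler (the breast cancer dataset here is an assumed example; fit on the training set only, then transform both sets):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# same three steps: initialize, fit on training data, transform both sets
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# after scaling, each training feature has mean ~0 and std ~1
print("mean of first feature: {:.2f}".format(X_train_scaled[:, 0].mean()))
print("std of first feature: {:.2f}".format(X_train_scaled[:, 0].std()))
```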




Principal Component Analysis (PCA)




Original shape: (569, 30)

Reduced shape: (569, 2)
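The shapes above come from reducing a 30-feature dataset to two principal components; a minimal sketch, assuming the breast cancer dataset (which matches the (569, 30) shape):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()
# PCA is sensitive to feature scales, so standardize first
X_scaled = StandardScaler().fit_transform(cancer.data)

# keep only the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Original shape: {}".format(X_scaled.shape))  # (569, 30)
print("Reduced shape: {}".format(X_pca.shape))      # (569, 2)
```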




(Ugh, I don't quite follow this part yet.)


from sklearn.datasets import make_blobs

from sklearn.cluster import KMeans

# generate synthetic two-dimensional data

X, y = make_blobs(random_state=1)

# build the clustering model with three clusters

kmeans = KMeans(n_clusters=3)

kmeans.fit(X)

# cluster assignments are stored in kmeans.labels_



data_dummies = pd.get_dummies(data)  # create dummy (one-hot) variables

Encoding numeric columns as categories:

demo_df = pd.DataFrame({'Integer Feature': [0, 1, 2, 1],
                        'Categorical Feature': ['socks', 'fox', 'socks', 'box']})
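A short sketch of how get_dummies treats the two columns: by default only the string column is encoded, and the integer column must be listed explicitly via the columns parameter to be one-hot encoded as well.

```python
import pandas as pd

demo_df = pd.DataFrame({'Integer Feature': [0, 1, 2, 1],
                        'Categorical Feature': ['socks', 'fox', 'socks', 'box']})

# by default, get_dummies encodes only the string/categorical column
default_dummies = pd.get_dummies(demo_df)

# to one-hot encode the integer column too, list both columns explicitly
all_dummies = pd.get_dummies(demo_df, columns=['Integer Feature',
                                               'Categorical Feature'])
print(list(default_dummies.columns))
print(list(all_dummies.columns))
```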














Model evaluation and improvement

k-fold cross-validation is the most commonly used form of cross-validation.


The most commonly used function is cross_val_score(). Its first argument is the model, the second is the feature data, and the third is the target values; it defaults to three-fold cross-validation (in older scikit-learn versions), and the number of folds can be changed with the cv parameter.

A common way to summarize the cross-validation accuracy is to compute the mean:

print("Average cross-validation score: {:.2f}".format(scores.mean()))
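A minimal sketch of cross_val_score, assuming the iris dataset and logistic regression as an illustrative model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# cv controls the number of folds (five here)
scores = cross_val_score(logreg, iris.data, iris.target, cv=5)
print("Cross-validation scores: {}".format(scores))
print("Average cross-validation score: {:.2f}".format(scores.mean()))
```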




from sklearn.model_selection import GridSearchCV, train_test_split

from sklearn.datasets import load_iris

from sklearn.svm import SVC

# grid of SVC parameters to search over (example values)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

grid_search = GridSearchCV(SVC(), param_grid, cv=5)

iris = load_iris()

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

grid_search.fit(X_train, y_train)

print("Test set score: {:.2f}".format(grid_search.score(X_test, y_test)))

Test set score: 0.97

print("Best parameters: {}".format(grid_search.best_params_))

print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))












Precision-recall curves and ROC curves:

from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(

y_test, svc.decision_function(X_test))

Receiver operating characteristic (ROC) and AUC
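A minimal ROC/AUC sketch; the SVC on the breast cancer dataset is an assumed example, reusing decision_function as the continuous score just as in the precision-recall snippet above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

svc = SVC(gamma='auto').fit(X_train, y_train)

# roc_curve needs a continuous score, e.g. decision_function output
fpr, tpr, thresholds = roc_curve(y_test, svc.decision_function(X_test))

# AUC summarizes the ROC curve as a single number
auc = roc_auc_score(y_test, svc.decision_function(X_test))
print("AUC: {:.2f}".format(auc))
```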



















