from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=4)
print(X)  # numpy array of shape (1000, 4)
print(y)  # numpy array of shape (1000,)
lr = LogisticRegression()
# hold out the last 200 samples as a test set
X_train = X[:-200]
X_test = X[-200:]
y_train = y[:-200]
y_test = y[-200:]
lr.fit(X_train, y_train)
y_train_predictions = lr.predict(X_train)
print(type(y_train_predictions))
y_test_predictions = lr.predict(X_test)
print((y_train_predictions == y_train).sum().astype(float) / y_train.shape[0])
print((y_test_predictions == y_test).sum().astype(float) / y_test.shape[0])
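The two print statements above compute accuracy by hand: count the predictions that match the labels and divide by the number of samples. As a minimal sketch (with `random_state` added only to make the run reproducible), the same numbers can be obtained with `sklearn.metrics.accuracy_score`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
lr = LogisticRegression()
lr.fit(X[:-200], y[:-200])

# accuracy_score returns the same fraction of correct predictions
# as the manual (predictions == labels).sum() / n expression above
train_acc = accuracy_score(y[:-200], lr.predict(X[:-200]))
test_acc = accuracy_score(y[-200:], lr.predict(X[-200:]))
print(train_acc, test_acc)
```

Using the metric function avoids the manual `.astype(float)` bookkeeping and generalizes to other metrics (precision, recall, etc.) with the same call shape.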
LogisticRegression() accepts a number of parameters, including:
(1) penalty: the regularization term; l2 regularization guards against overfitting by adding the sum of the squared weights to the loss.
(2) C: the inverse of the regularization strength, so a larger C means weaker regularization.
(3) tol: tolerance for the stopping criterion.
(4) solver: the optimization algorithm. The options are {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}; the default is 'liblinear' (a linear classifier) in scikit-learn versions before 0.22, and 'lbfgs' from 0.22 onward.
According to the API documentation, the trade-offs between solvers are:
For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
‘newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, whereas ‘liblinear’ and ‘saga’ handle L1 penalty.
Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
(5) dual: whether to solve the dual formulation; dual=True solves the dual problem, dual=False solves the primal. The dual formulation is only implemented for the liblinear solver with l2 penalty.
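As a minimal sketch of how these parameters fit together (the values here are chosen only for illustration, not recommended defaults): liblinear is one of the solvers that supports the l1 penalty, and a C below 1 strengthens the regularization.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

# liblinear supports the l1 penalty; C=0.5 regularizes more strongly
# than the default C=1.0, and tol sets the stopping tolerance
clf = LogisticRegression(penalty='l1', C=0.5, tol=1e-4, solver='liblinear')
clf.fit(X, y)
print(clf.coef_)  # l1 regularization may drive some coefficients exactly to zero
```

Note that not every penalty/solver combination is valid: for example, 'newton-cg', 'lbfgs', and 'sag' raise an error if asked for penalty='l1'.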
Algorithm description

If anything in the algorithm description above is wrong, please leave a comment~