機(jī)器學(xué)習(xí),模式識(shí)別中很重要的一環(huán),就是分類,因?yàn)橛?jì)算機(jī)其實(shí)無法深層次地理解文字圖片目標(biāo)的意思,只能回答是或者不是。當(dāng)然現(xiàn)在卷積神經(jīng)網(wǎng)絡(luò)正在希望計(jì)算機(jī)能夠看懂東西,本文關(guān)注Logistic回歸算法。

Logistic Regression別看它名字里帶了回歸,但是它其實(shí)是一種分類的方法,用于兩分類的問題。
基本原理過程
找一個(gè)合適的預(yù)測函數(shù)(Andrew Ng的公開課中稱為hypothesis),一般表示為h函數(shù),該函數(shù)就是我們需要找的分類函數(shù),它用來預(yù)測輸入數(shù)據(jù)的判斷結(jié)果。這個(gè)過程時(shí)非常關(guān)鍵的,需要對(duì)數(shù)據(jù)有一定的了解或分析,知道或者猜測預(yù)測函數(shù)的“大概”形式,比如是線性函數(shù)還是非線性函數(shù)。
構(gòu)造一個(gè)Cost函數(shù)(損失函數(shù)),該函數(shù)表示預(yù)測的輸出(h)與訓(xùn)練數(shù)據(jù)類別(y)之間的偏差,可以是二者之間的差(h-y)或者是其他的形式。綜合考慮所有訓(xùn)練數(shù)據(jù)的“損失”,將Cost求和或者求平均,記為J(θ)函數(shù),表示所有訓(xùn)練數(shù)據(jù)預(yù)測值與實(shí)際類別的偏差。
顯然,J(θ)函數(shù)的值越小表示預(yù)測函數(shù)越準(zhǔn)確(即h函數(shù)越準(zhǔn)確),所以這一步需要做的是找到J(θ)函數(shù)的最小值。找函數(shù)的最小值有不同的方法,Logistic Regression實(shí)現(xiàn)時(shí)有的是梯度下降法(Gradient Descent)。
邏輯回歸算法的優(yōu)點(diǎn):
- 實(shí)現(xiàn)簡單;
- 分類時(shí)計(jì)算量非常小,速度很快,存儲(chǔ)資源低;
邏輯回歸算法的缺點(diǎn):
- 容易欠擬合,一般準(zhǔn)確度不太高;
- 只能處理兩分類問題(在此基礎(chǔ)上衍生出來的softmax可以用于多分類),且必須線性可分;
過程詳解
既然這種分類方法就叫Logistic Regression,那么Logistic肯定是一個(gè)重要的東西,沒錯(cuò)有個(gè)函數(shù)就叫Logistic函數(shù)(也叫Sigmoid函數(shù),S型函數(shù))

其導(dǎo)數(shù)形式為:

Logistic回歸方法是用最大似然估計(jì)來進(jìn)行學(xué)習(xí)的,單個(gè)樣本的后驗(yàn)概率是 :

那么整個(gè)樣本的后驗(yàn)概率,似然函數(shù)就是:

通過求對(duì)數(shù),得到損失函數(shù):

使用梯度下降發(fā)對(duì)損失函數(shù)求最小值,對(duì)l(θ)求微分,可得:

以下是用Python(2.7)使用sklearn完成的鳶尾花分類代碼。
#!/usr/bin/python
# -*- coding:utf-8 -*-
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.patches as mpatches
if __name__ == "__main__":
path = 'iris.data' # 數(shù)據(jù)文件路徑
data = pd.read_csv(path, header=None)
x, y = np.split(data.values, (4,), axis=1)
le = preprocessing.LabelEncoder()
le.fit(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
print le.classes_
y = le.transform(y)
# 僅使用前兩列特征
x = x[:, :2]
lr = Pipeline([('sc', StandardScaler()),
('poly', PolynomialFeatures(degree=1)),
('clf', LogisticRegression()) ])
lr.fit(x, y.ravel())
y_hat = lr.predict(x)
y_hat_prob = lr.predict_proba(x)
np.set_printoptions(suppress=True)
print 'y_hat = \n', y_hat
print 'y_hat_prob = \n', y_hat_prob
print u'準(zhǔn)確度:%.2f%%' % (100*np.mean(y_hat == y.ravel()))
# 畫圖
N, M = 500, 500 # 橫縱各采樣多少個(gè)值
x1_min, x1_max = x[:, 0].min(), x[:, 0].max() # 第0列的范圍
x2_min, x2_max = x[:, 1].min(), x[:, 1].max() # 第1列的范圍
t1 = np.linspace(x1_min, x1_max, N)
t2 = np.linspace(x2_min, x2_max, M)
x1, x2 = np.meshgrid(t1, t2) # 生成網(wǎng)格采樣點(diǎn)
x_test = np.stack((x1.flat, x2.flat), axis=1) # 測試點(diǎn)
mpl.rcParams['font.sans-serif'] = [u'simHei']
mpl.rcParams['axes.unicode_minus'] = False
cm_light = mpl.colors.ListedColormap(['#77E0A0', '#FF8080', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
y_hat = lr.predict(x_test) # 預(yù)測值
y_hat = y_hat.reshape(x1.shape) # 使之與輸入的形狀相同
plt.figure(facecolor='w')
plt.pcolormesh(x1, x2, y_hat, cmap=cm_light) # 預(yù)測值的顯示
plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='k', s=50, cmap=cm_dark) # 樣本的顯示
plt.xlabel(u'花萼長度', fontsize=14)
plt.ylabel(u'花萼寬度', fontsize=14)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.grid()
patchs = [mpatches.Patch(color='#77E0A0', label='Iris-setosa'),
mpatches.Patch(color='#FF8080', label='Iris-versicolor'),
mpatches.Patch(color='#A0A0FF', label='Iris-virginica')]
plt.legend(handles=patchs, fancybox=True, framealpha=0.8)
plt.title(u'鳶尾花Logistic回歸分類效果 - 標(biāo)準(zhǔn)化', fontsize=17)
plt.show()
其準(zhǔn)確率為80.00%
附:鳶尾花訓(xùn)練數(shù)據(jù)集
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica