The Perceptron Algorithm
This is the second installment of my notes on 《統(tǒng)計(jì)學(xué)習(xí)方法》 (Statistical Learning Methods), covering Chapter 2 of the book. It quotes the original text extensively and mixes in my own understanding. The algorithms from the book are implemented in Python and animated with Matplotlib, which I'd say is fairly hardcore. A post this dense is a lot of work; I hope I can keep it up.
Concepts
The perceptron is a binary classification model: it takes an instance's feature vector as input and outputs the instance's class, +1 or -1.
The Perceptron Model
Definition
Suppose the input space is $\mathcal{X} \subseteq \mathbb{R}^n$ and the output space is $\mathcal{Y} = \{+1, -1\}$.

x and y belong to these two spaces respectively. Then the following function from the input space to the output space:

$$f(x) = \mathrm{sign}(w \cdot x + b)$$

is called a perceptron. Here $w \in \mathbb{R}^n$ is called the weight vector and $b \in \mathbb{R}$ is called the bias; w·x denotes the inner product of the vectors w and x. sign is the function:

$$\mathrm{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}$$
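To make the definition concrete, here is a minimal sketch of the model as code (the helper names sign and predict are my own, not from the book):

def sign(z):
    return 1 if z >= 0 else -1


def predict(w, b, x):
    # f(x) = sign(w . x + b)
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


print(predict([1, 1], -3, [3, 3]))  # +1, so (3, 3) falls on the positive side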
The geometric interpretation of the perceptron: the linear equation

$$w \cdot x + b = 0$$

divides the feature space into a positive and a negative part:

[figure omitted: the feature space divided by the hyperplane, Figure 2.1 in the book]

This plane (which degenerates to a line in 2 dimensions) is called the separating hyperplane.
Perceptron Learning Strategy
Linear Separability of a Dataset
Definition
Given a dataset

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$$

where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \dots, N$, if there exists some hyperplane S

$$w \cdot x + b = 0$$

that separates all the positive and negative instance points completely and correctly, then T is called linearly separable; otherwise T is linearly inseparable.
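The definition can be checked mechanically; a small sketch (the helper name separates is mine):

def separates(w, b, data):
    # True iff the hyperplane w.x + b = 0 classifies every (x, y) in data correctly
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 for x, y in data)


data = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
print(separates([1, 1], -3, data))  # True, so this dataset is linearly separable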
Perceptron Learning Strategy
Assume the dataset is linearly separable. We want to find a reasonable loss function.
A naive idea is to use the total number of misclassified points, but such a loss function is not a continuous, differentiable function of the parameters w and b. Without differentiability we cannot track how the function changes, so it is hard to optimize (we would not know when to stop training, or would stop at a suboptimal moment).
Another idea is to use the total distance from all misclassified points to the hyperplane S. To that end, first define the distance from a point $x_0$ to the plane S:

$$\frac{1}{\|w\|} \left| w \cdot x_0 + b \right|$$

where $\|w\|$ is the L2 norm of w, that is, the square root of the sum of the squares of the vector's elements (its length). The formula is easy to understand if you recall the point-to-plane distance from school:

$$d = \frac{|Ax_0 + By_0 + Cz_0 + D|}{\sqrt{A^2 + B^2 + C^2}}$$

The distance from a point to the hyperplane S here is just that distance generalized to higher-dimensional space.
又因?yàn)?,如果點(diǎn)i被誤分類,一定有

成立,所以我們?nèi)サ袅私^對(duì)值符號(hào),得到誤分類點(diǎn)到超平面S的距離公式:

Let M denote the set of all misclassified points. The total distance from all misclassified points to the hyperplane S is then

$$-\frac{1}{\|w\|} \sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The denominator contributes little (it is always positive anyway), so we drop it and obtain the loss function of perceptron learning:

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$
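As a sanity check, this loss is only a few lines of Python (perceptron_loss is my own helper name, not code from the book):

def perceptron_loss(w, b, data):
    # L(w, b) = -sum over misclassified (x, y) of y * (w . x + b)
    loss = 0
    for x, y in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin <= 0:  # x is misclassified, i.e. x belongs to M
            loss += -margin
    return loss


data = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
print(perceptron_loss([1, 1], 0, data))  # 2, contributed solely by the misclassified (1, 1)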
The Perceptron Learning Algorithm
Original Form
The perceptron learning algorithm solves the following optimization problem:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The algorithm is misclassification-driven: first pick an initial hyperplane at random, then repeatedly minimize the loss above with gradient descent. The gradient of the loss function is given by:

$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

A gradient is a vector that points in the direction in which a scalar field grows fastest; its length is the maximum rate of change. A scalar field is a field in which the attribute of every point in space can be described by a single scalar (as I understand it, that scalar is the output of the function).

Randomly pick a misclassified point $(x_i, y_i)$ and update the parameters:

$$w \leftarrow w + \eta y_i x_i, \qquad b \leftarrow b + \eta y_i$$

where $\eta$ $(0 < \eta \leq 1)$ is the learning rate. Adding the opposite of the gradient-ascent direction to the parameters is exactly gradient descent, so the iteration keeps decreasing the loss function until it reaches 0. This yields the original form of the perceptron learning algorithm:

Algorithm 2.1 (perceptron learning algorithm, original form)
Input: training set T, learning rate η
Output: w, b; the perceptron model f(x) = sign(w · x + b)
(1) Choose initial values w0, b0
(2) Pick a point (x_i, y_i) from the training set
(3) If y_i(w · x_i + b) ≤ 0, update w ← w + η y_i x_i and b ← b + η y_i
(4) Go back to (2) until no point in the training set is misclassified
For this algorithm, we use the following example (Example 2.1 in the book) as test data: positive instances x1 = (3, 3), x2 = (4, 3) and a negative instance x3 = (1, 1).

Now for the most exciting part: with the knowledge above, we can implement this simple algorithm in Python and visualize it perfectly:
import copy
from matplotlib import pyplot as plt
from matplotlib import animation

training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
w = [0, 0]
b = 0
history = []


def update(item):
    """
    update parameters using stochastic gradient descent
    :param item: an item which is classified into wrong class
    :return: nothing
    """
    global w, b, history
    w[0] += 1 * item[1] * item[0][0]
    w[1] += 1 * item[1] * item[0][1]
    b += 1 * item[1]
    history.append([copy.copy(w), b])
    # print(w, b)  # you can uncomment this line to check the process of stochastic gradient descent


def cal(item):
    """
    calculate the functional distance between 'item' and the decision surface, i.e. yi(w*xi+b)
    :param item: a training example (x, y)
    :return: yi(w*xi+b)
    """
    res = 0
    for i in range(len(item[0])):
        res += item[0][i] * w[i]
    res += b
    res *= item[1]
    return res


def check():
    """
    check if the hyperplane can classify the examples correctly
    :return: True if an update was made, False once everything is classified correctly
    """
    flag = False
    for item in training_set:
        if cal(item) <= 0:
            flag = True
            update(item)
    if not flag:
        print("RESULT: w: " + str(w) + " b: " + str(b))
    return flag


if __name__ == "__main__":
    for i in range(1000):
        if not check():
            break

    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text(0, 0, '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm (www.hankcs.com)')
        return line, label

    # animation function. this is called sequentially to draw the hyperplane after each update
    def animate(i):
        global history, ax, line, label
        w = history[i][0]
        b = history[i][1]
        if w[1] == 0:
            return line, label
        # two points at x = -7 and x = 7 determine the current separating line
        x1 = -7
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        # place a label showing [w, b] on the line at x = 0
        x1 = 0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(str(history[i]))
        label.set_position([x1, y1])
        return line, label

    # call the animator. blit=True means only re-draw the parts that have changed.
    print(history)
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history),
                                   interval=1000, repeat=True, blit=True)
    plt.show()
    anim.save('perceptron.gif', fps=2, writer='imagemagick')
Visualization

[animation omitted: perceptron.gif, produced by the code above]

You can see that the hyperplane is attracted by the misclassified point and moves toward it, so that the distance between the two shrinks step by step until the point is classified correctly. Doesn't this animation give a more intuitive feel for the perceptron's gradient descent?
Convergence of the Algorithm
Write the input vector in augmented form, with a constant 1 appended:

$$\hat{x} = (x^T, 1)^T \in \mathbb{R}^{n+1}, \qquad \hat{w} = (w^T, b)^T$$

so that $\hat{w} \cdot \hat{x} = w \cdot x + b$. Since the dataset is linearly separable, there exists some $\hat{w}_{\mathrm{opt}}$ with $\|\hat{w}_{\mathrm{opt}}\| = 1$ whose hyperplane $\hat{w}_{\mathrm{opt}} \cdot \hat{x} = 0$ can classify the dataset completely correctly. Define the minimum gamma:

$$\gamma = \min_i \, y_i (\hat{w}_{\mathrm{opt}} \cdot \hat{x}_i)$$

and let $R = \max_i \|\hat{x}_i\|$. Then (Theorem 2.1, Novikoff) the number of misclassifications k of the original-form algorithm satisfies:

$$k \leq \left( \frac{R}{\gamma} \right)^2$$

so on a linearly separable dataset the algorithm converges after finitely many updates. For the proof, see 《統(tǒng)計(jì)學(xué)習(xí)方法》 p. 31.
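As a quick numeric check of the bound, here is a hedged sketch for the example dataset above. It uses the separating hyperplane w = (1, 1), b = -3 that the original-form code converges to; any separating hyperplane normalized to unit length yields a valid gamma, and picking this one is my own shortcut, not part of the theorem:

import numpy as np

x_hat = np.array([[3, 3, 1], [4, 3, 1], [1, 1, 1]], dtype=float)  # augmented instances (x, 1)
y = np.array([1, 1, -1], dtype=float)

w_hat = np.array([1, 1, -3], dtype=float)  # the (w, b) found by the code above
w_hat /= np.linalg.norm(w_hat)             # the theorem requires ||w_hat|| = 1

gamma = np.min(y * (x_hat @ w_hat))        # gamma = min_i yi * (w_hat . x_hat_i)
R = np.max(np.linalg.norm(x_hat, axis=1))  # R = max_i ||x_hat_i||
print((R / gamma) ** 2)                    # about 286

The bound is quite loose here: the algorithm actually makes only 7 updates on this dataset, far fewer than 286.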
The Dual Form of the Perceptron Learning Algorithm
Dual here means that w and b are expressed as linear combinations of the training instances $x_i$, and we solve for the coefficients instead. Concretely, if the parameters are updated $n_i$ times on the misclassified point $(x_i, y_i)$, then the increments of w and b contributed by that point are $\alpha_i y_i x_i$ and $\alpha_i y_i$ respectively, where $\alpha_i = n_i \eta$. The parameters finally learned can therefore be written as:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i$$

This gives Algorithm 2.2:

Algorithm 2.2 (perceptron learning algorithm, dual form)
Input: training set T, learning rate η
Output: α, b; the model f(x) = sign(∑_j α_j y_j (x_j · x) + b)
(1) α ← 0, b ← 0
(2) Pick a point (x_i, y_i) from the training set
(3) If y_i(∑_j α_j y_j (x_j · x_i) + b) ≤ 0, update α_i ← α_i + η and b ← b + η y_i
(4) Go back to (2) until no point is misclassified

Since the training instances enter only through inner products, the Gram matrix G = [x_i · x_j] of the training set can be precomputed up front.
Code for the Dual-Form Perceptron
This involves a fair amount of matrix computation, so NumPy is used more heavily:
# -*- coding:utf-8 -*-
# Filename: train2.2.py
# Author: hankcs
# Date: 2015/1/31 15:15
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation

# An example in that book, the training set and parameters' sizes are fixed
training_set = [[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]]
a = np.zeros(len(training_set), dtype=float)
b = 0.0
Gram = None
y = np.array([item[1] for item in training_set])
x = np.array([item[0] for item in training_set], dtype=float)
history = []


def cal_gram():
    """
    calculate the Gram matrix of the training instances
    :return: the Gram matrix
    """
    g = np.empty((len(training_set), len(training_set)), dtype=int)
    for i in range(len(training_set)):
        for j in range(len(training_set)):
            g[i][j] = np.dot(training_set[i][0], training_set[j][0])
    return g


def update(i):
    """
    update parameters using stochastic gradient descent
    :param i: index of a misclassified example
    :return: nothing
    """
    global a, b
    a[i] += 1
    b = b + y[i]
    history.append([np.dot(a * y, x), b])
    # print(a, b)  # you can uncomment this line to check the process of stochastic gradient descent


def cal(i):
    """
    calculate the judge condition yi(sum_j aj*yj*(xj.xi) + b)
    """
    global a, b, x, y
    res = np.dot(a * y, Gram[i])
    res = (res + b) * y[i]
    return res


def check():
    """
    check if the hyperplane can classify the examples correctly
    :return: False once everything is classified correctly
    """
    global a, b, x, y
    flag = False
    for i in range(len(training_set)):
        if cal(i) <= 0:
            flag = True
            update(i)
    if not flag:
        w = np.dot(a * y, x)
        print("RESULT: w: " + str(w) + " b: " + str(b))
        return False
    return True


if __name__ == "__main__":
    Gram = cal_gram()  # initialize the Gram matrix
    for i in range(1000):
        if not check():
            break

    # draw an animation to show how it works, the data comes from history
    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text(0, 0, '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
        return line, label

    # animation function. this is called sequentially
    def animate(i):
        global history, ax, line, label
        w = history[i][0]
        b = history[i][1]
        if w[1] == 0:
            return line, label
        x1 = -7.0
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7.0
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        x1 = 0.0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(str(history[i][0]) + ' ' + str(b))
        label.set_position([x1, y1])
        return line, label

    # call the animator. blit=True means only re-draw the parts that have changed.
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history),
                                   interval=1000, repeat=True, blit=True)
    plt.show()
    # anim.save('perceptron2.gif', fps=2, writer='imagemagick')
Visualization

[animation omitted: the dual-form run, perceptron2.gif]

The result is the same as Algorithm 1's. We can also change the dataset a bit:
training_set = [[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]]
and get a somewhat more complex result:

[animation omitted: the run on the four-point dataset]
Afterthoughts
Through the simplest of models, we get to learn the common concepts and the typical workflow of machine learning.