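Code and documentation: https://github.com/nicktming/code/tree/dev/machine_learning/backpropagation
Preface
Following the previous post, "Understanding Gradient Descent and Neural Networks (ANN) Step by Step with Code", this post uses a worked example to show what backpropagation actually does.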
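Review
The previous post analyzed how a single-layer neural network (no hidden layer) uses gradient descent to adjust its parameters so that the chosen
cost_function keeps getting smaller.
The single-layer case is fairly easy to follow, since direct differentiation gives the partial derivative of
cost_function with respect to each parameter. With multiple layers, though, how does the error signal get passed back to every parameter in every layer? Does each derivative have to be worked out from start to finish? That would be tedious. It is easy to get confused at this point, and the formulas alone are not easy to digest.
So here is the answer up front: updating the parameters of any layer requires the partial derivative of
cost_function with respect to every parameter in that layer, because that derivative tells us how much the parameter influences cost_function. If this is unclear, I suggest first reading my previous post, "Understanding Gradient Descent and Neural Networks (ANN) Step by Step with Code". As for how these derivatives are obtained, it is certainly not by differentiating from scratch each time. Let's set the formulas aside and work through an example I wrote to see what really happens. (As my high-school math teacher used to say: if you don't understand something, make the abstract problem concrete. Put plainly, use an example.)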
Example: 2-layer ANN
The figure shows a 2-layer neural network together with the array definitions.
The activation function is sigmoid, and cost_function is the squared error (least squares).
Untitled Diagram(1).png
CodeCogsEqn(40).png
Forward pass
CodeCogsEqn(41).png
CodeCogsEqn(42).png
CodeCogsEqn(4).gif
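As a hedged transcription in matrix form, using the variable names from the code that follows (the symbol $\sigma$ for sigmoid, the sample count $m$, and the one-row-per-sample layout are taken from that code, not from the figures), the forward pass and the cost can be written as:

```latex
\begin{aligned}
in_2 &= X W_2 + b_2, & out_2 &= \sigma(in_2),\\
in_3 &= out_2 W_3 + b_3, & out_3 &= \sigma(in_3),\\
C &= \frac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{2}\bigl(out_{3,ik} - Y_{ik}\bigr)^2, & \sigma(x) &= \frac{1}{1+e^{-x}}.
\end{aligned}
```

Here $X$ is $m \times 2$, $W_2$ is $2 \times 3$, $W_3$ is $3 \times 2$, and the bias rows $b_2$ ($1 \times 3$) and $b_3$ ($1 \times 2$) are added to every sample row.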
Corresponding code:
# training samples 2 inputs and 2 outputs
X = np.random.rand(m, 2)
Y = np.random.rand(m, 2)
#layer 2
W2 = np.ones((2, 3))
b2 = np.ones((1, 3))
in2 = np.dot(X, W2) + b2
out2 = sigmoid(in2)
#layer 3
W3 = np.ones((3, 2))
b3 = np.ones((1, 2))
in3 = np.dot(out2, W3) + b3
out3 = sigmoid(in3)
#initial cost
cost = cost_function(out3, Y)
print("start:", cost)
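The excerpt above relies on m, np and sigmoid being defined elsewhere. A minimal self-contained sketch, assuming m = 10 and a fixed random seed (the seed and the helper name forward are additions made here for illustration), shows the shape of each intermediate array:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W2, b2, W3, b3):
    """Forward pass of the 2-layer network; returns every intermediate array."""
    in2 = np.dot(X, W2) + b2      # (m, 2) x (2, 3) + (1, 3) -> (m, 3)
    out2 = sigmoid(in2)
    in3 = np.dot(out2, W3) + b3   # (m, 3) x (3, 2) + (1, 2) -> (m, 2)
    out3 = sigmoid(in3)
    return in2, out2, in3, out3

m = 10                            # assumed number of samples
rng = np.random.default_rng(0)    # assumed seed, for reproducibility only
X = rng.random((m, 2))
W2, b2 = np.ones((2, 3)), np.ones((1, 3))
W3, b3 = np.ones((3, 2)), np.ones((1, 2))

in2, out2, in3, out3 = forward(X, W2, b2, W3, b3)
print(in2.shape, out2.shape, in3.shape, out3.shape)
# prints: (10, 3) (10, 3) (10, 2) (10, 2)
```

The bias rows have a leading dimension of 1, so NumPy broadcasting adds the same bias to every sample.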
Backpropagation
Backpropagation is mostly differentiation: computing the partial derivative of
cost_function with respect to each parameter.
CodeCogsEqn(9).gif
CodeCogsEqn(10).gif
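First, one point to clear up: if the goal is the partial derivatives with respect to the parameters, why do the formulas above compute the partial derivative of
cost_function with respect to in? Look at the formula for in in the forward pass: once the partial derivative of cost_function with respect to in is known, the partial derivative of cost_function with respect to any parameter of that layer follows from it.
Now that the role of the partial derivative of
cost_function with respect to in is clear, look at how in2 and in3 are related above, that is, how the derivative with respect to in2 can be obtained from the derivative with respect to in3.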
CodeCogsEqn(12).gif
CodeCogsEqn(11).gif
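Once the relationship between
in2 and in3 is known, the same formula clearly applies to every hidden layer. Naturally, the derivative with respect to in3 at the output layer has to be computed first, because every hidden layer depends on the partial derivative of cost_function with respect to in of the layer that follows it (the one closer to the output).
With the partial derivative of
cost_function with respect to in in hand, the forward formula for in gives the partial derivative of in with respect to each parameter of that layer, and therefore the partial derivative of cost_function with respect to that parameter. The matrix form is the formula shown in the figure above.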
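For reference, the chain-rule steps described above can be written out in matrix form with the names used in the code that follows. The symbols $\delta_3$ and $\delta_2$ (shorthand for the partial derivative of cost_function with respect to in3 and in2) and $\odot$ (element-wise product) are notation introduced here, not taken from the figures, and the factor $1/m$ assumes the cost $C = \frac{1}{2m}\sum (out_3 - Y)^2$:

```latex
\begin{aligned}
\delta_3 &= \frac{\partial C}{\partial in_3} = \frac{1}{m}\,(out_3 - Y)\odot\sigma'(in_3), & \sigma'(x) &= \sigma(x)\,\bigl(1-\sigma(x)\bigr),\\
\frac{\partial C}{\partial W_3} &= out_2^{\top}\,\delta_3, & \frac{\partial C}{\partial b_3} &= \sum_{i=1}^{m}\delta_{3,i\cdot},\\
\delta_2 &= \frac{\partial C}{\partial in_2} = \bigl(\delta_3\,W_3^{\top}\bigr)\odot\sigma'(in_2),\\
\frac{\partial C}{\partial W_2} &= X^{\top}\,\delta_2, & \frac{\partial C}{\partial b_2} &= \sum_{i=1}^{m}\delta_{2,i\cdot}.
\end{aligned}
```

Only $\delta_3$ involves the cost directly; $\delta_2$ reuses $\delta_3$, which is why nothing has to be differentiated from scratch for the hidden layer.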
Corresponding code:
derivative_c_out3 = np.subtract(out3, Y) / m
derivative_out3_in3 = derivative_sigmoid(in3)
derivative_c_in3 = np.multiply(derivative_c_out3, derivative_out3_in3)
#find derivative of cost function to W3 and b3 in layer3
dw3 = np.dot(out2.T, derivative_c_in3)
db3 = np.sum(derivative_c_in3, axis=0)
#find derivative of cost function to in2 in layer2
derivative_out2_in2 = derivative_sigmoid(in2)
derivative_c_in2 = np.multiply(np.dot(derivative_c_in3, W3.T), derivative_out2_in2)
#find derivative of cost function to W2 and b2 in layer2
dw2 = np.dot(X.T, derivative_c_in2)
db2 = np.sum(derivative_c_in2, axis=0)
#update all variables
W3 = W3 - step * dw3
W2 = W2 - step * dw2
b3 = b3 - step * db3
b2 = b2 - step * db2
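One way to gain confidence in the derivatives above is a finite-difference check: nudge a single parameter by a small eps, recompute the cost, and compare the resulting slope with the analytic gradient. The sketch below is an illustrative addition under stated assumptions (m = 5, a fixed seed, random rather than all-ones weights, eps = 1e-6, and the hypothetical helper names forward_cost, backward and numerical_grad); it is not part of the original code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def derivative_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def forward_cost(X, Y, W2, b2, W3, b3):
    """Cost 1/(2m) * sum((out3 - Y)^2) for the 2-layer network."""
    out2 = sigmoid(np.dot(X, W2) + b2)
    out3 = sigmoid(np.dot(out2, W3) + b3)
    return np.sum(np.square(out3 - Y)) / (2.0 * X.shape[0])

def backward(X, Y, W2, b2, W3, b3):
    """Analytic gradients, following the same steps as the excerpt above."""
    m = X.shape[0]
    in2 = np.dot(X, W2) + b2
    out2 = sigmoid(in2)
    in3 = np.dot(out2, W3) + b3
    out3 = sigmoid(in3)
    d_in3 = (out3 - Y) / m * derivative_sigmoid(in3)
    dw3 = np.dot(out2.T, d_in3)
    # keepdims=True (an assumption here) keeps db the same shape as b
    db3 = np.sum(d_in3, axis=0, keepdims=True)
    d_in2 = np.dot(d_in3, W3.T) * derivative_sigmoid(in2)
    dw2 = np.dot(X.T, d_in2)
    db2 = np.sum(d_in2, axis=0, keepdims=True)
    return dw2, db2, dw3, db3

def numerical_grad(f, P, eps=1e-6):
    """Central finite differences of scalar f() w.r.t. array P (edited in place, then restored)."""
    g = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = f()
        P[idx] = old - eps
        minus = f()
        P[idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)   # assumed seed
m = 5                            # assumed small sample count
X, Y = rng.random((m, 2)), rng.random((m, 2))
W2, b2 = rng.random((2, 3)), rng.random((1, 3))
W3, b3 = rng.random((3, 2)), rng.random((1, 2))

f = lambda: forward_cost(X, Y, W2, b2, W3, b3)
analytic = backward(X, Y, W2, b2, W3, b3)
max_diff = max(
    np.max(np.abs(a - numerical_grad(f, P)))
    for a, P in zip(analytic, (W2, b2, W3, b3))
)
print("max |analytic - numerical|:", max_diff)
```

If the analytic formulas are right, the printed difference should be tiny compared with the gradients themselves; a large value usually points to a missing transpose or a forgotten 1/m.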
Full code
The goal is to bring the value of
cost_function below 0.1.
import numpy as np
def sigmoid(x):
    return 1/(1+np.exp(-x))

def derivative_sigmoid(x):
    return np.multiply(1 - sigmoid(x), sigmoid(x))

def cost_function(yo, Y):
    return 1./(2*m) * np.sum(np.square(np.subtract(yo, Y)))
#num of samples and learning rate
m = 10
step = 0.01
# training samples 2 inputs and 2 outputs
X = np.random.rand(m, 2)
Y = np.random.rand(m, 2)
#layer 2
W2 = np.ones((2, 3))
b2 = np.ones((1, 3))
in2 = np.dot(X, W2) + b2
out2 = sigmoid(in2)
#layer 3
W3 = np.ones((3, 2))
b3 = np.ones((1, 2))
in3 = np.dot(out2, W3) + b3
out3 = sigmoid(in3)
#initial cost
cost = cost_function(out3, Y)
print("start:", cost)
cnt = 0
while not cost < 0.1:
    #find derivative of cost function to in3 in layer3
    derivative_c_out3 = np.subtract(out3, Y) / m
    derivative_out3_in3 = derivative_sigmoid(in3)
    derivative_c_in3 = np.multiply(derivative_c_out3, derivative_out3_in3)
    #find derivative of cost function to W3 and b3 in layer3
    dw3 = np.dot(out2.T, derivative_c_in3)
    db3 = np.sum(derivative_c_in3, axis=0)
    #find derivative of cost function to in2 in layer2
    derivative_out2_in2 = derivative_sigmoid(in2)
    derivative_c_in2 = np.multiply(np.dot(derivative_c_in3, W3.T), derivative_out2_in2)
    #find derivative of cost function to W2 and b2 in layer2
    dw2 = np.dot(X.T, derivative_c_in2)
    db2 = np.sum(derivative_c_in2, axis=0)
    #update all variables
    W3 = W3 - step * dw3
    W2 = W2 - step * dw2
    b3 = b3 - step * db3
    b2 = b2 - step * db2
    # forward to get new out3 with X
    in2 = np.dot(X, W2) + b2
    out2 = sigmoid(in2)
    in3 = np.dot(out2, W3) + b3
    out3 = sigmoid(in3)
    # get new cost with new out3 with X
    cost = cost_function(out3, Y)
    if cnt % 100 == 0:
        print("cost:", cost)
    cnt += 1
#output how many iterations were needed to reach the target cost
print("end:", cost)
print("cnt:", cnt)
Result:
image.png
Formulas
Output layer:
CodeCogsEqn(15).gif
Hidden layer:
CodeCogsEqn(14).gif
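The output-layer and hidden-layer rules apply unchanged to deeper networks: start at the output layer, then walk backwards through the hidden layers, reusing the derivative with respect to in from the layer just processed. As a hedged sketch (the function names, the list-of-layers layout, the layer sizes, the step size and the fixed 200 iterations are all assumptions made here for illustration), a loop over an arbitrary number of sigmoid layers could look like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cost_function(out, Y):
    return np.sum(np.square(out - Y)) / (2.0 * Y.shape[0])

def forward(X, Ws, bs):
    """Return the list of layer outputs; outs[0] is the input X."""
    outs = [X]
    for W, b in zip(Ws, bs):
        outs.append(sigmoid(np.dot(outs[-1], W) + b))
    return outs

def backward(outs, Y, Ws):
    """Gradients of 1/(2m) * sum((out - Y)^2) for every W and b."""
    m = Y.shape[0]
    dWs, dbs = [None] * len(Ws), [None] * len(Ws)
    # Output layer: dC/d_in = (out - Y)/m * sigmoid'(in), where sigmoid'(in) = out * (1 - out)
    delta = (outs[-1] - Y) / m * outs[-1] * (1.0 - outs[-1])
    for l in range(len(Ws) - 1, -1, -1):
        dWs[l] = np.dot(outs[l].T, delta)
        dbs[l] = np.sum(delta, axis=0, keepdims=True)
        if l > 0:
            # Hidden layer: push delta back through W, then through sigmoid'
            delta = np.dot(delta, Ws[l].T) * outs[l] * (1.0 - outs[l])
    return dWs, dbs

rng = np.random.default_rng(0)      # assumed seed
m, sizes = 10, [2, 3, 4, 2]         # assumed: 2 inputs, two hidden layers, 2 outputs
X, Y = rng.random((m, sizes[0])), rng.random((m, sizes[-1]))
Ws = [np.ones((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.ones((1, b)) for b in sizes[1:]]

step = 0.1                          # assumed learning rate
start = cost_function(forward(X, Ws, bs)[-1], Y)
for _ in range(200):
    outs = forward(X, Ws, bs)
    dWs, dbs = backward(outs, Y, Ws)
    Ws = [W - step * dW for W, dW in zip(Ws, dWs)]
    bs = [b - step * db for b, db in zip(bs, dbs)]
end = cost_function(forward(X, Ws, bs)[-1], Y)
print("start:", start, "end:", end)
```

With sizes = [2, 3, 2] this reduces to the 2-layer example in this post. Storing the layer outputs during the forward pass is a design choice that lets the backward loop compute the sigmoid derivative as out * (1 - out) without re-evaluating sigmoid.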
Finally, a figure from the web to summarize:
WechatIMG435.jpeg
References:
http://www.hankcs.com/ml/back-propagation-neural-network.html












