Table of contents link: Andrew Ng Deep Learning study notes — index
1. Convolutional network
2. Backward propagation of CNN
Note: see the assignment "Convolutional Neural Networks: Step by Step".
1. Convolutional network
Notation:
① A superscript [l] denotes the l-th layer of the network.
② A superscript (i) denotes the i-th example.
③ A subscript i denotes the i-th convolution kernel (filter) of a layer.
④ nH, nW and nC denote the height, width and number of channels of a layer.
1.1 Packages
import numpy as np
import h5py
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2  # automatically reload all modules except those excluded by %aimport
np.random.seed(1)
1.2 Outline of the Assignment
(1) Convolution functions:
Zero-padding
Convolution kernel (filter)
Forward convolution
Backward convolution
(2) Pooling functions:
Forward pooling
Create mask
Distribute values

1.3 CNN
If you use a deep-learning framework, a convolutional layer can be created with a single line of code; here, however, we implement it by hand to understand how it works (a one-line framework version is sketched below for comparison):
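A minimal sketch of that one-liner, assuming TensorFlow 2 / Keras (the layer sizes are illustrative and not part of this assignment):

import tensorflow as tf

# one line creates a trainable conv layer: 8 filters of size 2x2, stride 1, zero ("same") padding
conv_layer = tf.keras.layers.Conv2D(filters=8, kernel_size=2, strides=1, padding="same")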

(1) Padding

Zero-padding adds a border of zeros around each image, so that a convolution does not shrink the height and width and information at the edges is not lost.
np.pad(array, pad_width, mode) — parameters:
array: the array to be padded;
pad_width: a tuple of tuples such as ((1,2),(3,4),...); the first inner tuple applies to the first dimension, the second to the second dimension, and so on. (1,2) means pad one row of values before that dimension and two rows after it;
mode: the padding mode, e.g. constant, edge, etc. Here we use constant, whose default fill value is 0; constant_values=(1,3) means that, within one dimension, the front is padded with 1 and the back with 3. Example:
a = np.array([[1,2],[3,4]])
a_pad = np.pad(a,((1,1),(1,2)),'constant',constant_values = (0,9))
"""
輸出:
a: [[1 2]
[3 4]]
a_pad: [[0 0 0 9 9]
[0 1 2 9 9]
[0 3 4 9 9]
[0 9 9 9 9]]
"""
Define the padding function:
def zero_pad(X, pad):
    """
    Argument:
    X: a numpy array of m samples, shape = (m, n_H, n_W, n_C), m is the number of samples
    pad: integer, amount of padding on the horizontal and vertical dimensions
    return:
    X_pad: padded image of shape = (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), "constant")
    return X_pad
Test:
np.random.seed(1)
x = np.random.randn(4,3,3,2)
x_pad = zero_pad(x,2)
print("x.shape = ",x.shape)
print("x_pad.shape",x_pad.shape)
fig,ax = plt.subplots(1,2)
ax[0].set_title('x')
ax[0].imshow(x[0,:,:,0])
ax[1].set_title('x_pad')
ax[1].imshow(x_pad[0,:,:,0])

(2) Single-step convolution
Steps of a single convolution step:
input data → apply the kernel at each position of the data → output data (its size and number of channels may change)

def conv_single_step(a_slice_pre, W, b):
    """
    Argument:
    a_slice_pre: a slice of the input, dim = (f, f, n_C_prev)
    W: weights of the kernel, dim = (f, f, n_C_prev)
    b: bias of the kernel, dim = (1, 1, 1)
    return:
    Z: scalar, the output of convolving a_slice_pre
    """
    # note: b is broadcast and added to every element before the sum here;
    # mathematically the bias is usually added once, after the sum
    s = np.multiply(a_slice_pre, W) + b
    Z = np.sum(s)
    return Z
Test: the input slice for a single convolution step has size (4,4,3), and the kernel has size (4,4,3). Note: each input channel has its own (4,4) weight matrix in W.
np.random.seed(1)
a_slice_pre = np.random.randn(4,4,3)
W = np.random.randn(4,4,3)
b = np.random.randn(1,1,1)
Z = conv_single_step(a_slice_pre,W,b)
print(Z)
"""
Output: -23.16021220252078
"""
(3) Single-layer forward convolution
In the forward pass of a convolutional network, several kernels are applied to the input; each kernel produces a 2D matrix, and the outputs of all kernels are stacked into a 3D volume.
Implement one forward convolution layer: the input is the activation output A_pre of the previous layer, the kernel weights are given by W, each kernel has its own bias b (shared across the channels within one kernel), and two hyperparameters, padding and stride, are also given.
Hint:
① Use slicing to extract a_slice_prev from the input matrix.
② To define an a_slice_prev you need to determine vert_start, vert_end, horiz_start and horiz_end, as in the sketch below:
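A minimal sketch of how these four indices carve one window out of the padded input (the concrete numbers are made up for illustration):

import numpy as np

a_prev_pad = np.arange(5 * 5 * 3).reshape(5, 5, 3)   # one padded sample, shape (5, 5, 3)
f, stride = 2, 1
h, w = 1, 2                                          # output position being computed
vert_start, vert_end = h * stride, h * stride + f    # rows covered by the window
horiz_start, horiz_end = w * stride, w * stride + f  # columns covered by the window
a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
print(a_slice_prev.shape)                            # (2, 2, 3)

The full single-layer forward convolution then looks like this: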


def conv_forward(A_pre, W, b, hyperparams):
    """
    Argument:
    A_pre: the activations of the previous layer, dim = (m, n_H_prev, n_W_prev, n_C_prev)
    W: weight matrix, dim = (f, f, n_C_prev, n_C)
    b: bias vector, dim = (1, 1, 1, n_C)
    hyperparams: pad and stride
    return:
    Z: conv output of the current layer, dim = (m, n_H, n_W, n_C)
    cache: stores values for conv_backward()
    """
    (m, n_H_prev, n_W_prev, n_C_prev) = A_pre.shape
    (f, f, n_C_prev, n_C) = W.shape
    stride = hyperparams["stride"]
    pad = hyperparams["pad"]
    # output size: n = (n_prev + 2*pad - f) / stride + 1
    n_H = int((n_H_prev + 2 * pad - f) / stride) + 1
    n_W = int((n_W_prev + 2 * pad - f) / stride) + 1
    Z = np.zeros((m, n_H, n_W, n_C))
    A_pre_pad = zero_pad(A_pre, pad)
    for i in range(m):                                  # loop over the samples
        a_pre_pad = A_pre_pad[i]
        for h in range(n_H):                            # vertical position of the output
            for w in range(n_W):                        # horizontal position of the output
                for c in range(n_C):                    # kernel / output channel
                    vert_start = h * stride             # h moves along the height (vertical) direction
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice_pre = a_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    Z[i, h, w, c] = conv_single_step(a_slice_pre, W[..., c], b[..., c])
    assert (Z.shape == (m, n_H, n_W, n_C))
    cache = (A_pre, W, b, hyperparams)
    return Z, cache
Test:
np.random.seed(1)
A_pre = np.random.randn(10,4,4,3)
W = np.random.randn(2,2,3,8)
b = np.random.randn(1,1,1,8)
hy = {"pad":2,"stride":1}
Z,caches = conv_forward(A_pre,W,b,hy)
print(np.mean(Z))
print(caches[0][1][2][3])  # A_pre[1, 2, 3, :]: the 3 channel values of sample index 1, row 2, column 3
"""
Output:
mean of Z: 0.15585932488906465
cache: [-0.20075807  0.18656139  0.41005165]
"""
Note 1: In the for loops, ① the outermost loop picks sample i; ② the next loop scans the sample vertically (vertical), and since the output height is n_H it runs n_H times; ③ the next loop scans horizontally (horizontal), n_W times; ④ the innermost loop, once the kernel's position is fixed, computes the Z value of each output channel.
Note 2: When computing the Z value of channel c of the current layer, W[..., c] selects the c-th kernel along the output-channel dimension and keeps every other dimension, i.e. it selects the full weight matrix of the c-th kernel.
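A quick shape check of that slicing (same W dimensions as in the test above; shown only for illustration):

import numpy as np

W = np.random.randn(2, 2, 3, 8)   # (f, f, n_C_prev, n_C): 8 kernels, each of size 2x2x3
print(W[..., 0].shape)            # (2, 2, 3) -> the entire weight volume of kernel 0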
1.4 Pooling layer
Pooling layers shrink the height and width of the data, which reduces computation, and they also make the feature detectors less sensitive to the exact position of features. There are generally two kinds of pooling: max pooling and average pooling.
A pooling layer has no parameters learned by backpropagation, but it does have hyperparameters: the kernel size f and the stride.

def pool_forward(A_pre, hyperparams, mode="max"):
    """
    Argument:
    A_pre: the activations of the previous layer, dim = (m, n_H_prev, n_W_prev, n_C_prev)
    hyperparams: f and stride
    return:
    A: output of the pooling layer, dim = (m, n_H, n_W, n_C)
    cache: stores values for pool_backward()
    """
    (m, n_H_prev, n_W_prev, n_C_prev) = A_pre.shape
    f = hyperparams["f"]
    stride = hyperparams["stride"]
    n_H = int((n_H_prev - f) / stride) + 1
    n_W = int((n_W_prev - f) / stride) + 1
    n_C = n_C_prev
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        a_pre = A_pre[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice_pre = a_pre[vert_start:vert_end, horiz_start:horiz_end, c]
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_slice_pre)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_slice_pre)
    cache = (A_pre, hyperparams)
    assert (A.shape == (m, n_H, n_W, n_C))
    return A, cache
Note: the slicing of a_slice_pre differs between pooling and convolution. In convolution the last dimension keeps all channels, because the data of all channels in that region are convolved with the corresponding channels of one kernel and summed, so each kernel produces a single output channel. In pooling, the number of channels stays the same and each channel is computed independently (a small comparison of the two slices follows).
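A side-by-side look at the two slicing patterns (toy shapes, assumed only for illustration):

import numpy as np

a_pre = np.random.randn(5, 5, 3)    # one sample with 3 channels
conv_slice = a_pre[0:2, 0:2, :]     # convolution: keep all channels -> shape (2, 2, 3)
pool_slice = a_pre[0:2, 0:2, 0]     # pooling: one channel at a time  -> shape (2, 2)
print(conv_slice.shape, pool_slice.shape)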
2. Backward propagation of CNN
Because of its convolution layers (local connectivity) and pooling layers (shrinking h and w), backpropagation in a CNN differs from that in a DNN. The detailed derivation is in another set of notes, 卷積神經(jīng)網(wǎng)絡(luò)前向傳播和BP后向傳播計算步驟; please read that first — the assignment formulas discussed below rely on the derivations there.
2.1 Convolutional layer
(1) Computing dA
The function uses the following formula to compute dA:

$$dA \mathrel{+}= \sum_{h=0}^{n_H-1} \sum_{w=0}^{n_W-1} W_c \times dZ_{hw}$$

Here W_c is the c-th kernel and dZ_hw is the gradient of the cost with respect to the value at row h, column w of one channel of the layer-l output Z (i.e. the gradient of the value obtained by convolving one local region of layer l-1). Expanding the two sums is equivalent to convolving dZ with W (call this convolution form 1):
da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
In code (assume the input is 3-channel 3x3 data, there are 2 kernels of size 2x2x3 and stride 1, so the output is 2-channel 2x2 data):
① The first for loop picks the i-th sample; its dZ has dimensions (n_H, n_W, n_C) = (2, 2, 2).
② The second and third for loops pick the value at row h, column w of dZ (in the original figure, for example, row 1 column 1 is the black shaded position across all channels, and row 1 column 2 the green shaded one).
③ The fourth loop picks one kernel (i.e. one channel of dZ) and multiplies the value selected in ② element-wise with that kernel (in numpy, * is element-wise; when a scalar multiplies a matrix, broadcasting applies), giving a contribution to dA.
④ The += matters for two reasons: within one dZ channel passed through one kernel, the windows written into dA overlap and must be accumulated; and the contributions of the different kernels c all flow back into the same region of da_prev and must also be accumulated. Summed up, this gives the same result as convolution form 1 (see the toy demo below):
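A toy demo of that accumulation — a minimal sketch with made-up numbers: a 3x3 single-channel input, one 2x2 kernel of ones, stride 1, so dZ is 2x2:

import numpy as np

da_prev = np.zeros((3, 3))
W = np.ones((2, 2))                    # one 2x2 kernel, all ones for readability
dZ = np.array([[1., 2.],
               [3., 4.]])              # gradient of the 2x2 output
for h in range(2):
    for w in range(2):
        # each output position spreads its gradient over the 2x2 window it came from
        da_prev[h:h+2, w:w+2] += W * dZ[h, w]
print(da_prev)
# [[ 1.  3.  2.]
#  [ 4. 10.  6.]
#  [ 3.  7.  4.]]  -> the centre cell is covered by all four windows: 1+2+3+4 = 10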
(2) Computing dW
The formula for dW is:

$$dW_c \mathrel{+}= \sum_{h=0}^{n_H-1} \sum_{w=0}^{n_W-1} a_{slice} \times dZ_{hw}$$

which in code becomes:
dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
Note: why, when computing dA, do the cells within one dA channel only partially overlap and accumulate, while for dW every corresponding cell accumulates? Because in the dA expression the cells selected by da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] shift as h and w move (their coordinates only partially overlap), whereas in the dW expression dW[:,:,:,c] always selects all cells of the kernel, so every corresponding cell is accumulated.
(3) Computing db
The formula for db is:

$$db \mathrel{+}= \sum_{h} \sum_{w} dZ_{hw}$$

which in code becomes:
db[:,:,:,c] += dZ[i, h, w, c]
For db, the sum of all dZ values of one channel becomes the bias gradient of the corresponding kernel.
The code computing dA, dW and db for a convolutional layer:
def conv_backward(dZ, cache):
    """
    Argument:
    dZ: gradient of the cost with respect to Z of layer l, dim = (m, n_H, n_W, n_C)
    cache: (A_pre, W, b, hyperparams)
    return:
    dA_pre: (m, n_H_pre, n_W_pre, n_C_pre)
    dW: (f, f, n_C_pre, n_C)
    db: (1, 1, 1, n_C)
    """
    (A_pre, W, b, hyperparams) = cache
    (m, n_H_pre, n_W_pre, n_C_pre) = A_pre.shape
    (f, f, n_C_pre, n_C) = W.shape
    stride = hyperparams["stride"]
    pad = hyperparams["pad"]
    (m, n_H, n_W, n_C) = dZ.shape
    dA_pre = np.zeros((m, n_H_pre, n_W_pre, n_C_pre))
    dW = np.zeros((f, f, n_C_pre, n_C))
    db = np.zeros((1, 1, 1, n_C))
    A_pre_pad = zero_pad(A_pre, pad)
    dA_pre_pad = zero_pad(dA_pre, pad)
    for i in range(m):
        a_pre_pad = A_pre_pad[i]
        da_pre_pad = dA_pre_pad[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice = a_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    da_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]
        dA_pre[i, :, :, :] = da_pre_pad[pad:-pad, pad:-pad, :]
    assert (dA_pre.shape == (m, n_H_pre, n_W_pre, n_C_pre))
    return dA_pre, dW, db
Test:
np.random.seed(1)
dA,dW,db = conv_backward(Z,caches)
print(np.mean(dA),np.mean(dW),np.mean(db))
"""
Output: 9.608990675868995 10.581741275547566 76.37106919563735
"""
2.2 Pooling layer
(1) Max pooling
First we need a helper function, creat_mask_from_window(), that records where the maximum value was during max pooling.

The function takes an (f, f) matrix as input and returns an (f, f) matrix that is True at the position of the maximum value and False everywhere else:
def creat_mask_from_window(x):
    """
    Argument:
    x: an array, dim = (f, f)
    return:
    mask: an array, dim = (f, f), True at the position of the max value of x
    """
    mask = (x == np.max(x))
    return mask
Test:
np.random.seed(1)
x = np.random.randn(2,3)
mask = creat_mask_from_window(x)
print(mask)
"""
Output:
[[ True False False]
[False False False]]
"""
(2) Average pooling
For average pooling, the gradient dz of one output cell of the pooling layer is simply distributed evenly back over the corresponding window:
def distribute_value(dz, shape):
    # spread the scalar gradient dz evenly over a window of the given (n_h, n_w) shape
    (n_h, n_w) = shape
    average = dz / (n_h * n_w)
    a = np.ones(shape) * average
    return a
Test:
a = distribute_value(2,(2,2))
print(a)
"""
Output:
[[0.5 0.5]
[0.5 0.5]]
"""
(3) The complete backward pooling function
def pool_backward(dA, cache, mode="max"):
    """
    Argument:
    dA: gradient of the cost with respect to the pooling output A
    cache: (A_pre, hyperparams)
    return:
    dA_pre: gradient with respect to the pooling input, same shape as A_pre
    """
    (A_pre, hyperparams) = cache
    stride = hyperparams["stride"]
    f = hyperparams["f"]
    (m, n_H_pre, n_W_pre, n_C_pre) = A_pre.shape
    (m, n_H, n_W, n_C) = dA.shape
    dA_pre = np.zeros(A_pre.shape)
    for i in range(m):
        a_pre = A_pre[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    if mode == "max":
                        a_pre_slice = a_pre[vert_start:vert_end, horiz_start:horiz_end, c]
                        mask = creat_mask_from_window(a_pre_slice)
                        dA_pre[i, vert_start:vert_end, horiz_start:horiz_end, c] += np.multiply(mask, dA[i, h, w, c])
                    elif mode == "average":
                        da = dA[i, h, w, c]
                        shape = (f, f)
                        dA_pre[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)
    assert (dA_pre.shape == A_pre.shape)
    return dA_pre
Test:
np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride" : 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)
dA_prev = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,:,:,0] = ', dA_prev[1,:,:,0])
print()
dA_prev = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,:,:,1] = ', dA_prev[1,:,:,1])
"""
輸出:
mode = max
mean of dA = 0.14571390272918056
dA_prev[1,:,:,1] =
[[ 0. 0. 0. ]
[ 0. 5.05844394 0. ]
[ 0. 0. 0. ]
[ 0. 1.37512611 0. ]
[ 0. -0.59248892 0. ]]
mode = average
mean of dA = 0.14571390272918056
dA_prev[1,:,:,1] =
[[ 0.05338348 -0.42070676 -0.47409023]
[ 0.2787552 -0.25749373 -0.53624893]
[ 0.16879316 0.0348075 -0.13398566]
[-0.13652896 -0.129969 0.00655996]
[-0.0799504 -0.00156347 0.07838693]]
"""
Note: why does the pooling backward pass also use +=? Weren't the channels supposed to be independent? They are, but when the stride is smaller than the kernel size the windows overlap, so the overlapping parts within the same channel must be accumulated (see the small illustration below).
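A tiny illustration of that overlap — toy numbers only, max mode, f = 2, stride = 1 on a single 3x3 channel:

import numpy as np

a = np.array([[1., 9., 2.],
              [3., 4., 5.],
              [8., 6., 7.]])           # one channel of A_pre
dA = np.ones((2, 2))                   # gradient of the 2x2 pooled output
dA_pre = np.zeros((3, 3))
for h in range(2):
    for w in range(2):
        window = a[h:h+2, w:w+2]
        mask = (window == np.max(window))
        dA_pre[h:h+2, w:w+2] += mask * dA[h, w]
print(dA_pre)
# [[0. 2. 0.]
#  [0. 0. 0.]
#  [1. 0. 1.]]  -> 9 is the max of two overlapping windows, so its cell accumulates 1 + 1 = 2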
Summary
This assignment is mainly about understanding the forward- and backward-propagation mechanics of the convolution and pooling layers of a CNN and implementing them by hand. The second assignment of this lesson uses TensorFlow 1; since I have TensorFlow 2.0 installed I did not retype it — with 2.0 it takes only a few lines, far fewer than with TensorFlow 1 (a rough sketch of those few lines follows). For image classification with TensorFlow 2.0 see Tensorflow學(xué)習(xí)筆記(六)——卷積神經(jīng)網(wǎng)絡(luò); the TensorFlow 1 code is in Convolutional Neural Networks: Application.
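A minimal sketch of what those few lines might look like with tf.keras (the layer sizes are illustrative, not taken from the assignment):

import tensorflow as tf

# a small conv net: two conv + max-pool blocks followed by a dense classifier
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 4, padding="same", activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPool2D(pool_size=8, strides=8, padding="same"),
    tf.keras.layers.Conv2D(16, 2, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=4, strides=4, padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])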