9. Neural Networks: Learning


Cost Function

L = total number of layers in network
s_l = no. of units (not counting the bias unit) in layer l; K = s_L = no. of units in the output layer

  • Binary classification (K = 1)
  • Multi-class classification (K classes, K \ge 3)

J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^m\sum_{k=1}^K y_k^{(i)}\log(h_\Theta(x^{(i)}))_k+(1-y_k^{(i)})\log(1-(h_\Theta(x^{(i)}))_k)\right]+\\ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}(\Theta_{ji}^{(l)})^2 \newline h_\Theta(x)\in\mathbb{R}^K \qquad (h_\Theta(x))_k = k^{th}\text{ output}
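Although the course implements this in Octave, the cost can be sketched in NumPy for illustration. The function name `nn_cost`, its argument layout, and the assumption that `Y` is already one-hot encoded are conventions of this sketch, not the course's code:

```python
import numpy as np

def nn_cost(h, Y, thetas, lam):
    """Regularized neural-network cost J(Theta).
    h:      (m, K) output activations (h_Theta(x^{(i)}))_k
    Y:      (m, K) one-hot labels y_k^{(i)}
    thetas: list of weight matrices Theta^{(l)}, bias column first
    lam:    regularization strength lambda
    """
    m = Y.shape[0]
    # cross-entropy term, summed over examples i and classes k
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # regularization term: skip the bias column (j = 0) of each Theta
    J += lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return J
```

Note that the regularization sum deliberately excludes each matrix's first column, matching the (j ≠ 0) convention in the formula above.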

Backpropagation algorithm

\delta_j^{(l)} = "error" of node j in layer l.

For each output unit (in a network with L = 4 layers):

  1. \delta_j^{(4)} = a_j^{(4)} - y_j, or vectorized: \delta^{(4)} = a^{(4)}-y
  2. \delta^{(3)} = (\Theta^{(3)})^T\delta^{(4)}.*g'(z^{(3)}), where g'(z^{(3)}) = a^{(3)}.*(1-a^{(3)})
  3. \delta^{(2)} = (\Theta^{(2)})^T\delta^{(3)}.*g'(z^{(2)})

There is no \delta^{(1)}, since the input layer carries no error.

Backpropagation algorithm (accumulating gradients)

\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)}+a_j^{(l)}\delta_i^{(l+1)}

Vectorized implementation:

\Delta^{(l)} := \Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^T

D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}+\lambda\Theta_{ij}^{(l)} \ (j\ne 0) \newline D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} \ (j = 0) \newline \frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)}
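These delta and accumulation rules can be sketched in NumPy for a single training example, so each Δ reduces to one outer product per layer. The function `backprop_gradients` and its conventions (a 0-based list of matrices where `thetas[0]` is Θ^{(1)}, bias column first, sigmoid activations) are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(thetas, x, y, lam=0.0):
    """Gradients D^{(l)} for a single training example (m = 1)."""
    # forward propagation, storing each a^{(l)} with its bias unit prepended
    a = np.concatenate(([1.0], x))
    activations = [a]
    for T in thetas[:-1]:
        a = np.concatenate(([1.0], sigmoid(T @ a)))
        activations.append(a)
    h = sigmoid(thetas[-1] @ a)            # output activations a^{(L)}
    delta = h - y                          # delta^{(L)} = a^{(L)} - y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        # Delta^{(l)} = delta^{(l+1)} (a^{(l)})^T  (divided by m, which is 1 here)
        grads[l] = np.outer(delta, activations[l])
        # regularize every column except the bias column (j = 0)
        grads[l][:, 1:] += lam * thetas[l][:, 1:]
        if l > 0:
            a_l = activations[l]
            # delta^{(l)} = (Theta^{(l)})^T delta^{(l+1)} .* a^{(l)}.*(1-a^{(l)}),
            # then drop the bias entry before moving one layer back
            delta = (thetas[l].T @ delta * a_l * (1 - a_l))[1:]
    return grads
```

For a full training set, the per-example outer products would be summed into Δ^{(l)} and divided by m, exactly as in the accumulation formula above.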

Backpropagation intuition

Forward Propagation

Understanding what backpropagation does: forward propagation passes activations from the input layer to the output layer, and backpropagation passes the \delta "errors" back through the same weights in the opposite direction, so that \delta_j^{(l)} measures how much node j in layer l contributed to the final error.

Implementation note: Unrolling parameters

function [jVal, gradient] = costFunction(theta)

The optimizer expects the parameters 'theta' and the returned 'gradient' to be vectors. In a neural network, however, the parameters are matrices \Theta^{(1)}, \Theta^{(2)}, \ldots, so we must unroll the matrices into a single vector (and reshape them back inside the cost function).

s_1=10,\ s_2=10,\ s_3=10,\ s_4=1 \newline \Theta^{(1)}\in\mathbb{R}^{10\times11},\ \Theta^{(2)}\in\mathbb{R}^{10\times11},\ \Theta^{(3)}\in\mathbb{R}^{1\times11} \newline D^{(1)}\in\mathbb{R}^{10\times11},\ D^{(2)}\in\mathbb{R}^{10\times11},\ D^{(3)}\in\mathbb{R}^{1\times11}

% Unroll all parameter matrices into a single vector
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec = [D1(:); D2(:); D3(:)];
% Recover the matrices from the vector (column-major order)
Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
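The same round trip can be mimicked in NumPy for illustration. Note that Octave's `(:)` stacks columns, so `order='F'` (column-major) is needed to match it exactly; the matrix contents here are arbitrary placeholders:

```python
import numpy as np

# placeholder weight matrices with the shapes from the example above
Theta1 = np.arange(110, dtype=float).reshape(10, 11)
Theta2 = np.arange(110, dtype=float).reshape(10, 11) + 1000
Theta3 = np.arange(11, dtype=float).reshape(1, 11) + 2000

# unroll: Octave's Theta1(:) stacks columns, i.e. Fortran order
thetaVec = np.concatenate([T.ravel(order='F') for T in (Theta1, Theta2, Theta3)])

# reshape back, mirroring reshape(thetaVec(1:110), 10, 11) etc.
# (0-based, end-exclusive slices replace Octave's 1-based ranges)
T1 = thetaVec[:110].reshape(10, 11, order='F')
T2 = thetaVec[110:220].reshape(10, 11, order='F')
T3 = thetaVec[220:231].reshape(1, 11, order='F')
```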

Gradient checking

Gradient checking is used to make sure that the backpropagation (and forward propagation) implementation is computing correct derivatives.

  • one-sided difference: \frac{J(\theta+\epsilon)-J(\theta)}{\epsilon}
  • two-sided difference: \frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon} (more accurate)

Implement: gradApprox = (J(theta+EPSILON)-J(theta-EPSILON))/(2*EPSILON), with a small EPSILON (e.g. 10^{-4})

Parameter vector \theta

\frac{\partial}{\partial\theta_1}J(\theta)\approx\frac{J(\theta_1+\epsilon,\theta_2,\theta_3,...,\theta_n)-J(\theta_1-\epsilon,\theta_2,\theta_3,...,\theta_n)}{2\epsilon} \newline ...
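This per-parameter check can be sketched in NumPy, tried here against a cost whose gradient is known analytically (J(\theta) = \sum_i \theta_i^2, with gradient 2\theta). The function name `grad_approx` is illustrative:

```python
import numpy as np

def grad_approx(J, theta, eps=1e-4):
    """Two-sided numerical estimate of the gradient of J at theta."""
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        # perturb only component i, as in the formula above
        approx[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return approx

# example: J(theta) = sum(theta^2) has analytic gradient 2*theta
theta = np.array([1.0, -2.0, 3.0])
num = grad_approx(lambda t: np.sum(t ** 2), theta)
ana = 2 * theta
```

In practice one would compare `num` against the DVec produced by backpropagation and confirm the two agree to several decimal places.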

Implementation Note:

  • Implement backprop to compute DVec.
  • Implement numerical gradient check to compute gradApprox.
  • Make sure they give similar values.
  • Turn off gradient checking; use the backprop code for learning.
  • Be sure to disable your gradient checking code before training your classifier: computing the numerical gradient is very expensive, so leaving it on makes every iteration of training extremely slow.

Random initialization

If 'zero initialization' is used, then after each update the parameters on connections going into each of the hidden units remain identical to one another, so every hidden unit computes the same function of the input and the network cannot learn distinct features.

Random initialization: Symmetry breaking

Initialize each \Theta_{ij}^{(l)} to a random value in [-\epsilon, \epsilon] (this \epsilon is unrelated to the one used in gradient checking)
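A minimal sketch of this initialization in NumPy, where the helper name `rand_init` and the default \epsilon value are assumptions of this sketch:

```python
import numpy as np

def rand_init(l_out, l_in, init_epsilon=0.12):
    """Theta^{(l)} of shape (l_out, l_in + 1), uniform in [-eps, eps).
    The +1 accounts for the bias column."""
    return np.random.rand(l_out, l_in + 1) * 2 * init_epsilon - init_epsilon

# e.g. the layer mapping 10 inputs to 10 hidden units
Theta1 = rand_init(10, 10)
```

Because every entry is drawn independently, no two hidden units start with the same incoming weights, which breaks the symmetry described above.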

Putting it together

Training a neural network

Pick a network architecture (connectivity pattern between neurons)

  • No. of input units: dimension of features x^{(i)}
  • No. of output units: number of classes
  • Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
  1. Randomly initialize weights
  2. Implement forward propagation to get h_\Theta(x^{(i)}) for any x^{(i)}
  3. Implement code to compute the cost function J(\Theta)
  4. Implement backprop to compute the partial derivatives \frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta)
  5. Use gradient checking to compare \frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta) computed using backpropagation vs. the numerical estimate of the gradient of J(\Theta). Then disable the gradient checking code.
  6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize J(\Theta) as a function of the parameters \Theta
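The steps above can be sketched end to end in NumPy. Everything here is illustrative rather than the course's Octave code: a made-up XOR dataset, arbitrary layer sizes (2 inputs, 4 hidden units, 1 output), plain batch gradient descent, and no regularization or gradient checking:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(T1, T2, X):
    """Step 2: forward propagation for all m examples at once."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])                   # a^{(1)} with bias
    A2 = np.hstack([np.ones((m, 1)), sigmoid(A1 @ T1.T)])  # a^{(2)} with bias
    H = sigmoid(A2 @ T2.T)                                 # h_Theta(x)
    return A1, A2, H

def cost(H, y):
    """Step 3: unregularized cross-entropy cost J(Theta)."""
    return float(-np.mean(y * np.log(H) + (1 - y) * np.log(1 - H)))

# tiny made-up dataset (XOR), K = 1 output unit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
m = X.shape[0]

# step 1: random initialization (symmetry breaking)
rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.5, 0.5, (4, 3))   # 2 inputs -> 4 hidden units
Theta2 = rng.uniform(-0.5, 0.5, (1, 5))   # 4 hidden -> 1 output

initial_cost = cost(forward(Theta1, Theta2, X)[2], y)
alpha = 1.0
for _ in range(5000):
    A1, A2, H = forward(Theta1, Theta2, X)
    # step 4: backpropagation (step 5, gradient checking, is omitted here)
    d3 = H - y                                            # delta^{(3)}
    d2 = (d3 @ Theta2)[:, 1:] * A2[:, 1:] * (1 - A2[:, 1:])
    D2 = d3.T @ A2 / m
    D1 = d2.T @ A1 / m
    # step 6: gradient descent update
    Theta1 -= alpha * D1
    Theta2 -= alpha * D2
final_cost = cost(forward(Theta1, Theta2, X)[2], y)
```

After training, `final_cost` should be lower than `initial_cost`, showing that the accumulated gradients do drive J(\Theta) downhill on this toy problem.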

Autonomous driving example
