Neural Networks: Representation (Part 2)

Model Representation I

To build a neural network model, we borrow from the nervous system of the brain. Each neuron acts as a processing unit (or nucleus) with many dendrites for input and an axon for output; neurons pass information to one another by transmitting electrical pulses.

A neural network model is built from individual "neurons", each of which is itself a small learning model; we call these "neurons" activation units.

In a neural network, the parameters θ are also called weights. The hypothesis is hθ(x) = g(z), and the newly added x0 is called the bias unit.
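Here g is the logistic (sigmoid) function, the same one used in logistic regression. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5
```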

In a neural network model (taking a three-layer network as an example), the first layer is the input layer, the last layer is the output layer, and the layer in between is called the hidden layer.

We introduce the following notation for describing a neural network model:

  • ai(j): the activation of unit i in layer j;
  • Θ(j): the matrix of weights mapping from layer j to layer j+1.

Note: if layer j has sj activation units and layer j+1 has sj+1 activation units, then the weight matrix Θ(j) has dimension sj+1 × (sj + 1), i.e. (units in layer j+1) × (units in layer j, plus one for the bias). The weight matrix Θ(1) in the figure above is therefore 3×4.
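The dimension rule can be checked directly. A small sketch, assuming the three-layer network above has 3 input units, 3 hidden units, and 1 output unit:

```python
import numpy as np

# Hypothetical layer sizes: 3 input units, 3 hidden units, 1 output unit.
layer_sizes = [3, 3, 1]

# Theta(j) maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the +1 column accounts for the bias unit.
thetas = [np.zeros((layer_sizes[j + 1], layer_sizes[j] + 1))
          for j in range(len(layer_sizes) - 1)]

print(thetas[0].shape)  # (3, 4) -- the 3x4 Theta(1) from the figure
print(thetas[1].shape)  # (1, 4)
```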

對于上圖所示的神經(jīng)網(wǎng)絡模型,我們可用如下數(shù)學表達式表示:

In logistic regression we are restricted to the raw features x in the data set; even though we can combine them into polynomial terms, we are still limited to the original features x.

在神經(jīng)網(wǎng)絡中,原始特征變量x只作為輸入層,輸出層所做出的預測結果利用的是隱藏層的特征變量,由此我們可以認為隱藏層中特征變量是通過神經(jīng)網(wǎng)絡模型學習后,將得到的新特征用于預測結果,而非使用原始特征變量x用于預測結果。

Supplementary Notes
Model Representation I

Visually, a simplistic representation looks like:

Our input nodes (layer 1), also known as the "input layer", go into another node (layer 2), which finally outputs the hypothesis function, known as the "output layer".

We can have intermediate layers of nodes between the input and output layers called the "hidden layers."

In this example, we label these intermediate or "hidden" layer nodes a0(2), …, an(2) and call them "activation units."

If we had one hidden layer, it would look like:

The values for each of the "activation" nodes are obtained as follows:

This is saying that we compute our activation nodes by using a 3×4 matrix of parameters. We apply each row of the parameters to our inputs to obtain the value for one activation node. Our hypothesis output is the logistic function applied to the sum of the values of our activation nodes, which have been multiplied by yet another parameter matrix Θ(2) containing the weights for our second layer of nodes.
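The row-by-row computation described above can be sketched as follows; the weight values here are random placeholders, not values from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # 3x4 Theta(1): hidden-layer weights
Theta2 = rng.standard_normal((1, 4))  # 1x4 Theta(2): output-layer weights

x = np.array([1.0, 0.5, -1.2, 0.3])   # [x0 = 1 (bias), x1, x2, x3]

# Each row of Theta1, applied to the inputs, yields one activation node.
a2 = sigmoid(Theta1 @ x)              # shape (3,)

# Prepend the bias unit, then apply Theta2 and g to get the hypothesis.
a2 = np.concatenate(([1.0], a2))      # shape (4,)
h = sigmoid(Theta2 @ a2)              # shape (1,): a single number in (0, 1)
print(h)
```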

Each layer gets its own matrix of weights, Θ(j).

The dimensions of these matrices of weights are determined as follows:

If a network has sj units in layer j and sj+1 units in layer j+1, then Θ(j) will be of dimension sj+1 × (sj + 1).

The +1 comes from the addition in Θ(j) of the "bias nodes," x0 and Θ0(j). In other words, the output nodes will not include the bias node, while the inputs will. The following image summarizes our model representation:

Model Representation II

Taking this figure as the example whose mathematical expression we introduced earlier, we now vectorize it to make it easier to implement and compute.

where:

We can therefore rewrite the earlier expressions as:

Writing the vector x as a(1), we have z(2) = Θ(1)a(1), and therefore a(2) = g(z(2)).

The hypothesis hθ(x) can then be rewritten as:

where:

Supplementary Notes
Model Representation II

To re-iterate, the following is an example of a neural network:

In this section we'll do a vectorized implementation of the above functions. We're going to define a new variable zk(j) that encompasses the parameters inside our g function. In our previous example, if we replace all of those parameters with the variable z, we get:

In other words, for layer j=2 and node k, the variable z will be:

The vector representation of x and z(j) is:

Setting x=a(1), we can rewrite the equation as:

We are multiplying our matrix Θ(j−1) with dimensions sj×(n+1) (where sj is the number of our activation nodes) by our vector a(j−1) with height (n+1). This gives us our vector z(j) with height sj. Now we can get a vector of our activation nodes for layer j as follows:

Where our function g can be applied element-wise to our vector z(j).

We can then add a bias unit (equal to 1) to layer j after we have computed a(j). This will be element a0(j) and will be equal to 1. To compute our final hypothesis, let's first compute another z vector:

We get this final z vector by multiplying the next theta matrix after Θ(j−1) with the values of all the activation nodes we just got. This last theta matrix Θ(j) will have only one row, which is multiplied by one column a(j), so that our result is a single number. We then get our final result with:

Notice that in this last step, between layer j and layer j+1, we are doing exactly the same thing as we did in logistic regression. Adding all these intermediate layers in neural networks allows us to more elegantly produce interesting and more complex non-linear hypotheses.
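The whole vectorized pipeline above can be sketched as a short loop over layers. Layer sizes and weights are illustrative assumptions, not values from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Vectorized forward propagation.

    x      : input vector without the bias unit.
    thetas : list of weight matrices, Theta(j) of shape s_{j+1} x (s_j + 1).
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add bias unit a0 = 1
        z = theta @ a                   # z(j+1) = Theta(j) a(j)
        a = sigmoid(z)                  # a(j+1) = g(z(j+1))
    return a

rng = np.random.default_rng(1)
thetas = [rng.standard_normal((3, 4)),  # 3x4 Theta(1): input -> hidden
          rng.standard_normal((1, 4))]  # 1x4 Theta(2): hidden -> output
h = forward_propagate(np.array([0.5, -1.2, 0.3]), thetas)
print(h)  # one number in (0, 1), exactly as in logistic regression
```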

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯(lián)系作者
【社區(qū)內容提示】社區(qū)部分內容疑似由AI輔助生成,瀏覽時請結合常識與多方信息審慎甄別。
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發(fā)布,文章內容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務。

相關閱讀更多精彩內容

  • 可以去除魚腥的有: 肉豆蔻、月桂、丁香、眾香子、洋蔥、肉桂、大蒜、蔥姜、鼠尾草、小豆蔻、香菜等; 可以去除羊肉膻味...
    遇見最美的ni閱讀 14,606評論 2 5
  • 今日你一身紅裝 盤起的秀發(fā) 精致的頭飾 想必是極美的 心底也是甜蜜幸福的 婚禮的規(guī)程風俗想必也很繁瑣勞累 但走過這...
    小明的小紅的故事閱讀 379評論 0 1
  • 公子有一把黑紙傘,是曾祖父那輩人傳下來的,落滿了時間的灰塵,傘面雖陳舊些傘骨卻依然完好,不過總是掛在墻上,從未撐開...
    北風起兮閱讀 438評論 0 0
  • 這個察覺是沒有想到的結果,哈哈 先一步步來吧。 在家里,我和弟弟的形象幾乎是反的,我做什么做得好,他肯...
    木棉水水閱讀 680評論 0 49
  • 不想與你再有牽扯,我斷了你的所有聯(lián)系。 一切又重新浮現(xiàn)在我眼前。初夏的時節(jié),風是和煦的,林蔭道上投下上午溫暖的樹影...
    暮之心閱讀 347評論 0 0

友情鏈接更多精彩內容