Coursera course, Stanford Machine Learning (machine-learning): Assignment 1 walkthrough

Programming the cost function and gradient descent for single-variable linear regression

cost function

The raw data has two columns: one for x (the variable) and one for y (the value):

(Figure: the text data; column 1 is x, column 2 is y)

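The cost function for linear regression is $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, with $h_\theta(x) = \theta_0 + \theta_1 x$. Its implementation takes three input arguments: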

X --- an m × 2 matrix, where m is the total number of data samples, i.e. the number of rows. The first column is all ones, which means θ0 is multiplied by x0 = 1.
There is only one variable here.

(Figure: contents of X)
y --- an m × 1 matrix storing the actual y value for each x.
(Figure: contents of y)

(X and y are built from the raw data loaded above.)
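A minimal sketch of building X, y and the initial theta from the two-column raw data. The matrix `data` below is hypothetical, made up so the snippet runs on its own; in the assignment it would instead be read from the provided text file.

```octave
% Hypothetical raw data (stand-in for the assignment's text file):
% column 1 is x, column 2 is y
data = [1.0 2.0;
        2.0 4.1;
        3.0 5.9;
        4.0 8.2];

m = size(data, 1);              % number of training examples (rows)
X = [ones(m, 1), data(:, 1)];   % m x 2: column of ones (x0 = 1), then x
y = data(:, 2);                 % m x 1: actual y values
theta = zeros(2, 1);            % initial guess [0; 0]
```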

theta --- a 2 × 1 matrix holding the two guessed values of θ; its initial value can be set to [0 ; 0].

The cost function is implemented as follows:

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

predic = X * theta;          % predicted y for every example (m x 1)
dif = predic - y;            % difference between predicted y and actual y
dif = dif.^2;                % square each element
s = sum(dif);                % sum of squared differences
J = s/(2*m);                 % divide by 2m

% =========================================================================

end

From the formula we could, of course, loop over the examples, computing each h(x) - y and accumulating the squares, but the matrix computation is faster;
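For comparison, here is a sketch of both versions side by side on a tiny made-up dataset (X, y and theta below are hypothetical, not from the assignment). The loop adds up one squared error per example; the vectorized line does the same work in one expression, and the two should agree.

```octave
% Hypothetical data: m = 3 examples, first column of X is x0 = 1
X = [1 1; 1 2; 1 3];
y = [2; 2.5; 4];
theta = [0.5; 1];
m = length(y);

% Loop version: accumulate (h(x_i) - y_i)^2 one example at a time
J_loop = 0;
for i = 1:m
  h = X(i, :) * theta;              % prediction for example i (a scalar)
  J_loop = J_loop + (h - y(i))^2;
end
J_loop = J_loop / (2 * m);

% Vectorized version, the same steps as computeCost
J_vec = sum((X * theta - y) .^ 2) / (2 * m);

fprintf('loop: %.4f  vectorized: %.4f\n', J_loop, J_vec);
```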

The matrix computation works as follows, with dimensions:
X [m × 2]
y [m × 1]
theta [2 × 1]

X * theta yields an m × 1 matrix (vector) holding the predicted y for every example under this theta.
Subtracting the vector y of actual values then gives:
[ h(x1) - y1;
h(x2) - y2;
...
h(xm) - ym]
Following the formula, we square each element of this vector: dif = dif.^2;
then sum the vector: s = sum(dif);
and finally divide by 2m as the formula says: J = s/(2*m);
This J is the cost produced by the given theta.
X and y stay fixed from here on; we change theta step by step, looking for the smallest cost, which identifies the best theta.
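A small worked example, assuming hypothetical data where y equals x exactly. computeCost is rewritten here as an anonymous function only so the snippet runs as a plain script. With theta = [0; 0] the differences are -1, -2, -3, so J = (1 + 4 + 9) / (2 * 3) = 14/6; with theta = [0; 1] the predictions match y and J = 0, which is exactly the kind of theta we are searching for.

```octave
% Same computation as computeCost, as an anonymous function for a self-contained script
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

X = [1 1; 1 2; 1 3];   % hypothetical: x = 1, 2, 3 with the column of ones
y = [1; 2; 3];         % hypothetical: y = x exactly

J0 = computeCost(X, y, [0; 0]);   % errors -1, -2, -3 -> (1 + 4 + 9) / (2*3) = 14/6
J1 = computeCost(X, y, [0; 1]);   % h(x) = x matches y -> 0

% Equivalent form: the sum of squares is the inner product dif' * dif
dif = X * [0; 0] - y;
J0_alt = (dif' * dif) / (2 * length(y));
```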

Implementing gradient descent

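Gradient descent update rule for theta (applied simultaneously for every j): $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$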

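Since we have only two theta coefficients here, we could compute the trailing x_j in the formula above by indexing.
One student's solution, for example, used X(:,2) to take the second column of X (i.e. x_j) and wrote theta(1) and theta(2) to solve for each coefficient separately. But if we had many coefficients to learn, this approach clearly wouldn't scale.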

(Figure: obtaining x_j by indexing)
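The student's code from that figure isn't available here, so the following is only a hypothetical sketch of what an index-based update for two coefficients can look like (the data, alpha and the temp0/temp1 names are made up). Both new values are computed from the old theta before either is assigned, so the update stays simultaneous.

```octave
% Hypothetical data and learning rate
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
alpha = 0.1;
m = length(y);

dif = X * theta - y;                                  % errors under the current theta
temp0 = theta(1) - alpha / m * sum(dif);              % theta0: multiplied by x0 = 1
temp1 = theta(2) - alpha / m * sum(dif .* X(:, 2));   % theta1: column 2 of X is x
theta = [temp0; temp1];                               % assign only after computing both
```

Each additional coefficient would need another temp line of this kind, which is why the approach doesn't scale.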

So we prefer to solve it with matrix operations:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

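    predic = X * theta;          % m x 1 predicted values
    dif = predic - y;            % m x 1 differences h(x) - y
    dif = dif.* X;               % m x 2: row i becomes dif(i) * [1, x(i)] (broadcast)
    s = sum(dif);                % 1 x 2: column sums, one per theta
    gradient = alpha*s/m;        % scale by alpha and 1/m
    theta = theta - gradient';   % transpose to 2 x 1; updates all thetas simultaneously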

    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end

end

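Step 1: X * theta yields an m × 1 matrix holding all the predicted values;
Step 2: subtracting the actual values gives the m × 1 matrix of differences between predictions and actual values, written dif = predic - y;
Step 3 is the key step. Let's look again at the linear hypothesis and at the gradient descent rule used to update theta: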

Linear hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (i.e. $\theta_0 x_0 + \theta_1 x_1$ with $x_0 = 1$)

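Gradient descent update rule for theta (applied simultaneously for every j): $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$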

By the hypothesis and the gradient descent rule, updating each coefficient θ requires multiplying each example's (row's) difference by that row's value in the column of X corresponding to this θ.

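For θ1, the difference h(x1) - y1 is multiplied by x1, the difference h(x2) - y2 by x2, and so on (x1 being the x value of the first example).
For θ0, however, the difference is multiplied by the constant 1, because θ0 is not multiplied by any variable in the hypothesis (this is the all-ones first column of X).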
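Written out for our two coefficients, the update rules look like this ($x_0^{(i)} = 1$ for every example); stacking both rows gives a compact matrix form. This is a standard identity, added here as a sketch to connect the formula with the matrix code.

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot 1 \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\
\theta &:= \theta - \frac{\alpha}{m} X^\top \left( X\theta - y \right)
\end{aligned}
```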

In other words, the gradient descent rule shown here applies to every θ.
And since we have already stored all the θ values (only two here) in a 2 × 1 matrix (vector), we can operate on the matrices directly:
dif = dif.* X;
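Note the dot in front of the multiplication sign. Because of it, this step is not a matrix product of the m × 1 difference matrix and the m × 2 matrix X (a matrix product requires the number of columns of the first matrix to equal the number of rows of the second; here they are 1 and m, so the two cannot be multiplied);
.* is the element-wise multiplication operator (times in MATLAB/Octave), not the dot() function, which computes dot products.
It multiplies each element of the first matrix by the corresponding element of the second, so normally the two matrices must have the same shape.
But when one of them is a column vector, it only needs the same number of rows as the matrix (Octave broadcasts automatically; MATLAB does the same through implicit expansion since R2016b),
and the effect is that each row's value in the vector multiplies every element in the same row of the matrix.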
For example:

(Figure: element-wise multiplication of a matrix by a column vector)
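A small stand-in for the figure, using hypothetical numbers: each row of X is multiplied by the matching entry of the column vector. The bsxfun and repmat lines are alternative spellings that give the same result without relying on automatic broadcasting, which may help on older MATLAB versions.

```octave
dif = [1; 2; 3];          % hypothetical m x 1 column vector of differences
X = [1 4; 1 5; 1 6];      % hypothetical m x 2 matrix

P = dif .* X;             % row i of X multiplied by dif(i): [1 4; 2 10; 3 18]

% Same result without automatic broadcasting
P2 = bsxfun(@times, dif, X);
P3 = repmat(dif, 1, 2) .* X;
```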

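So dif = dif.* X; produces a new m × 2 matrix
whose first row is [dif1, dif1 * x1], second row [dif2, dif2 * x2], and so on through [difm, difm * xm], where xi is the x value of example i.
Summing each column vertically gives exactly the summation term of the gradient descent rule for each θ! (There are two columns here, one per θ.)
sum() sums a matrix column by column, so s = sum(dif) yields a 1 × 2 matrix whose first entry is the sum of the first column and whose second entry is the sum of the second column;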
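A quick numeric check of the column sums, with hypothetical data and theta = [0; 0]. As an alternative (not what the code above uses), the same two sums can be obtained with the matrix product X' * dif, which is already shaped 2 × 1 like theta.

```octave
X = [1 1; 1 2; 1 3];       % hypothetical data
y = [1; 2; 3];
theta = [0; 0];

dif = X * theta - y;       % [-1; -2; -3]
s = sum(dif .* X);         % 1 x 2 column sums: [-1-2-3, -1-4-9] = [-6, -14]

% Same numbers via a matrix product, shaped 2 x 1 like theta
s_alt = X' * dif;          % [-6; -14]
```

One edge case worth knowing: with a single training example (m = 1), dif .* X is a 1 × 2 row, and sum() would add across that row instead of down the columns; sum(dif .* X, 1) fixes the dimension.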
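So step 4 multiplies the sums by the remaining factors from the formula, alpha and 1/m: gradient = alpha*s/m;
Step 5: gradient is 1 × 2, so it is transposed to 2 × 1 (gradient') and subtracted from the 2 × 1 theta: theta = theta - gradient'; this applies θ := θ - gradient to every θ at once.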
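Putting the five steps together, a minimal end-to-end sketch on hypothetical noise-free data generated from y = 1 + 2x, so theta should approach [1; 2]. The learning rate alpha = 0.1 and num_iters = 2000 are assumed values chosen for this toy data, and the loop body is the gradientDescent code inlined so the script is self-contained (sum(..., 1) pins the summation to columns).

```octave
% Hypothetical data generated from y = 1 + 2x (no noise)
x = (0:0.5:4)';
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];

theta = zeros(2, 1);
alpha = 0.1;            % assumed learning rate
num_iters = 2000;       % assumed number of iterations
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  dif = X * theta - y;                        % steps 1-2: m x 1 differences
  gradient = alpha * sum(dif .* X, 1) / m;    % steps 3-4: 1 x 2
  theta = theta - gradient';                  % step 5: simultaneous update
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);   % cost after this step
end

fprintf('theta = [%.4f; %.4f], final J = %.2e\n', theta(1), theta(2), J_history(end));
```

If the cost in J_history ever increased from one iteration to the next, that would usually be a sign that alpha is too large.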
