Exercise source: https://github.com/CaptainLYN/Andrew-NG-Meachine-Learning
1. Warm-up
An Octave warm-up exercise: build a 5 x 5 identity matrix.
Implementation of warmUpExercise.m:
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
% A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
% In octave, we return values by defining which variables
% represent the return values (at the top of the file)
% and then set them accordingly.
A = eye(5);
% ===========================================
end
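eye(n) builds an n x n identity matrix. For more Octave commands, see the official manual, or type help xxx in the console to read the description of a specific function (where xxx is the function name).
Console output: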
ans =
Diagonal Matrix
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
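2. Univariate linear regression: plotting the data
This part practices the plot function. Call figure first to open a new figure window for the plot.
Implementation of plotData.m: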
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
% PLOTDATA(x,y) plots the data points and gives the figure axes labels of
% population and profit.
figure; % open a new figure window
% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the
% "figure" and "plot" commands. Set the axes labels using
% the "xlabel" and "ylabel" commands. Assume the
% population and revenue data have been passed in
% as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
% appear as red crosses. Furthermore, you can make the
% markers larger by using plot(..., 'rx', 'MarkerSize', 10);
% rx is the option about marker style
plot(x, y, 'rx', 'MarkerSize', 10);%Plot the data
ylabel('Profit in $10,000s'); % Set the y-axis label
xlabel('Population of City in 10,000s'); % Set the x-axis label
% ============================================================
end
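Output: [figure: the training data drawn as red crosses, x-axis "Population of City in 10,000s", y-axis "Profit in $10,000s"]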

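3. Univariate linear regression: cost function
Hypothesis:
$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$
Cost function:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

The cost function can also be expressed with matrix operations (the implementation that follows is based on this form):
$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

Implementation of computeCost.m: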
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
J = (X * theta - y)' * (X * theta - y) / (2 * m);
% =========================================================================
end
Console output:
Testing the cost function ...
With theta = [0 ; 0]
Cost computed = 32.072734
Expected cost value (approx) 32.07
With theta = [-1 ; 2]
Cost computed = 54.242455
Expected cost value (approx) 54.24
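As a rough check that the matrix form of the cost agrees with the summation form, the sketch below evaluates both on a tiny made-up dataset. The data and the names J_loop and J_vec are assumptions for illustration; they are not taken from the exercise files.

```octave
% Hypothetical toy data: first column of X is the all-ones intercept column.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = length(y);

% Summation form: accumulate the squared error one example at a time.
J_loop = 0;
for i = 1:m
  J_loop = J_loop + (X(i, :) * theta - y(i))^2;
end
J_loop = J_loop / (2 * m);

% Matrix form, as used in computeCost.m.
err = X * theta - y;
J_vec = (err' * err) / (2 * m);

fprintf('loop: %f, vectorized: %f\n', J_loop, J_vec);
```

With theta at zero the squared errors are 1, 4 and 9, so both forms should give 14 / 6.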
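4. Univariate linear regression: gradient descent
Gradient descent update rule:
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
- Note: all θj must be updated simultaneously.
To track how θ changes, print statements were added that output θ and the cost after every update.
Implementation of gradientDescent.m: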
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
%
% gradient descent
theta = theta - alpha / m * (X' *(X * theta - y)) ;
% ============================================================
% Save the cost J in every iteration
fprintf('%d times : %f, %f\n',iter, theta(1), theta(2));
J_history(iter) = computeCost(X, y, theta);
fprintf('%d times : %f\n',iter, J_history(iter));
end
end
Console output:
Theta found by gradient descent:
-3.630291
1.166362
Expected theta values (approx)
-3.6303
1.1664
For population = 35,000, we predict a profit of 4519.767868
For population = 70,000, we predict a profit of 45342.450129
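The note about simultaneous updates can be illustrated with a single gradient step on made-up data. In this sketch the names temp0, temp1 and theta_seq, as well as the data and the learning rate, are assumptions for illustration. It compares an update through temporaries, the vectorized update used in gradientDescent.m, and an incorrect in-place update that reuses the already changed theta(1).

```octave
% Hypothetical toy data and learning rate.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
m = length(y);
alpha = 0.1;
theta = [0; 0];

% Simultaneous update via temporaries: both gradients use the old theta.
err = X * theta - y;
temp0 = theta(1) - alpha / m * sum(err .* X(:, 1));
temp1 = theta(2) - alpha / m * sum(err .* X(:, 2));
theta_simultaneous = [temp0; temp1];

% Vectorized update: equivalent to the version with temporaries.
theta_vectorized = theta - alpha / m * (X' * (X * theta - y));

% Incorrect sequential update: theta(2) is computed from the new theta(1).
theta_seq = theta;
theta_seq(1) = theta_seq(1) - alpha / m * sum((X * theta_seq - y) .* X(:, 1));
theta_seq(2) = theta_seq(2) - alpha / m * sum((X * theta_seq - y) .* X(:, 2));

disp([theta_simultaneous, theta_vectorized, theta_seq]);
```

The first two columns should match, while the third differs in its second entry, which is why the vectorized form is a convenient way to avoid the mistake.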
Finally, the related figures are plotted:



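5. Multivariate linear regression: feature scaling
Each training example consists of the house size, the number of bedrooms, and the house price. Because the bedroom counts are far smaller than the house sizes, the first step is to scale the features.
The scaling used in ex1 works as follows: compute the mean and standard deviation of each feature, store them in mu and sigma, then subtract mu from each original feature and divide the result by sigma.
- Note: distinguish ./ from /. The first is element-wise division and the second is matrix division; the same distinction applies to .* and *.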
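The difference between the element-wise and matrix operators, and the scaling step itself, can be sketched on small made-up matrices. All values and variable names below are assumptions for illustration. The last part also relies on mean and std working column by column and on automatic broadcasting, which recent Octave versions support.

```octave
A = [2 4; 6 8];
B = [1 2; 3 4];

elementwise_div = A ./ B;   % divides entry by entry: [2 2; 2 2]
matrix_div      = A / B;    % solves X * B = A, i.e. A * inv(B): [2 0; 0 2]
elementwise_mul = A .* B;   % multiplies entry by entry: [2 8; 18 32]
matrix_mul      = A * B;    % ordinary matrix product: [14 20; 30 44]

% Feature scaling without explicit loops (hypothetical 3 x 2 feature matrix).
X = [1 10; 2 20; 3 30];
mu = mean(X);               % column means: [2 20]
sigma = std(X);             % column standard deviations: [1 10]
X_norm = (X - mu) ./ sigma; % each column now has mean 0 and std 1

disp(X_norm);
```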
Implementation of featureNormalize.m:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
% of the feature and subtract it from the dataset,
% storing the mean value in mu. Next, compute the
% standard deviation of each feature and divide
% each feature by it's standard deviation, storing
% the standard deviation in sigma.
%
% Note that X is a matrix where each column is a
% feature and each row is an example. You need
% to perform the normalization separately for
% each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
% get mean
for i = 1 : size(X, 2)
mu(i) = mean(X(:, i));
endfor
% get standard deviation
for i = 1 : size(X, 2)
sigma(i) = std(X(:, i));
endfor
X_norm -= mu;
X_norm ./= sigma;
% ============================================================
end
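size(X, 2) returns the number of features (columns) of X, mean computes the average, std computes the standard deviation, and X(:, i) selects all rows of the i-th feature (column).
Output: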


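6. Multivariate linear regression: cost function, gradient descent, and prediction
The cost function and gradient descent can reuse the univariate implementations, because both are written in vectorized form: gradientDescentMulti.m matches gradientDescent.m, and computeCostMulti.m matches computeCost.m.
In Part 2: Gradient Descent of ex1_multi.m, the piece we need to change is the prediction. Because X was feature-scaled before gradient descent ran, the input used for the prediction must be scaled in the same way.
Implementation of Part 2 of ex1_multi.m: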
% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
% Recall that the first column of X is all-ones. Thus, it does
% not need to be normalized.
price = 0; % You should change this
X1 = [1 (([1650, 3] - mu) ./ sigma)];
price = X1 * theta;
Output:

Console output:
Theta computed from gradient descent:
334302.063993
100087.116006
3673.548451
Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):
$289314.620338
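A minimal sketch of why the query point has to be scaled with the same mu and sigma as the training data. The four-house dataset below is made up, and to keep the sketch short theta is obtained from a closed-form least-squares fit rather than from gradient descent; both choices are assumptions for illustration, not what ex1_multi.m does.

```octave
% Hypothetical toy data: [size in sq-ft, bedrooms] and price.
X_raw = [1000 2; 1500 3; 2000 3; 2500 4];
y = [200; 290; 360; 450];
m = length(y);
x_new = [1650 3];

% Model trained on scaled features.
mu = mean(X_raw);
sigma = std(X_raw);
X_scaled = [ones(m, 1), (X_raw - mu) ./ sigma];
theta_scaled = pinv(X_scaled' * X_scaled) * X_scaled' * y;

% Correct: scale the query with the training mu and sigma.
price_scaled = [1, (x_new - mu) ./ sigma] * theta_scaled;

% Incorrect: feed the raw query into a model trained on scaled features.
price_wrong = [1, x_new] * theta_scaled;

% Reference: model trained and queried on unscaled features.
X_plain = [ones(m, 1), X_raw];
theta_plain = pinv(X_plain' * X_plain) * X_plain' * y;
price_plain = [1, x_new] * theta_plain;

fprintf('scaled: %f, unscaled model: %f, wrong: %f\n', ...
        price_scaled, price_plain, price_wrong);
```

The scaled and unscaled models should agree on the prediction, while the unscaled query fed into the scaled model is far off.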
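7. Multivariate linear regression: computing θ with the normal equation
Normal equation:
$$\theta = (X^T X)^{-1} X^T y$$
This approach needs no iteration; it is an alternative to gradient descent for minimizing the cost function.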
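One way to see where the closed form comes from, sketched under the assumption that X^T X is invertible: set the gradient of the matrix-form cost to zero and solve for θ.

```latex
\begin{aligned}
\nabla_\theta J(\theta) &= \frac{1}{m} X^T (X\theta - y) = 0 \\
X^T X \theta &= X^T y \\
\theta &= (X^T X)^{-1} X^T y
\end{aligned}
```

The gradient here is the same expression that gradient descent multiplies by α in each step. Using pinv instead of inv in code still returns a least-squares solution when X' * X is singular.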
Implementation of normalEqn.m:
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
% NORMALEQN(X,y) computes the closed-form solution to linear
% regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
% to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
theta = pinv(X' * X) * X' * y;
% -------------------------------------------------------------
% ============================================================
end
Note that X has not been feature-scaled here, so the prediction input needs no scaling either and can be used directly.
Implementation of Part 3 of ex1_multi.m:
% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
price = 0; % You should change this
price = [1 1650 3] * theta;
Console output:
Solving with normal equations...
Theta computed from the normal equations:
89597.909544
139.210674
-8738.019113
Predicted price of a 1650 sq-ft, 3 br house (using normal equations):
$293081.464335
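Notice that the prediction from the normal equation and the prediction from gradient descent are close but not identical. The gap depends on the learning rate α and the number of iterations used by gradient descent. Within a suitable range, a somewhat larger learning rate or more iterations drives the cost function lower, which brings the gradient-descent prediction closer to the normal-equation result.
This can be checked by trying different values of α and different iteration counts.
Example 1: α = 0.01, iterations = 600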
Theta computed from gradient descent:
339593.963965
106225.016166
-2253.622541
Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):
$293223.790542
Program paused. Press enter to continue.
Solving with normal equations...
Theta computed from the normal equations:
89597.909544
139.210674
-8738.019113
Predicted price of a 1650 sq-ft, 3 br house (using normal equations):
$293081.464335
Example 2: α = 0.03, iterations = 400
Theta computed from gradient descent:
340410.918973
110308.113371
-6326.538108
Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):
$293149.994329
Program paused. Press enter to continue.
Solving with normal equations...
Theta computed from the normal equations:
89597.909544
139.210674
-8738.019113
Predicted price of a 1650 sq-ft, 3 br house (using normal equations):
$293081.464335
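The same experiment can be sketched on a small made-up dataset, which makes it easy to see how the final cost of gradient descent approaches the normal-equation cost as α or the iteration count grows. The data, the configs table, and the variable names are assumptions for illustration; the numbers it prints are not comparable to the course dataset.

```octave
% Hypothetical toy data: [size in sq-ft, bedrooms] and price.
X_raw = [1000 2; 1500 3; 2000 3; 2500 4];
y = [200; 290; 360; 450];
m = length(y);

% Feature scaling, following the same idea as featureNormalize.m.
mu = mean(X_raw);
sigma = std(X_raw);
X = [ones(m, 1), (X_raw - mu) ./ sigma];

% Reference: normal equation on the scaled features and its cost.
theta_ne = pinv(X' * X) * X' * y;
J_ne = (X * theta_ne - y)' * (X * theta_ne - y) / (2 * m);

% Each row is one run: [alpha, num_iters].
configs = [0.01 400; 0.01 600; 0.03 400];
J_gd = zeros(size(configs, 1), 1);
for c = 1:size(configs, 1)
  alpha = configs(c, 1);
  num_iters = configs(c, 2);
  theta = zeros(3, 1);
  for iter = 1:num_iters
    theta = theta - alpha / m * (X' * (X * theta - y));
  end
  J_gd(c) = (X * theta - y)' * (X * theta - y) / (2 * m);
  fprintf('alpha = %.2f, iters = %d, cost = %f (normal equation: %f)\n', ...
          alpha, num_iters, J_gd(c), J_ne);
end
```

On this toy data, more iterations at the same α, or a larger α at the same iteration count, should both end with a lower cost, and no run should go below the normal-equation cost.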