Support Vector Machines
In this part of the exercise, we apply support vector machines (SVMs) to 2D example datasets. Experimenting on these datasets gives an initial intuition for how SVMs work and how to use an SVM with a Gaussian kernel.
Task 1: Example Dataset 1
This task asks us to vary the parameter C and observe how the SVM's decision boundary on the dataset changes. The relevant code is already provided in ex6.m:
%% =============== Part 1: Loading and Visualizing Data ================
% We start the exercise by first loading and visualizing the dataset.
% The following code will load the dataset into your environment and plot
% the data.
%
fprintf('Loading and Visualizing Data ...\n')
% Load from ex6data1:
% You will have X, y in your environment
load('ex6data1.mat');
% Plot training data
plotData(X, y);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ==================== Part 2: Training Linear SVM ====================
% The following code will train a linear SVM on the dataset and plot the
% decision boundary learned.
%
% Load from ex6data1:
% You will have X, y in your environment
load('ex6data1.mat');
fprintf('\nTraining Linear SVM ...\n')
% You should try to change the C value below and see how the decision
% boundary varies (e.g., try C = 1000)
C = 1;
model = svmTrain(X, y, C, @linearKernel, 1e-3, 20);
visualizeBoundaryLinear(X, y, model);
fprintf('Program paused. Press enter to continue.\n');
pause;
With C = 1, the result is:

With C = 100, the result is:

Task 2: SVM with a Gaussian Kernel
In this part we use an SVM with a Gaussian kernel to learn a non-linear decision boundary for the dataset.
- First, we use the Gaussian kernel to construct new features f_i, i = 1, 2, 3, ... To do this, we complete the Gaussian kernel implementation in gaussianKernel.m.

Reference code:
sim = exp(-(x1 - x2)' * (x1 - x2) / (2 * sigma * sigma));
Running the following part of ex6.m yields the value 0.324652:
%% =============== Part 3: Implementing Gaussian Kernel ===============
% You will now implement the Gaussian kernel to use
% with the SVM. You should complete the code in gaussianKernel.m
%
fprintf('\nEvaluating the Gaussian Kernel ...\n')
x1 = [1 2 1]; x2 = [0 4 -1]; sigma = 2;
sim = gaussianKernel(x1, x2, sigma);
fprintf(['Gaussian Kernel between x1 = [1; 2; 1], x2 = [0; 4; -1], sigma = %f :' ...
'\n\t%f\n(for sigma = 2, this value should be about 0.324652)\n'], sigma, sim);
fprintf('Program paused. Press enter to continue.\n');
pause;
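As an independent cross-check of the kernel formula, here is an illustrative Python/NumPy sketch (not part of the exercise files) that reproduces the sanity-check value above:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma):
    """RBF similarity exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-diff.dot(diff) / (2.0 * sigma ** 2)))

# The exercise's sanity check: for sigma = 2 this should be about 0.324652
sim = gaussian_kernel([1, 2, 1], [0, 4, -1], sigma=2)
```

Identical points give a similarity of exactly 1, and the similarity decays toward 0 as the points move apart, with sigma controlling how quickly.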
- Next we load a new dataset, Example Dataset 2, and use the Gaussian-kernel SVM to learn a non-linear decision boundary for it. The relevant code is already provided in ex6.m:
%% =============== Part 4: Visualizing Dataset 2 ================
% The following code will load the next dataset into your environment and
% plot the data.
%
fprintf('Loading and Visualizing Data ...\n')
% Load from ex6data2:
% You will have X, y in your environment
load('ex6data2.mat');
% Plot training data
plotData(X, y);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ========== Part 5: Training SVM with RBF Kernel (Dataset 2) ==========
% After you have implemented the kernel, we can now use it to train the
% SVM classifier.
%
fprintf('\nTraining SVM with RBF Kernel (this may take 1 to 2 minutes) ...\n');
% Load from ex6data2:
% You will have X, y in your environment
load('ex6data2.mat');
% SVM Parameters
C = 1; sigma = 0.1;
% We set the tolerance and max_passes lower here so that the code will run
% faster. However, in practice, you will want to run the training to
% convergence.
model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
visualizeBoundary(X, y, model);
fprintf('Program paused. Press enter to continue.\n');
pause;
The result is:

- To get further practice with the Gaussian-kernel SVM, we load another dataset, Example Dataset 3. The file ex6data3.mat splits it into two parts: a training set and a cross-validation set. We need to complete dataset3Params.m, i.e. search over C ∈ {0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30} and σ ∈ {0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30} for the combination with the lowest cross-validation error, so that the dataset is classified correctly.
Reference code for dataset3Params.m:
C_temp = [0.01; 0.03; 0.1; 0.3; 1; 3; 10; 30];
sigma_temp = [0.01; 0.03; 0.1; 0.3; 1; 3; 10; 30];
error_val = zeros(length(C_temp), length(sigma_temp));
for i = 1 : length(C_temp)
    for j = 1 : length(sigma_temp)
        model = svmTrain(X, y, C_temp(i), @(x1, x2) gaussianKernel(x1, x2, sigma_temp(j)));
        predictions = svmPredict(model, Xval);
        error_val(i, j) = mean(double(predictions ~= yval));
    end
end
[I, J] = find(error_val == min(error_val(:)), 1); % first cell attaining the minimum error
C = C_temp(I)          % 1
sigma = sigma_temp(J)  % 0.100
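The selection step at the end is just an argmin over the error grid. A Python sketch of that logic, using a hypothetical precomputed error_val matrix in place of the SVM training loop:

```python
import numpy as np

C_grid = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
sigma_grid = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]

def best_params(error_val, C_grid, sigma_grid):
    """Pick the (C, sigma) pair minimizing validation error.

    error_val[i][j] is the cross-validation error for C_grid[i], sigma_grid[j].
    """
    E = np.asarray(error_val, dtype=float)
    i, j = np.unravel_index(np.argmin(E), E.shape)  # row/col of the smallest entry
    return C_grid[i], sigma_grid[j]
```

Taking the first index returned by argmin mirrors the `find(..., 1)` call above: with ties, one deterministic winner is chosen.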
The corresponding code in ex6.m:
%% =============== Part 6: Visualizing Dataset 3 ================
% The following code will load the next dataset into your environment and
% plot the data.
%
fprintf('Loading and Visualizing Data ...\n')
% Load from ex6data3:
% You will have X, y in your environment
load('ex6data3.mat');
% Plot training data
plotData(X, y);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ========== Part 7: Training SVM with RBF Kernel (Dataset 3) ==========
% This is a different dataset that you can use to experiment with. Try
% different values of C and sigma here.
%
% Load from ex6data3:
% You will have X, y in your environment
load('ex6data3.mat');
% Try different SVM Parameters here
[C, sigma] = dataset3Params(X, y, Xval, yval);
% Train the SVM
model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
visualizeBoundary(X, y, model);
fprintf('Program paused. Press enter to continue.\n');
pause;
The result is:

Spam Classifier
In this exercise we build a spam filter using a support vector machine.
Task 1: Email Preprocessing
First, we load a sample email; its contents are:

Every email contains elements such as URLs and email addresses, but their specific values differ from message to message. To make classification more effective, we normalize these elements, for example replacing every URL with the token "httpaddr". This functionality is already provided in processEmail.m.
Running this part of ex6_spam.m produces:
==== Processed Email ====
anyon know how much it cost to host a web portal well it depend on how mani
visitor you re expect thi can be anywher from less than number buck a month
to a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb
if your run someth big to unsubscrib yourself from thi mail list send an
email to emailaddr
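The normalization rules can be sketched with a few regular-expression substitutions. The Python below is an illustrative approximation of what processEmail.m does, not a translation of it; the real file also strips HTML, tokenizes, and stems words:

```python
import re

def normalize_email(text):
    """Replace variable elements (URLs, addresses, numbers, dollar signs)
    with fixed tokens, in the spirit of processEmail.m."""
    text = text.lower()
    text = re.sub(r'(http|https)://\S*', 'httpaddr', text)  # URLs
    text = re.sub(r'\S+@\S+', 'emailaddr', text)            # email addresses
    text = re.sub(r'[$]+', 'dollar', text)                  # dollar signs
    text = re.sub(r'\d+', 'number', text)                   # numbers
    return text
```

After this step, two emails that mention different URLs or prices map to the same tokens, which is exactly what lets the classifier generalize across them.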
Task 2: Vocabulary List
In this part we map the words in the sample email onto a given vocabulary list. We complete processEmail.m to build this mapping: if a word in the sample email appears in the vocabulary, its index is appended to the word_indices variable; otherwise the word is skipped and we move on to the next one.
Reference code for processEmail.m:
for i = 1 : length(vocabList)
    if (strcmp(vocabList{i}, str))
        word_indices = [word_indices; i];
        break;  % vocabulary entries are unique, so we can stop at the first match
    end
end
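With a hash map available, the linear scan over the vocabulary collapses to a dictionary lookup. An illustrative Python equivalent (the function name and the 1-based indices are choices made here to mirror the Octave code):

```python
def map_to_indices(words, vocab_list):
    """Map each word to its 1-based vocabulary index, skipping words
    not in the vocabulary (same behavior as the Octave loop above)."""
    index_of = {w: i for i, w in enumerate(vocab_list, start=1)}
    return [index_of[w] for w in words if w in index_of]
```

The dictionary makes each lookup O(1) instead of O(|vocabulary|), which matters once the vocabulary has thousands of entries.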
Task 3: Extracting Features from the Email
In this part we convert the data in word_indices into a binary feature vector x ∈ R^n, where x_i = 1 if the i-th vocabulary word appears in the sample email and x_i = 0 otherwise.
Reference code for emailFeatures.m:
x(word_indices) = 1;
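The same one-hot construction, sketched in Python. The default n = 1899 is the vocabulary size used in this exercise; treat it as an assumption if your vocabulary differs:

```python
import numpy as np

def email_features(word_indices, n=1899):
    """Binary feature vector: x[i-1] = 1 iff vocabulary word i occurs.

    word_indices uses 1-based indices, as produced by processEmail.m.
    """
    x = np.zeros(n)
    idx = np.asarray(word_indices, dtype=int) - 1  # 1-based -> 0-based
    x[idx] = 1.0
    return x
```

Note that repeated occurrences of a word still yield a single 1; the vector records presence, not counts.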
Task 4: Training the SVM
In this part we train a linear SVM on the training set and then evaluate it on a held-out test set. The code is:
%% =========== Part 3: Train Linear SVM for Spam Classification ========
% In this section, you will train a linear classifier to determine if an
% email is Spam or Not-Spam.
% Load the Spam Email dataset
% You will have X, y in your environment
load('spamTrain.mat');
fprintf('\nTraining Linear SVM (Spam Classification)\n')
fprintf('(this may take 1 to 2 minutes) ...\n')
C = 0.1;
model = svmTrain(X, y, C, @linearKernel);
p = svmPredict(model, X);
fprintf('Training Accuracy: %f\n', mean(double(p == y)) * 100);
%% =================== Part 4: Test Spam Classification ================
% After training the classifier, we can evaluate it on a test set. We have
% included a test set in spamTest.mat
% Load the test dataset
% You will have Xtest, ytest in your environment
load('spamTest.mat');
fprintf('\nEvaluating the trained Linear SVM on a test set ...\n')
p = svmPredict(model, Xtest);
fprintf('Test Accuracy: %f\n', mean(double(p == ytest)) * 100);
pause;
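The accuracy expression mean(double(p == y)) * 100 translates directly to other languages; a small illustrative Python version:

```python
import numpy as np

def accuracy(pred, y):
    """Percentage of predictions matching labels,
    equivalent to mean(double(p == y)) * 100 in Octave."""
    pred = np.asarray(pred)
    y = np.asarray(y)
    return float(np.mean(pred == y) * 100.0)
```

Comparing training accuracy against test accuracy, as the script above does, is the quickest check for overfitting: a large gap between the two suggests C should be reduced.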
Task 5: Predicting Spam
In this part we use the trained SVM to classify new emails. The code is:
%% =================== Part 6: Try Your Own Emails =====================
% Now that you've trained the spam classifier, you can use it on your own
% emails! In the starter code, we have included spamSample1.txt,
% spamSample2.txt, emailSample1.txt and emailSample2.txt as examples.
% The following code reads in one of these emails and then uses your
% learned SVM classifier to determine whether the email is Spam or
% Not Spam
% Set the file to be read in (change this to spamSample2.txt,
% emailSample1.txt or emailSample2.txt to see different predictions on
% different emails types). Try your own emails as well!
filename = 'spamSample1.txt';
% Read and predict
file_contents = readFile(filename);
word_indices = processEmail(file_contents);
x = emailFeatures(word_indices);
p = svmPredict(model, x);
fprintf('\nProcessed %s\n\nSpam Classification: %d\n', filename, p);
fprintf('(1 indicates spam, 0 indicates not spam)\n\n');