References

<PYTHON_MACHINE_LEARNING> chapter3
A Tour of Machine Learning
Classifiers Using Scikit-learn

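Introduction

In this chapter we look at several commonly used machine learning algorithms, examine the strengths and weaknesses of these supervised classification algorithms, and build them with Python's scikit-learn library.
When facing a concrete problem, we cannot guarantee that any one class of algorithm is always the best; no single algorithm fits every situation perfectly. The result depends on the features of the samples, on the dataset, and on whether the classes are linearly separable. In general, we follow roughly these five steps: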

1 Selection of features.
2 Choosing a performance metric.
3 Choosing a classifier and optimization algorithm.
4 Evaluating the performance of the model.
5 Tuning the algorithm.

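Before we start: some new functions

numpy.unique() returns the sorted unique values of its input. It accepts any array-like object, so a one-dimensional list works just as well as an array.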

import numpy as np
A = [1, 2, 2, 3, 4, 3]
a = np.unique(A)
print(a)            # prints [1 2 3 4]
a, b, c = np.unique(A, return_index=True, return_inverse=True)
print(a, b, c)      # prints [1 2 3 4] [0 1 3 4] [0 1 1 2 3 2]
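To make the two extra return values less abstract, here is a minimal sketch using the same list. The variable names (values, first_idx, inverse, counts, rebuilt) are my own, and return_counts is an additional option of np.unique that the example above does not use.

```python
import numpy as np

A = [1, 2, 2, 3, 4, 3]  # a plain list is fine: np.unique accepts any array-like

values, first_idx, inverse, counts = np.unique(
    A, return_index=True, return_inverse=True, return_counts=True
)

# first_idx holds the position of the first occurrence of each unique value
print(np.asarray(A)[first_idx])

# inverse maps every original element to its position in values,
# so indexing values with it rebuilds the original sequence
rebuilt = values[inverse]
print(rebuilt.tolist() == A)

# counts says how often each unique value occurs
print(dict(zip(values.tolist(), counts.tolist())))
```

The inverse array is the useful one for machine learning work: it turns arbitrary class labels into integer codes 0, 1, 2, and so on.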

sklearn.model_selection.train_test_split randomly splits the data into a training set and a test set

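# sklearn.cross_validation was removed in newer scikit-learn releases; use sklearn.model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
'''
General form:
train_test_split is a function commonly used in cross-validation. It randomly picks
train data and test data from the samples in a given proportion:
X_train, X_test, y_train, y_test =
train_test_split(train_data, train_target, test_size=0.4, random_state=0)
Parameters:
train_data: the sample features to be split
train_target: the sample labels to be split
test_size: the fraction of samples placed in the test set; if it is an integer, it is the absolute number of test samples
random_state: the seed of the random number generator.
Random seed: in effect an ID for a particular sequence of random numbers. When an experiment has to be repeated, it guarantees that the same random numbers are produced.
If you pass the same integer every time (0, 1, or any other value) and keep the other arguments unchanged, you get the same split every time. Only when it is left out (None) does the split differ from run to run.
How the random numbers depend on the seed comes down to two rules:
different seeds produce different random numbers; the same seed produces the same random numbers, even across different instances.
'''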
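The behaviour of random_state and test_size can be checked quickly. This is a minimal sketch on made-up toy data (the arrays X and y below are hypothetical and are not the iris data used later).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same seed, same arguments: the two splits are identical
split_a = train_test_split(X, y, test_size=0.3, random_state=0)
split_b = train_test_split(X, y, test_size=0.3, random_state=0)
same = all(np.array_equal(p, q) for p, q in zip(split_a, split_b))
print(same)

# A float test_size is a fraction of the samples ...
X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)

# ... while an integer test_size is an absolute number of test samples
_, X_test_int, _, _ = train_test_split(X, y, test_size=4, random_state=0)
print(X_test_int.shape)
```

Note that random_state=0 is reproducible just like any other integer; only omitting the argument gives a new split on every run.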

StandardScaler standardizes the features

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)  # estimate the mean and standard deviation of each feature
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)  # the test set is scaled with the training-set statistics
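What fit and transform do can be reproduced by hand. The sketch below uses small made-up arrays (hypothetical values, not iris) and the fitted attributes mean_ and scale_ of StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two features on very different scales
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_test = np.array([[2.0, 30.0]])

sc = StandardScaler().fit(X_train)

# transform() subtracts the training mean and divides by the training std
manual_train = (X_train - sc.mean_) / sc.scale_
manual_test = (X_test - sc.mean_) / sc.scale_

X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

print(np.allclose(manual_train, X_train_std))
print(np.allclose(manual_test, X_test_std))

# After scaling, each training feature has mean 0 and standard deviation 1
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

Reusing the training statistics for the test set keeps both sets on the same scale, which is why fit is called on X_train only.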

The Perceptron class handles multi-class classification with the One-vs.-Rest method

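from sklearn.linear_model import Perceptron
# older scikit-learn releases called this parameter n_iter; newer ones use max_iter
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)
# max_iter: number of epochs; eta0: learning rate (needs experimenting);
# random_state: seed for shuffling the dataset at the start of each epoch
ppn.fit(X_train_std, y_train)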
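One way to see One-vs.-Rest at work is to look at the shape of the fitted weights. The sketch below is an assumption-laden illustration: it fits on the full, unscaled iris petal features for brevity rather than on the standardized training split.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]   # petal length and petal width
y = iris.target            # three classes: 0, 1, 2

ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)
ppn.fit(X, y)

# One weight vector and one bias per class: three binary "class vs. rest" models
print(ppn.classes_)
print(ppn.coef_.shape)
print(ppn.intercept_.shape)

# predict() picks the class whose binary model gives the highest score
scores = ppn.decision_function(X[:5])
pred = ppn.classes_[scores.argmax(axis=1)]
print(pred, ppn.predict(X[:5]))
```

With two features and three classes, coef_ has one row per class, which is exactly the One-vs.-Rest layout.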

The predict method of the Perceptron class makes the predictions

y_pred = ppn.predict(X_test_std)
num = 0
for i in range(len(y_pred)):
    if y_pred[i] != y_test[i]:
        num += 1
print('Misclassified samples: %d' % num)
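The counting loop can also be written as a single NumPy comparison. The sketch below uses made-up label arrays (hypothetical values) so that it runs on its own, and shows how the error count relates to accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions
y_test = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

# Element-wise comparison gives a boolean array; summing it counts the mismatches
num = int((y_test != y_pred).sum())
print('Misclassified samples: %d' % num)

# Accuracy is the share of samples that were classified correctly
acc = accuracy_score(y_test, y_pred)
print('Accuracy: %.2f' % acc)
```

Accuracy and the misclassification count carry the same information: accuracy equals 1 minus the error count divided by the number of samples.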

Reproducing the perceptron model with scikit-learn's built-in iris dataset

Loading the dataset

from sklearn import datasets
iris = datasets.load_iris()
x = iris.data[:,[2,3]]  # take the columns at index 2 and 3 (petal length, petal width) of every row
y = iris.target  # the corresponding class labels

With the functions just introduced, we can reproduce the perceptron from chapter 2

#!/usr/bin/python
# -*- coding: utf-8 -*-
__author__ = 'Administrator'
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from PDC import plot_decision_regions
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
iris = datasets.load_iris()
x = iris.data[:,[2,3]]
y = iris.target
X_train,X_test,y_train,y_test = train_test_split(
    x , y, test_size=0.3, random_state = 0
)
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
ppn = Perceptron(max_iter=80,eta0=0.01,random_state=0)  # n_iter in older scikit-learn releases
ppn.fit(X_train_std,y_train)
y_pred = ppn.predict(X_test_std)
num = 0
for i in range(len(y_pred)):
    if y_pred[i] != y_test[i]:
        num += 1
print('Misclassified samples: %d' % num)
print('Accuracy: %.2f' % accuracy_score(y_test,y_pred))
X_combined_std = np.vstack((X_train_std,X_test_std))
y_combined = np.hstack((y_train,y_test))
plot_decision_regions(X=X_combined_std,y=y_combined,
                      classifier=ppn,
                      test_idx=range(105,150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.savefig('Iris.png')
plt.show()

For the visualization we reuse the plot_decision_regions function from the PDC.py file written earlier.
Here we need to add one feature to the function so that it can highlight the test-set samples.

#!/usr/bin/python
# -*- coding: utf-8 -*-
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
def plot_decision_regions(X, y, classifier,test_idx = None, resolution=0.02):
    #setup marker generator and colormap
    markers = ('s','x','o','^','v')
    colors = ('red','blue','lightgreen','gray','cyan')
    cmap = ListedColormap(colors[: len(np.unique(y))])
    # plot the decision surface
    x1_min, x1_max = X[:,0].min() -1, X[:,0].max()+1
    x2_min, x2_max = X[:,1].min() -1, X[:,1].max()+1
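    # X[:,k]: the part before the comma is the row range (all rows), the part after selects column k
    xx1, xx2 = np.meshgrid(np.arange(x1_min,x1_max,resolution),
                           np.arange(x2_min,x2_max,resolution))
    # arange(start,end,step) returns a 1-D array
    # meshgrid(x,y) returns two coordinate matrices of shape (len(y), len(x)):
    # every row of xx1 is a copy of x, every column of xx2 is a copy of y
    # xx1.ravel() lines up all rows in a single 1-D array; for example,
    # a (305, 235) grid becomes a 1-D array of length 71675. The same goes for xx2
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    # np.array([xx1.ravel(), xx2.ravel()]) has shape (2, n_points);
    # .T turns it into (n_points, 2), one grid point per row
    # reshape Z back to the shape of the grid, xx1.shape
    Z = Z.reshape(xx1.shape)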

    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)

    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples
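    print(np.unique(y))
    # idx is the position of each class label in np.unique(y), cl is the label itself
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y==cl, 0], y=X[y==cl, 1],
                    alpha=0.8, color=cmap(idx),
                    marker = markers[idx],label = cl)
    # highlight test samples
    # the newly added part
    if test_idx is not None:
        X_test, y_test = X[test_idx,:],y[test_idx]
        # facecolors='none' draws hollow circles; c='' is rejected by newer matplotlib releases
        plt.scatter(X_test[:,0],X_test[:,1],facecolors='none',edgecolors='black',
                    alpha=1.0, linewidths=1,marker='o',
                    s=55, label='test set')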
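The meshgrid, ravel and reshape steps inside plot_decision_regions are easier to follow on a tiny grid. This is a minimal sketch with a 2 x 3 grid; the sum over the coordinates is only a stand-in for classifier.predict.

```python
import numpy as np

# A tiny grid: x values 0, 1, 2 and y values 0, 1
xx1, xx2 = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
print(xx1)        # every row repeats the x values
print(xx2)        # every column repeats the y values
print(xx1.shape)  # (number of y values, number of x values)

# Flatten both grids and stack them: one (x, y) grid point per row
grid = np.array([xx1.ravel(), xx2.ravel()]).T
print(grid.shape)

# Stand-in for classifier.predict(grid): one value per grid point
Z = grid.sum(axis=1)

# Reshape the flat predictions back to the grid shape for contourf
Z = Z.reshape(xx1.shape)
print(Z)
```

The real function does the same thing, only with a much finer grid and the classifier's predictions in place of the sum.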

Here is the result:

Iris.png

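Conclusion

As discussed in chapter 2, the perceptron does not converge on data that is not perfectly linearly separable. No matter how much we increase the number of iterations, some misclassifications remain.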
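A rough way to check this claim is to train with more and more epochs and count the remaining training errors. The sketch below makes several assumptions of its own: it fits on the full iris petal features instead of the training split, sets tol=None so that every run uses exactly max_iter epochs, and the epoch counts 10, 100 and 1000 are arbitrary.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data[:, [2, 3]])  # standardized petal features
y = iris.target

errors = {}
for n_epochs in (10, 100, 1000):
    # tol=None disables early stopping, so training runs for exactly n_epochs
    ppn = Perceptron(max_iter=n_epochs, tol=None, eta0=0.01, random_state=0)
    ppn.fit(X, y)
    # count the training samples that are still misclassified
    errors[n_epochs] = int((ppn.predict(X) != y).sum())

print(errors)
```

In these two features, some versicolor and virginica samples have identical petal measurements, so no classifier can label all 150 samples correctly, whatever the number of epochs.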
