Introduction

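scikit-learn provides several naive Bayes classifiers; this article covers three of them: Bernoulli naive Bayes (BernoulliNB), Gaussian naive Bayes (GaussianNB), and multinomial naive Bayes (MultinomialNB). They are introduced one by one below.
For the theory behind naive Bayes, see my other article, <Machine Learning: Naive Bayes>.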
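Because all three classes share scikit-learn's standard estimator interface (fit, predict, score), they can be swapped for one another in the same code. The sketch below reuses the make_blobs setup from the sections that follow and fits each variant on the same split. Putting MinMaxScaler in front of MultinomialNB with make_pipeline is my own arrangement, assumed here because MultinomialNB rejects negative inputs; it is not part of the original walkthrough.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Same synthetic data as in the sections below: 500 samples, 5 classes
X, y = make_blobs(n_samples=500, centers=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

models = {
    "BernoulliNB": BernoulliNB(),
    "GaussianNB": GaussianNB(),
    # MultinomialNB needs non-negative input, so scale to [0, 1] first
    "MultinomialNB": make_pipeline(MinMaxScaler(), MultinomialNB()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # identical API for every variant
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: test accuracy {scores[name]:.2f}")
```

The loop only changes the estimator object; data loading, splitting and scoring stay identical, which is what makes side-by-side comparisons like the ones in this article cheap to run.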

1. Bernoulli Naive Bayes

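Bernoulli naive Bayes is suited to datasets whose features follow a Bernoulli distribution, also known as the "0-1" distribution (a binomial distribution with a single trial). If every feature in the dataset takes only the two values 0 and 1, BernoulliNB performs well; on more complex data its results are only mediocre. Below, make_blobs generates a dataset with 500 samples in 5 classes, and a plot shows how BernoulliNB works.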
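To make "features with only the values 0 and 1" concrete: for such data BernoulliNB estimates, for each class c and feature j, the smoothed probability P(x_j = 1 | y = c) = (N_cj + alpha) / (N_c + 2 * alpha), where N_cj counts class-c samples with x_j = 1, N_c counts class-c samples, and alpha is the smoothing parameter (1.0 by default). The tiny dataset below is made up purely for illustration; the sketch compares this hand computation with the fitted feature_log_prob_ attribute.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary data: 6 samples, 3 features, 2 classes
X = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
])
y = np.array([0, 0, 0, 1, 1, 1])

clf = BernoulliNB(alpha=1.0)
clf.fit(X, y)

# Hand-computed P(x_j = 1 | y = c) with Laplace smoothing
alpha = 1.0
manual = np.array([
    (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
    for c in (0, 1)
])

print(np.round(manual, 3))
print(np.allclose(manual, np.exp(clf.feature_log_prob_)))  # → True
print(clf.predict([[1, 0, 1], [0, 1, 0]]))                 # → [0 1]
```

Note that a feature value of 0 also carries evidence: BernoulliNB multiplies in 1 - P(x_j = 1 | y = c) for absent features, which is what separates it from the count-based multinomial model.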

import numpy as np
from sklearn.datasets import make_blobs  # dataset generator
from sklearn.model_selection import train_test_split  # train/test split utility
from sklearn.naive_bayes import BernoulliNB  # Bernoulli naive Bayes
import matplotlib.pyplot as plt  # plotting

# Generate a dataset with 500 samples in 5 classes
X, y = make_blobs(n_samples=500, centers=5, random_state=8)

# Split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# Fit a Bernoulli naive Bayes model
nb = BernoulliNB()
nb.fit(X_train, y_train)

print('Code output:')
print('==========================')
# score() returns the mean accuracy on the data passed in (here: the test set)
print("Test set score: {:.2f}".format(nb.score(X_test, y_test)))

# Plot range for both axes: the data range plus a 0.5 margin
x_min, x_max = X[:,0].min()-0.5, X[:,0].max()+0.5
y_min, y_max = X[:,1].min()-0.5, X[:,1].max()+0.5

# Color the background by the class predicted at each grid point
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02),
                     np.arange(y_min, y_max, .02))
z = nb.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.pcolormesh(xx, yy, z, cmap=plt.cm.Pastel1)

# Plot the training set (circles) and the test set (stars)
plt.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap=plt.cm.cool, edgecolors='k')
plt.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap=plt.cm.cool, edgecolors='k', marker='*')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

plt.title('Classifier: BernoulliNB')
plt.show()

The output is as follows:

Code output:
==========================
Test set score: 0.54

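As the figure shows, the BernoulliNB model is very simple: it draws two straight lines, one at 0 on the horizontal axis and one at 0 on the vertical axis, and classifies the data according to which of the four resulting quadrants a point falls in. This is because the code uses BernoulliNB's default parameter binarize=0.0, which turns every feature value into 1 if it is greater than 0 and into 0 otherwise. With two features, only four distinct binary inputs remain, so the model can predict at most four of the five classes, which helps explain the low score of 0.54.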

[Figure 5.1: BernoulliNB decision regions (5.1BernoulliNB.png)]
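The quadrant picture follows directly from binarize=0.0. The check below is my own sketch reusing the make_blobs setup above: it thresholds the data by hand with a hypothetical helper, to_quadrant_bits, fits a second model with binarize=None (meaning the input is taken as already binary), and confirms that both models make identical predictions. It then asks the model for its prediction on each of the four possible binary inputs.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = make_blobs(n_samples=500, centers=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# Default model: thresholds internally at binarize=0.0
nb_default = BernoulliNB().fit(X_train, y_train)

# Hypothetical helper doing the same thresholding by hand: 1 where value > 0
def to_quadrant_bits(A):
    return (A > 0).astype(int)

# binarize=None: input is assumed to be binary already
nb_manual = BernoulliNB(binarize=None).fit(to_quadrant_bits(X_train), y_train)

same = np.array_equal(nb_default.predict(X_test),
                      nb_manual.predict(to_quadrant_bits(X_test)))
print("Identical predictions:", same)

# Two binary features give only 4 possible inputs (the 4 quadrants),
# so at most 4 of the 5 classes can ever be predicted.
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
reachable = np.unique(nb_manual.predict(corners))
print("Classes the model can predict:", reachable)
```

Raising or lowering binarize only moves the two dividing lines; as long as the features are continuous coordinates like these, a better fit needs a model that uses the actual values, such as GaussianNB.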

2. Gaussian Naive Bayes

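Gaussian naive Bayes is suited to datasets whose features follow a Gaussian (normal) distribution; more precisely, it assumes that within each class every feature is normally distributed.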
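Concretely, GaussianNB estimates a mean and a variance for every (class, feature) pair and scores a point by the log class prior plus the sum of the per-feature Gaussian log-densities. Below is a from-scratch NumPy sketch of that rule, compared against scikit-learn's predictions on the same make_blobs data. One detail mirrors scikit-learn's documented behavior: var_smoothing (default 1e-9) adds var_smoothing times the largest feature variance to every variance for numerical stability. The helper names fit_gaussian_nb and predict_gaussian_nb are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=500, centers=5, random_state=8)

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: per-class means, variances and log priors."""
    classes = np.unique(y)
    eps = var_smoothing * X.var(axis=0).max()  # mirrors sklearn's var_smoothing
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, variances, log_priors

def predict_gaussian_nb(model, X):
    """Hypothetical helper: argmax of log prior + summed Gaussian log-density."""
    classes, means, variances, log_priors = model
    # Shapes: X[:, None, :] is (n, 1, d), means[None] is (1, k, d) -> (n, k, d)
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :]
        + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    ).sum(axis=2)                               # sum over features -> (n, k)
    return classes[np.argmax(log_lik + log_priors, axis=1)]

model = fit_gaussian_nb(X, y)
manual_pred = predict_gaussian_nb(model, X)
sk_pred = GaussianNB().fit(X, y).predict(X)
print("Agreement with GaussianNB:", np.mean(manual_pred == sk_pred))
```

Summing log-densities instead of multiplying densities is the usual way to avoid numerical underflow when there are many features.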

import numpy as np
from sklearn.datasets import make_blobs  # dataset generator
from sklearn.model_selection import train_test_split  # train/test split utility
from sklearn.naive_bayes import GaussianNB  # Gaussian naive Bayes
import matplotlib.pyplot as plt  # plotting

# Generate a dataset with 500 samples in 5 classes
X, y = make_blobs(n_samples=500, centers=5, random_state=8)

# Split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# Fit a Gaussian naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

print('Code output:')
print('==========================')
# score() returns the mean accuracy on the data passed in (here: the test set)
print("Test set score: {:.2f}".format(gnb.score(X_test, y_test)))

# Plot range for both axes: the data range plus a 0.5 margin
x_min, x_max = X[:,0].min()-0.5, X[:,0].max()+0.5
y_min, y_max = X[:,1].min()-0.5, X[:,1].max()+0.5

# Color the background by the class predicted at each grid point
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02),
                     np.arange(y_min, y_max, .02))
z = gnb.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.pcolormesh(xx, yy, z, cmap=plt.cm.Pastel1)

# Plot the training set (circles) and the test set (stars)
plt.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap=plt.cm.cool, edgecolors='k')
plt.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap=plt.cm.cool, edgecolors='k', marker='*')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

plt.title('Classifier: GaussianNB')
plt.show()

The output is as follows:

Code output:
==========================
Test set score: 0.97

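Compared with BernoulliNB, GaussianNB scores much higher and its decision boundaries are far more complex, which indicates that the features of this synthetic dataset are approximately normally distributed. That is expected here: make_blobs generates each cluster from an isotropic Gaussian distribution.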

[Figure 5.2: GaussianNB decision regions (5.2GaussianNB.png)]
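One way to sanity-check the "roughly normal features" explanation is to look at how the data were generated: make_blobs draws each cluster from an isotropic Gaussian whose standard deviation is cluster_std (1.0 by default). The sketch below computes the per-class, per-feature standard deviations of the generated data, which should all sit near 1.0. This is a quick check of my own, not a formal normality test.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=500, centers=5, random_state=8)

# Per-class standard deviation of each feature (rows: classes, cols: features);
# make_blobs samples every cluster with cluster_std=1.0 by default
stds = np.array([X[y == c].std(axis=0) for c in np.unique(y)])
print(np.round(stds, 2))
```

Each class holds 100 samples here, so the estimated standard deviations should deviate from 1.0 only by sampling noise of a few hundredths.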

3. Multinomial Naive Bayes

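Multinomial naive Bayes is suited to data that follow a multinomial distribution. For example, tossing a coin follows a binomial distribution, whereas rolling a die is multinomial: each roll lands on one of the faces 1 to 6, and the counts of how often each face comes up over n rolls follow a multinomial distribution.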
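The dice example can be simulated with NumPy: rolling a fair die n times and counting how often each face appears is one draw from a multinomial distribution with n trials and six equal probabilities. MultinomialNB models feature vectors of this kind (non-negative counts), with one probability vector per class. The seed and n = 60 below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
n_rolls = 60
face_probs = np.full(6, 1 / 6)   # fair die: faces 1..6 equally likely

# Counts of faces 1..6 over 60 rolls: a single multinomial sample
counts = rng.multinomial(n_rolls, face_probs)
print(dict(zip(range(1, 7), counts)))

# The same kind of vector built from individual rolls
rolls = rng.integers(1, 7, size=n_rolls)          # values 1..6
counts_from_rolls = np.bincount(rolls, minlength=7)[1:]
print(counts_from_rolls, counts_from_rolls.sum())
```

Both vectors always sum to the number of rolls; only how the total is split across the six faces varies.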
Note: MultinomialNB (not GaussianNB) requires the input X to be non-negative, so the data must be preprocessed first.
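The non-negativity requirement is enforced when the model is fitted: passing the raw make_blobs features, which include negative values, makes MultinomialNB.fit raise a ValueError. A short check, reusing the same data (the exact error message may differ between scikit-learn versions):

```python
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X, y = make_blobs(n_samples=500, centers=5, random_state=8)
print("Contains negative values:", (X < 0).any())

try:
    MultinomialNB().fit(X, y)
    raised = False
except ValueError as err:   # rejected because of the negative entries
    raised = True
    print("ValueError:", err)

# After MinMaxScaler the training features lie in [0, 1] and fitting succeeds
X_scaled = MinMaxScaler().fit_transform(X)
MultinomialNB().fit(X_scaled, y)
print("Minimum after scaling:", X_scaled.min())
```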

import numpy as np
from sklearn.datasets import make_blobs  # dataset generator
from sklearn.model_selection import train_test_split  # train/test split utility
from sklearn.preprocessing import MinMaxScaler  # data preprocessing
from sklearn.naive_bayes import MultinomialNB  # multinomial naive Bayes
import matplotlib.pyplot as plt  # plotting

# Generate a dataset with 500 samples in 5 classes
X, y = make_blobs(n_samples=500, centers=5, random_state=8)

# Split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# Preprocess with MinMaxScaler so that the training data are all non-negative
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit a multinomial naive Bayes model
mnb = MultinomialNB()
mnb.fit(X_train_scaled, y_train)

print('Code output:')
print('==========================')
# score() returns the mean accuracy on the data passed in (here: the test set)
print("Test set score: {:.2f}".format(mnb.score(X_test_scaled, y_test)))

# Plot range for both axes: the data range plus a 0.5 margin
x_min, x_max = X[:,0].min()-0.5, X[:,0].max()+0.5
y_min, y_max = X[:,1].min()-0.5, X[:,1].max()+0.5

# Color the background by the class predicted at each grid point;
# the grid is in original units, so it goes through the same scaler first
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02),
                     np.arange(y_min, y_max, .02))
z = mnb.predict(scaler.transform(np.c_[xx.ravel(), yy.ravel()])).reshape(xx.shape)
plt.pcolormesh(xx, yy, z, cmap=plt.cm.Pastel1)

# Plot the training set (circles) and the test set (stars)
plt.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap=plt.cm.cool, edgecolors='k')
plt.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap=plt.cm.cool, edgecolors='k', marker='*')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

plt.title('Classifier: MultinomialNB')
plt.show()

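Note:

sklearn.preprocessing.MinMaxScaler rescales each feature (column) independently to a given range, [0, 1] by default (set via feature_range), using the per-feature minimum and maximum learned from the data passed to fit. Values outside that training range, for example in the test set, can end up outside [0, 1].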
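Per feature, the default transform is x_scaled = (x - min) / (max - min), with min and max taken from the data passed to fit. The sketch below reproduces that by hand on the same train/test split and compares it with MinMaxScaler; any test values lying outside the training range would show up as entries below 0 or above 1 in the printed range.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_blobs(n_samples=500, centers=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

scaler = MinMaxScaler().fit(X_train)

# Manual min-max scaling with the training set's per-feature min and max
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual_train = (X_train - col_min) / (col_max - col_min)
manual_test = (X_test - col_min) / (col_max - col_min)

print(np.allclose(manual_train, scaler.transform(X_train)))  # → True
print(np.allclose(manual_test, scaler.transform(X_test)))    # → True
print("Scaled test range:", manual_test.min(axis=0), manual_test.max(axis=0))
```

Fitting the scaler on the training set only, as the article's code does, keeps test-set information out of the preprocessing step.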

Running the MultinomialNB code gives the following output:

Code output:
==========================
Test set score: 0.32

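As can be seen, MultinomialNB classifies poorly here, which shows that it is not suited to this dataset. MultinomialNB is designed for non-negative, discrete features such as counts; a typical example is classifying text data that has been converted into word-count vectors.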

[Figure 5.3: MultinomialNB decision regions (5.3MultinomialNB.png)]
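As a sketch of that typical use case, the snippet below turns a few made-up sentences into word-count vectors with scikit-learn's CountVectorizer and trains MultinomialNB on them. The documents and the "spam"/"ham" labels are invented for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: two "spam" and two "ham" messages
train_docs = [
    "free money now",
    "free offer win money",
    "meeting schedule today",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Each document becomes a sparse vector of non-negative word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(train_docs)

clf = MultinomialNB()  # alpha=1.0 (Laplace smoothing) by default
clf.fit(X_counts, train_labels)

new_docs = ["free money", "meeting today"]
pred = clf.predict(vectorizer.transform(new_docs))
print(pred.tolist())  # → ['spam', 'ham']
```

Here every feature really is a count, which matches the multinomial assumption far better than the shifted blob coordinates above.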
