KNN (k-nearest neighbors) assigns a sample to the class that most of its k nearest neighbors belong to.
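Algorithm steps:
1. Compute the distance from every training sample to the test point
2. Select the K samples with the smallest distances and collect their labels
3. Find the label that occurs most often among those K samples and return it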
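The three steps above can be sketched in plain Python. This is a minimal illustration and not the scikit-learn implementation used later. The function name `knn_predict`, the toy points, and the use of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from every training point to the query point.
    distances = [math.dist(p, query) for p in train_points]
    # Step 2: indices of the k smallest distances, then their labels.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    nearest_labels = [train_labels[i] for i in nearest]
    # Step 3: the most common label among those k neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (made up for illustration): two clusters in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.3, 0.3), k=3))  # a
print(knn_predict(points, labels, (4.9, 5.0), k=3))  # b
```

Sorting all distances is fine for small data sets; a real library typically uses faster neighbor-search structures.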
- Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
- Open one of the images to inspect its features
one = plt.imread(r'.\機(jī)器學(xué)習(xí)數(shù)據(jù)分析源碼\day8-KNN算法\1-KNN\exercise\data\0\0_10.bmp')
print(one)
plt.imshow(one,cmap='gray')

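- Read the data. The file names follow a regular pattern, so a loop can read the files quickly and store the data in lists. For each digit, images 1 to 450 go into the training set and images 451 to 500 into the test set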
x_train = []
x_test = []
y_train = []
y_test = []
for i in range(10):          # digit class 0-9, also the folder name
    for j in range(1,501):   # image index within the class
        if j < 451: # store as training data
            x_train.append(plt.imread(r'C:\Users\‘\Desktop\python\數(shù)據(jù)挖掘\機(jī)器學(xué)習(xí)數(shù)據(jù)分析源碼\day8-KNN算法\1-KNN\exercise\data\{}\{}_{}.bmp'.format(i,i,j)).reshape(-1))
            y_train.append(i)
        else : # store as test data
            x_test.append(plt.imread(r'C:\Users\‘\Desktop\python\數(shù)據(jù)挖掘\機(jī)器學(xué)習(xí)數(shù)據(jù)分析源碼\day8-KNN算法\1-KNN\exercise\data\{}\{}_{}.bmp'.format(i,i,j)).reshape(-1))
            y_test.append(i)
- Convert the lists to NumPy arrays
x_train,y_train,x_test,y_test = np.array(x_train),np.array(y_train),np.array(x_test),np.array(y_test)
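To see what `reshape(-1)` followed by `np.array` produces, here is a small sketch. The zero-filled arrays are hypothetical stand-ins for the loaded bitmaps, and the 28x28 single-channel size is an assumption taken from the `reshape((28,28))` call in the plotting step.

```python
import numpy as np

# Hypothetical stand-ins for the loaded bitmaps: three 28x28 grayscale images.
images = [np.zeros((28, 28), dtype=np.uint8) for _ in range(3)]

# reshape(-1) flattens each 2-D image into a 1-D vector of 784 pixel values.
flat = [img.reshape(-1) for img in images]

# np.array stacks the equal-length vectors into one 2-D feature matrix,
# one row per image, which is the layout scikit-learn estimators expect.
X = np.array(flat)
print(X.shape)  # (3, 784)
```

Under the same assumption, the loop above would give `x_train` the shape (4500, 784) and `x_test` the shape (500, 784).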
- Instantiate the KNN classifier and train it
knn = KNeighborsClassifier()
knn.fit(x_train,y_train)
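`KNeighborsClassifier()` with no arguments uses `n_neighbors=5`. The sketch below sets k explicitly on a tiny made-up data set, just to show where the parameter goes. The data and the choice of k=3 are assumptions for illustration, not tuned values for the digit images.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up): two well-separated clusters in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# The default is n_neighbors=5; here k is set explicitly to 3.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

pred = clf.predict(np.array([[0.3, 0.3], [4.9, 5.0]]))
print(pred)  # [0 1]
```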
- Predict and inspect the test results
y_predict = knn.predict(x_test)
print(y_predict)
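Printing the raw predictions is hard to read for 500 samples, so a single accuracy number can help. The sketch below uses short hypothetical label arrays in place of `y_test` and `y_predict`.

```python
import numpy as np

# Hypothetical labels, standing in for y_test and y_predict.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Element-wise comparison gives a boolean array; its mean is the
# fraction of correct predictions.
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 5 of 6 correct, about 0.833
```

With the fitted classifier, `knn.score(x_test, y_test)` reports the same mean accuracy.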
- Plot the images with matplotlib, with the predicted and true labels as titles
plt.figure(figsize=(12,15))
im_data = x_test[::20]
im_target = y_test[::20]
im_predict = y_predict[::20]
for i in range(25):
    plt.subplot(5,5,(i+1))  # 5x5 grid, subplot positions start at 1
    plt.imshow(im_data[i].reshape((28,28)))  # restore the flattened vector to a 28x28 image
    plt.title('predict:%d'%im_predict[i]+'\ntrue:%d'%(im_target[i]))
