ML: KNN Notes

Using a Jupyter notebook.

%matplotlib qt
import numpy as np
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
  1. Read the txt data; the last column is the label
data = []
labels = []
with open('data\\datingTestSet.txt') as f:
    for line in f:
        tokens = line.strip().split('\t')
        data.append([float(tk) for tk in tokens[:-1]])
        labels.append(tokens[-1])

data[1:10]
np.unique(labels)
array(['didntLike', 'largeDoses', 'smallDoses'],
dtype='|S10')
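As an aside, the same parsing can be done in one pass with NumPy's `genfromtxt`. A minimal sketch, using a hypothetical two-row sample in the same tab-separated format (the values below are illustrative, not taken from the real file):

```python
import io
import numpy as np

# Hypothetical sample in the same layout: three numeric features, then a string label
sample = ("40920\t8.326976\t0.953952\tlargeDoses\n"
          "14488\t7.153469\t1.673904\tsmallDoses\n")

# Read everything as strings, then split features from the label column
raw = np.genfromtxt(io.StringIO(sample), delimiter='\t', dtype=str)
x = raw[:, :-1].astype(float)   # all columns except the last are features
labels = raw[:, -1]             # the last column is the string label
```

With the real file, replace `io.StringIO(sample)` by the path `'data\\datingTestSet.txt'`.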

  2. Convert the string labels to numeric labels
x = np.array(data)
labels = np.array(labels)
y = np.zeros(labels.shape)
y[labels=='didntLike'] = 1
y[labels=='smallDoses'] = 2
y[labels=='largeDoses'] = 3
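The three boolean-mask assignments above can also be done with scikit-learn's `LabelEncoder`; a minimal sketch (note it assigns codes in sorted class order, so the numbering differs from the manual 1/2/3 mapping):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(['didntLike', 'smallDoses', 'largeDoses', 'smallDoses'])

le = LabelEncoder()
y = le.fit_transform(labels)  # codes follow sorted(classes): didntLike=0, largeDoses=1, smallDoses=2
```

For KNN the particular integer codes do not matter, only that each class gets a distinct value.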
  3. Before the data is normalized
model = KNeighborsClassifier(n_neighbors=3)
model.fit(x,y)
print(model)
expected = y
predicted = model.predict(x)
print(metrics.classification_report(expected, predicted, target_names=['didntLike','smallDoses','largeDoses']))
print(metrics.confusion_matrix(expected, predicted))

Result:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2,
weights='uniform')
             precision    recall  f1-score   support

   didntLike      0.89      0.85      0.87       342
  smallDoses      0.93      0.98      0.96       331
  largeDoses      0.82      0.83      0.82       327

 avg / total      0.88      0.88      0.88      1000

[[289   0  53]
 [  1 325   5]
 [ 33  24 270]]

  4. Normalize the data to the [0, 1] range
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(x)
X_train_minmax
array([[ 0.44832535,  0.39805139,  0.56233353],
       [ 0.15873259,  0.34195467,  0.98724416],
       [ 0.28542943,  0.06892523,  0.47449629],
       ..., 
       [ 0.29115949,  0.50910294,  0.51079493],
       [ 0.52711097,  0.43665451,  0.4290048 ],
       [ 0.47940793,  0.3768091 ,  0.78571804]])
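`MinMaxScaler` computes, per column, `(x - min) / (max - min)`. A minimal sketch on made-up data, checking the manual formula against the scaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: the second feature has a much larger numeric range than the first
x = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 300.0]])

# Manual per-column min-max scaling
x_min = x.min(axis=0)
x_max = x.max(axis=0)
x_manual = (x - x_min) / (x_max - x_min)

# sklearn's scaler should produce the same result
x_sklearn = MinMaxScaler().fit_transform(x)
```

After scaling, both columns lie in [0, 1], so neither dominates the Euclidean distance that KNN uses.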
  5. Split the data into training and test sets
from sklearn.model_selection import train_test_split
# Split training data and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)  
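A self-contained sketch of what `train_test_split` does here, on toy data (the `random_state` argument, not used above, makes the split reproducible):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

# test_size=0.2 holds out 20% of the rows (2 of 10) for testing
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
```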
  6. Results after normalization (n_neighbors=3: the K of K-nearest neighbors is set to 3)
x_train, x_test, y_train, y_test = train_test_split(X_train_minmax, y, test_size = 0.2)  
model = KNeighborsClassifier(n_neighbors=3)
model.fit(x_train,y_train)
print(model)
expected = y_test
predicted = model.predict(x_test)
print(metrics.classification_report(expected, predicted, target_names=['didntLike','smallDoses','largeDoses']))
print(metrics.confusion_matrix(expected, predicted))

Result:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2,
weights='uniform')
             precision    recall  f1-score   support

   didntLike      0.97      1.00      0.99        68
  smallDoses      0.93      1.00      0.96        51
  largeDoses      1.00      0.93      0.96        81

 avg / total      0.97      0.97      0.97       200

[[68  0  0]
 [ 0 51  0]
 [ 2  4 75]]

Summary:
The results after normalization differ greatly from those before. KNN classifies by Euclidean distance, so before normalization the feature with the largest numeric range dominates the distance computation; scaling every feature to [0, 1] lets each contribute equally. (Note also that the first report is evaluated on the training data itself, while the second uses a held-out 20% test set.)
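Beyond comparing one split, 5-fold cross-validation gives a more stable estimate and can be used to choose K. A minimal sketch on the built-in iris dataset (substituting for the dating data, which is not bundled with sklearn):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)  # normalize first, as above

# Mean 5-fold accuracy for several candidate values of K
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
```

(Strictly, fitting the scaler on all data before cross-validating leaks a little information; a `Pipeline` avoids that, but this keeps the sketch short.)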
