尤物视频网址,91综合色五月,日韩精品1234

機(jī)器學(xué)習(xí)實(shí)戰(zhàn) http://www.ituring.com.cn/book/1021

簡介

工作原理：存在一個(gè)樣本數(shù)據(jù)集，并且樣本集中每個(gè)數(shù)據(jù)都存在標(biāo)簽（樣本集中每一數(shù)據(jù)與所屬分類的對應(yīng)關(guān)系）。輸入沒有標(biāo)簽的新數(shù)據(jù)后，將新數(shù)據(jù)的每個(gè)特征與樣本集中數(shù)據(jù)對應(yīng)的特征進(jìn)行比較，然后算法提取樣本集中特征最相似數(shù)據(jù)（最鄰近）的分類標(biāo)簽。一般來說，選擇樣本數(shù)據(jù)集中前k個(gè)最相似的數(shù)據(jù)，最后選擇k個(gè)最相似數(shù)據(jù)中出現(xiàn)次數(shù)最多的分類，作為新數(shù)據(jù)的分類。

例如：

電影	打斗次數(shù)	接吻次數(shù)	電影類型
California Man	3	104	Romance
He's Not Really into Dudes	2	100	Romance
Beautiful Woman	1	81	Romance
Kevin Longblade	101	10	Action
Robo Slayer 3000	99	5	Action
Amped II	98	2	Action
未知	18	90	Unknown

已知電影與未知電影的距離：

電影名稱	與未知電影的距離
California Man	20.5
He's Not Really into Dudes	18.7
Beautiful Woman	19.2
Kevin Longblade	115.3
Robo Slayer 3000	117.4
Amped II	118.9

按照距離遞增排序可以找到k個(gè)距離最近的電影，假設(shè)k=3，取最近三部電影的類型可知未知電影也是一部愛情片。

算法步驟

1）計(jì)算測試數(shù)據(jù)與各個(gè)訓(xùn)練數(shù)據(jù)之間的距離；
2）按照距離的遞增關(guān)系進(jìn)行排序；
3）選取距離最小的K個(gè)點(diǎn)；
4）確定前K個(gè)點(diǎn)所在類別的出現(xiàn)頻率；
5）返回前K個(gè)點(diǎn)中出現(xiàn)頻率最高的類別作為測試數(shù)據(jù)的預(yù)測分類

import numpy as np
import operator
##給出訓(xùn)練數(shù)據(jù)以及對應(yīng)的類別
def createDataSet():
    group = np.array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels = ['A','A','B','B']
    return group,labels

###通過KNN進(jìn)行分類
def classify0(inX,dataSet,labels,k):
    dataSetSize = dataSet.shape[0]
    # print(dataSetSize)
    # print(np.tile(inX,(dataSetSize,1)))
    ###計(jì)算距離
    diffMat = np.tile(inX,(dataSetSize,1)) - dataSet
    # print(diffMat)
    sqDiffMat = diffMat**2
    # print(sqDiffMat)
    sqDistances = sqDiffMat.sum(axis=1)
    # print(sqDistances)
    distances = sqDistances**0.5
    print('distances:',distances)
    #根據(jù)元素的值從大到小對元素進(jìn)行排序，返回下標(biāo)
    sortedDistIndicies = distances.argsort()
    # print('sortedDistIndicies:',sortedDistIndicies)
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0)+1
    #選取出最多的類別
    sortedClassCount = sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    return sortedClassCount[0][0]

if __name__ == '__main__':
    group, labels = createDataSet()
    input = [0.5,0.5]
    output = classify0(input,group,labels,3)
    print("測試數(shù)據(jù)為:",input,"分類結(jié)果為：",output)
>>>distances: [ 0.78102497  0.70710678  0.70710678  0.64031242]
>>>測試數(shù)據(jù)為: [0.5, 0.5] 分類結(jié)果為： B

歸一化數(shù)值

在計(jì)算測試樣本和樣本集中數(shù)據(jù)距離的時(shí)候，有些特征的差值較大，對計(jì)算結(jié)果有較大影響。在處理這種不同取值范圍的特征值時(shí)，我們通常采用的方法就是將數(shù)值歸一化，如將取值范圍處理為0到1或者-1到1之間。

newValue = (oldValue-min)/(max-min)

def autoNum(dataSet):
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = np.zeros(np.shape(dataSet))
    m = dataSet.shape[0]
    normDataSet = dataSet - np.tile(minVals,(m,1))
    normDataSet = normDataSet/np.tile(ranges,(m,1))
    return normDataSet,ranges,minVals

測試算法

'''
將文本記錄裝換為NumPy的解析程序
'''
from collections import Counter
def file2matrix(filename):
    fr = open(filename)
    arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = np.zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFormLine = line.split('\t')
        returnMat[index,:] = listFormLine[0:3]
        classLabelVector.append(listFormLine[-1])
        index += 1
    dictClassLabel = Counter(classLabelVector)
    classLabel = []
    kind = list(dictClassLabel)
    for item in classLabelVector:
        if item == kind[0]:
            item = 1
        elif item == kind[1]:
            item = 2
        else:
            item = 3
        classLabel.append(item)
    return returnMat,classLabel

def datingClassTest():
    hoRatio = 0.1  #選取10%測試,剩下的是已知數(shù)據(jù)集
    datingDataMat,datingLabels = file2matrix('datingTestSet.txt')
    normMat,ranges,minVals = autoNum(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:],datingLabels[numTestVecs:m],3)
        print("the classifier came back with %d, the real answer is %d"%(classifierResult,datingLabels[i]))
        if(classifierResult != datingLabels[i]):errorCount += 1.0
    print("the total error rate is %f" %(errorCount/float(numTestVecs)))

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

k-近鄰算法

k-近鄰算法

簡介

算法步驟

歸一化數(shù)值

測試算法

算法的scikit-learn實(shí)現(xiàn)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

k-近鄰算法

簡介

算法步驟

歸一化數(shù)值

測試算法

算法的scikit-learn實(shí)現(xiàn)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av