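Python Machine Learning: Building a Decision Tree
Decision Tree (ID3 Algorithm)
The development environment is Spyder from Anaconda, where all the required libraries are installed by default. In any other environment the external libraries (here scikit-learn) have to be installed first. The main code is as follows: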
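As a bit of background on ID3: at each node the algorithm splits on the attribute with the largest information gain, that is, the largest drop in entropy of the class labels. Below is a minimal pure-Python sketch of that calculation. The helper names `entropy` and `information_gain` and the toy rows are made up for illustration and are not taken from the data file used in this post.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the label subsets produced by the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, for illustration only
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "senior", "student": "no"},
    {"age": "senior", "student": "yes"},
]
labels = ["no", "yes", "no", "yes"]

# ID3 would split on the attribute with the largest gain
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy data `student` separates the labels perfectly while `age` tells us nothing, so `student` is the attribute that would be chosen for the first split.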
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 1 16:09:50 2017
@author: Administrator
"""
# DictVectorizer converts feature dictionaries into one-hot encoded arrays
from sklearn.feature_extraction import DictVectorizer
# Reads the CSV file
import csv
from sklearn import preprocessing
# Decision tree algorithm
from sklearn import tree

featureList = []
labelList = []
# newline='' is what the csv module recommends when opening a file for csv.reader
with open(r'D:/data_com.csv', newline='') as allElectronicsData:
    read = csv.reader(allElectronicsData)
    headers = next(read)
    for row in read:
        # The last column holds the class label
        labelList.append(row[-1])
        rowDict = {}
        # From the second column to the second-to-last column
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)
# print(featureList)

# Build dummyX with one-hot codes, e.g. 001: youth, 010: senior, 100: middle_aged, and so on
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print(dummyX)
# print(vec)
# print(dummyY)

# criterion='entropy' chooses splits by information gain, the measure ID3 uses
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
# get_feature_names_out() needs scikit-learn 1.0 or later; older releases used get_feature_names()
with open('D:/allElectronicInformationGainOri.dot', 'w') as f:
    tree.export_graphviz(clf, feature_names=list(vec.get_feature_names_out()), out_file=f)
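To make the one-hot codes in dummyX more concrete, here is a small pure-Python sketch of the encoding idea. It imitates the `feature=value` column naming and alphabetical column order that DictVectorizer uses for string values, but it is only an illustration, not the library's implementation. The function name `one_hot` and the sample rows are assumptions made up for this example.

```python
def one_hot(feature_dicts):
    """Encode a list of {feature: value} dicts as rows of 0/1 flags.

    Columns are named 'feature=value' and sorted alphabetically.
    """
    columns = sorted({f"{k}={v}" for d in feature_dicts for k, v in d.items()})
    index = {name: i for i, name in enumerate(columns)}
    matrix = []
    for d in feature_dicts:
        row = [0] * len(columns)
        for k, v in d.items():
            row[index[f"{k}={v}"]] = 1
        matrix.append(row)
    return columns, matrix


# Hypothetical rows standing in for featureList
sample = [{"age": "youth"}, {"age": "senior"}, {"age": "middle_aged"}]
columns, matrix = one_hot(sample)
print(columns)
for features, row in zip(sample, matrix):
    print(features["age"], row)
```

With these three rows the columns come out as `age=middle_aged`, `age=senior`, `age=youth`, so youth is encoded as 0 0 1, senior as 0 1 0 and middle_aged as 1 0 0.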
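The generated dot file can be converted to PDF with Graphviz. After installing Graphviz, add the bin directory under its installation directory to the Path environment variable.
Then run the following command: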
dot -Tpdf xx.dot -o xx.pdf
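If you would rather trigger the conversion from Python, the same command can be run through `subprocess`. This is just a sketch: the helper names `dot_to_pdf_command` and `convert` are made up here, and it assumes the `dot` executable is reachable through Path as described above.

```python
import shutil
import subprocess
from pathlib import Path


def dot_to_pdf_command(dot_file):
    """Build the argument list for: dot -Tpdf <name>.dot -o <name>.pdf"""
    dot_path = Path(dot_file)
    return ["dot", "-Tpdf", str(dot_path), "-o", str(dot_path.with_suffix(".pdf"))]


def convert(dot_file):
    """Run the conversion; return False when dot cannot be found on Path."""
    if shutil.which("dot") is None:
        return False
    subprocess.run(dot_to_pdf_command(dot_file), check=True)
    return True


# Only shows the command that would be run; call convert(...) to execute it
print(dot_to_pdf_command("xx.dot"))
```

Passing the arguments as a list avoids shell quoting problems when the file path contains spaces.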
After running it, the result looks like this:

[Figure: the constructed decision tree]