2019-08-23 Data Cleaning Assignment

'''
Assignment requirements:
1. Read the "商鋪數據.csv" file successfully
2. Parse the data into a list of dicts: [{'var1':value1,'var2':value2,'var3':value3,...},...,{}]
3. Clean the data:
① Convert the comment and price fields to numbers
② Remove rows with missing fields
③ Split commentlist into three fields and convert them to numbers
4. Save the result as a .pkl file

'''
import numpy as np
import pandas as pd

# Note: header=1 treats the second line of the file as the header row, so anything above it is skipped.
# Use header=0 instead if the column names sit on the first line.
shop = pd.read_csv(r'C:\Users\heart\Documents\Tencent Files\592409588\FileRecv\【非常重要】課程資料\CLASSDATA_ch01數據思維導論:如何從數據中挖掘價值?\CLASSDATA_ch01數據思維導論:如何從數據中挖掘價值?\CLASSDATA_ch01數據思維導論:如何從數據中挖掘價值?\練習01_商鋪數據加載及存儲_資料\商鋪數據.csv'
                   ,engine='python'
                   ,sep=','
                   ,header=1
                   ,encoding='utf8'
                   ,names=['classify','name','comment','star','price','address','commentlist'])
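
How `header` interacts with `names` can be checked on a small in-memory CSV. The sample text and the column names `x` and `y` below are made up for illustration. This is only a sketch of how pandas treats the two settings, not a statement about the actual shop file.

```python
import io
import pandas as pd

# Hypothetical three-line CSV: one header line followed by two data rows.
sample = "col_a,col_b\n1,2\n3,4\n"

# header=0: line 0 is the header and its labels are replaced by `names`.
df0 = pd.read_csv(io.StringIO(sample), header=0, names=['x', 'y'])

# header=1: line 1 is treated as the header, so the first data row is lost.
df1 = pd.read_csv(io.StringIO(sample), header=1, names=['x', 'y'])

print(len(df0), len(df1))
```

If the row count of `shop` is one less than expected, this setting is a likely place to look.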

shop.head()
shc = list(shop.columns)
shc
list(shop['classify'].values)

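# Build an empty list, then use a nested loop (rows outside, columns inside) to fill it.
# A fresh dict must be created for every row; reusing one dict would append
# the same object over and over.
lst = []

for i in range(len(shop)):
    dic = {}
    for col in shc:
        dic[col] = shop[col].values[i]
    lst.append(dic)

len(lst)
lst[300]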
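
As an alternative to the explicit loop, pandas can produce the same list-of-dicts layout directly with `DataFrame.to_dict(orient='records')`. The two-row frame below is a hypothetical stand-in for the shop data, used only to show the shape of the result.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the shop data.
demo = pd.DataFrame({'name': ['shop_a', 'shop_b'], 'star': [4, 5]})

# orient='records' yields one dict per row, keyed by column name.
records = demo.to_dict(orient='records')
print(records)
```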

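# Clean the data: strip the text characters from the values
shop['comment'][1]
# .str[0] picks the first piece of every row; a plain [0][0] would take row 0's
# value and broadcast it to the whole column.
shop['comment'] = shop['comment'].str.split('                    ').str[0]
shop['price'] = shop['price'].str.split('                                        ¥').str[1]
# Convert to numbers and assign the result back; unparseable values become NaN.
shop['comment'] = pd.to_numeric(shop['comment'].str.strip(), errors='coerce')
shop['price'] = pd.to_numeric(shop['price'].str.strip(), errors='coerce')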

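# Split commentlist into three fields and convert them to numbers
shop['commentlist'] = shop['commentlist'].str.split('                                ')
# .str[i] picks the i-th piece of every row; [2:] drops its first two characters.
shop['commentlist_zl'] = pd.to_numeric(shop['commentlist'].str[0].str[2:], errors='coerce')
shop['commentlist_hj'] = pd.to_numeric(shop['commentlist'].str[1].str[2:], errors='coerce')
shop['commentlist_hw'] = pd.to_numeric(shop['commentlist'].str[2].str[2:], errors='coerce')

# Remove rows with missing fields (requirement 3 ②)
shop = shop.dropna()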

shop.head()
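
The cleaning steps above depend on the exact number of spaces in the raw strings. A minimal sketch of a less layout-sensitive variant is shown here: it swaps the fixed-width split for a regular expression via `Series.str.extract`. The raw values are hypothetical, since the real file's layout is not reproduced in this post.

```python
import pandas as pd

# Hypothetical raw values; the real file's exact layout is an assumption.
raw = pd.DataFrame({
    'comment': ['120    reviews', '35    reviews', None],
    'price': ['avg    ¥88', 'avg    ¥45.5', 'avg    -'],
})

# str.extract works row by row: capture the first number in each string.
pattern = r'(\d+(?:\.\d+)?)'
for col in ['comment', 'price']:
    raw[col] = pd.to_numeric(raw[col].str.extract(pattern, expand=False),
                             errors='coerce')

# Rows where any field is missing or failed to parse are dropped.
clean = raw.dropna().reset_index(drop=True)
print(clean)
```

The third row is dropped because its comment is missing and its price contains no number.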

# Save the result as a .pkl file
import pickle

#shop.to_csv('shop.csv',index=False,sep=',')

with open('shop.pkl','wb') as f:
    pickle.dump(shop,f)    # write

# Read back from the same file that was just written
with open('shop.pkl','rb') as fo:
    data1 = pickle.load(fo)   # read

data1.head()
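
pandas also offers `DataFrame.to_pickle` and `pd.read_pickle`, which wrap the same write and read steps. The sketch below round-trips a small hypothetical frame through a temporary directory; the file name `demo.pkl` is made up for the example.

```python
import os
import tempfile
import pandas as pd

# Hypothetical frame standing in for the cleaned shop data.
demo = pd.DataFrame({'name': ['shop_a', 'shop_b'], 'price': [88.0, 45.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.pkl')   # hypothetical file name
    demo.to_pickle(path)                    # write
    restored = pd.read_pickle(path)         # read back

# True when the restored frame matches the original in values and dtypes.
print(restored.equals(demo))
```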