
Prerequisites

Pandas basics: pd.read_csv
Regular expression basics

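Automated reporting places higher demands on data quality, yet errors and omissions are perfectly normal in practice. Rather than only fixing bugs after problems appear, we should define rules that are as strict as possible from the very start, and report and handle unexpected cases.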

Reading a CSV file

CSV files are one of our most common data sources, so we use a CSV file as the example here.

First, look at the data to be read

import pandas as pd
import numpy as np

# The header turns out to be on the 8th row, so set header=7 (0-based)
data = pd.read_csv('data.csv', header=7, index_col=0)
data.head()
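One way to find the header row before calling pd.read_csv is to print the first few raw lines of the file. The sketch below is only an illustration: the helper names peek and find_header_row, the marker string, and the sample file contents are all made up for this example and do not come from the original data.

```python
import os
import tempfile
from itertools import islice


def peek(path, n=10, encoding="utf-8"):
    """Return the first n raw lines of a text file, without parsing them."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def find_header_row(lines, marker):
    """Return the 0-based index of the first line starting with marker, or None."""
    for i, line in enumerate(lines):
        if line.startswith(marker):
            return i
    return None


# Demo on a small throwaway file (contents are invented for illustration).
sample = "Report export\nGenerated by tool\n#,Product Name,Price\n1,Widget,$9.99\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

lines = peek(tmp.name)
os.remove(tmp.name)

header_row = find_header_row(lines, "#,")
print(header_row)
```

Note that by default pd.read_csv skips blank lines when counting the header row, so if the preamble contains empty lines, the raw line index found this way may be larger than the value the header parameter needs.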

Specifying formats for the target columns

data.dtypes

Product Name    object
Brand           object
Price           object
Category        object
Rank            object
Sales           object
Revenue         object
Reviews          int64
Rating          object
Seller          object
LQS             object
ASIN            object
Link            object
dtype: object

The columns Price, Rank, Sales, Revenue, Reviews, Rating and LQS should all be numeric, but only the Reviews column was read as a numeric type by default.
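To see why a column ended up as object, it can help to list the values that fail numeric parsing. The following is a minimal sketch using pd.to_numeric with errors="coerce"; the helper name non_numeric_values and the sample Series are assumptions for illustration, not part of the original data.

```python
import pandas as pd


def non_numeric_values(series):
    """Return the distinct non-missing values that pd.to_numeric cannot parse."""
    as_text = series.dropna().astype(str)
    # errors="coerce" turns every unparseable entry into NaN instead of raising
    parsed = pd.to_numeric(as_text, errors="coerce")
    return sorted(as_text[parsed.isna()].unique())


# Invented sample values resembling a Sales column
sales = pd.Series(["1,067", "35", "n/a", "2,500", "35"])
offenders = non_numeric_values(sales)
print(offenders)
```

Running such a check on each column that is expected to be numeric gives a short report of the symbols (thousands separators, currency signs, placeholders) that the reading step has to deal with.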

Specifying formats with dtype

dtype = {'#':int,
         'Product Name':str,
         'Brand':str,
         'Price':float,
         'Category':str,
         'Rank':int,
         'Sales':int,
         'Revenue':int,
         'Reviews':int,
         'Rating':float,
         'Seller':str,
         'LQS':int,
         'ASIN':str,
         'Link':str
        }
try:
    data = pd.read_csv('data.csv', dtype=dtype, header=7, index_col=0)
except Exception as e:  # avoid BaseException, which would also swallow KeyboardInterrupt
    print(e)

invalid literal for int() with base 10: '1,067'

As the error shows, dtype does not strip non-numeric symbols before converting, so the read fails. We need a stronger way to specify the format.
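If thousands separators were the only problem, the thousands parameter of pd.read_csv might be enough on its own. The sketch below uses an in-memory CSV with invented contents; it is not a replacement for the converters approach, because thousands does nothing about other symbols such as currency signs.

```python
import io

import pandas as pd

# Invented sample: quoted numbers that contain a thousands separator
csv_text = 'Product Name,Sales\nWidget,"1,067"\nGadget,"35"\n'

# thousands="," tells the parser to ignore commas inside numeric fields
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df["Sales"].tolist())
print(df["Sales"].dtype)
```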

Converting formats with converters

import re

# Use a regular expression to extract the first number from a string.
# A raw string avoids invalid-escape warnings for \d and \.
NUMBER_PATTERN = re.compile(r'\d+\.?\d*')

def str2num(string):
    if not isinstance(string, str):
        string = str(string)
    string = string.replace(',', '')  # drop thousands separators such as '1,067'
    match = NUMBER_PATTERN.search(string)
    if match:
        return float(match.group())
    else:
        return float('nan')  # no digits found

converters = {'Price':str2num,
              'Rank':str2num,
              'Rating':str2num,
              'Sales':str2num,
              'Revenue':str2num,
              'Reviews':str2num
             }
try:
    data = pd.read_csv('data.csv', converters=converters, header=7, index_col=0)
except Exception as e:
    print(e)
data.head()
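Before relying on str2num for a whole file, it may be worth checking how it behaves on a few representative strings. The sketch below repeats the same logic in a self-contained form; the sample strings are invented for illustration.

```python
import math
import re

NUMBER_PATTERN = re.compile(r"\d+\.?\d*")


def str2num(value):
    """Same logic as str2num above: drop commas, return the first number as a float."""
    text = str(value).replace(",", "")
    match = NUMBER_PATTERN.search(text)
    return float(match.group()) if match else float("nan")


# Invented sample inputs covering separators, symbols, text and missing values
samples = ["1,067", "$12.99", "4.5 out of 5", "N/A", "", "-5"]
results = {s: str2num(s) for s in samples}
for raw, parsed in results.items():
    print(repr(raw), "->", parsed)
```

One limitation worth knowing: the pattern only matches digits and an optional decimal part, so a leading minus sign is dropped and "-5" is read as 5.0. If negative values can occur in the data, the pattern would need to be extended.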

Decouple the data-processing functions: put str2num in a tools module and the data-reading code in a datapipeline module.
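The split described above could look roughly like the following. This is a sketch under assumptions: the function name load_data, the constant NUMERIC_COLUMNS, and the demo CSV are hypothetical, and the two modules are shown in a single file so the example can run as is.

```python
import io
import re

import pandas as pd

# ---- tools.py (hypothetical module): reusable helpers ----
_NUMBER = re.compile(r"\d+\.?\d*")


def str2num(value):
    """Drop commas and return the first number in the text as a float (NaN if none)."""
    text = str(value).replace(",", "")
    match = _NUMBER.search(text)
    return float(match.group()) if match else float("nan")


# ---- datapipeline.py (hypothetical module): data reading ----
# In a real split this file would start with: from tools import str2num
NUMERIC_COLUMNS = ["Price", "Rank", "Rating", "Sales", "Revenue", "Reviews"]


def load_data(source, header=7):
    """Read the CSV and convert every column in NUMERIC_COLUMNS with str2num."""
    converters = {col: str2num for col in NUMERIC_COLUMNS}
    return pd.read_csv(source, converters=converters, header=header, index_col=0)


# Demo with an invented in-memory CSV whose header is on the first row
demo = io.StringIO(
    "#,Product Name,Price,Rank,Sales,Revenue,Reviews,Rating\n"
    '1,Widget,$9.99,#12,"1,067","$10,659",321,4.5\n'
)
df = load_data(demo, header=0)
print(df.dtypes)
```

Keeping the parsing helper separate from the reading code means str2num can be tested on its own and reused by other readers.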


7.1 Data Processing: Reading Data
