pandas實(shí)戰(zhàn)

一、pandas讀取csv文件

import pandas as pd
f = open("C:\\Users\\matrix\\lz_business_advisor_dashboard_key_metrics_source.csv", encoding = 'utf-8')
pd.read_csv(f)

參考
【1】詳解pandas的read_csv方法:https://blog.csdn.net/weixin_37706204/article/details/120827141

二、pandas讀取excel文件

import pandas as pd
data = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, index_col = 0, nrows = 3)
data

其中,
sheet_name:第幾個(gè)sheet頁,是從0開始的
header:第幾行是header
index_col :其實(shí)列
nrows:取多少行

三、pandas在指定列添加一列

import pandas as pd
df = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df

四、pandas寫入excel文件

import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df.to_excel(r"C:\Users\matrix\test.xlsx")

五、pandas寫入數(shù)據(jù)庫

import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# create conn
conn = create_engine('mysql+pymysql://root:123456@localhost:3306/test', encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace')
print(insert_rows)
# df.to_excel(r"C:\Users\matrix\test.xlsx")

參考:pandas 寫入mysql數(shù)據(jù)庫.to_sql方法詳解

六、pandas處理表頭

import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df = pd.DataFrame(df, columns = ['date_of_data','Date'])
df.columns = (['date_of_data', 'date'])
df

七、pandas正則

replace方法不支持正則,需要用sub方法
demo: 將所有表頭除了數(shù)字、字母和下劃線以外的字符全部替換為下劃線,并且全部字符小寫

import pandas as pd
import re
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
    # field = field.replace(' ', '')
    # field = regex1.sub('_', field)
    # transform except 'A-Za-z0-9' char to '_'
    field = re.sub(r'[^A-Za-z0-9]', r'_', field)
    # transform multiple '_' char to '_'
    field = re.sub(r'_+', r'_', field)
    # lower all char
    field = field.lower()
    # field = regex2.sub('#', field)
    field_list.append(field)
df.columns = tuple(field_list)
df

八、pandas將所有類改為str類型

import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df[:] = df[:].astype(str)
df.dtypes

參考:在Pandas中更改列的數(shù)據(jù)類型

九、pandas寫入數(shù)據(jù)庫列類型

insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                               dtype={col_name: VARCHAR(500) for col_name in df}
                              )
import pandas as pd
import pymysql
import sqlalchemy
from sqlalchemy import create_engine
import re
from sqlalchemy.types import VARCHAR 
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
    # field = field.replace(' ', '')
    # field = regex1.sub('_', field)
    # transform except 'A-Za-z0-9' char to '_'
    field = re.sub(r'[^A-Za-z0-9]', r'_', field)
    # transform multiple '_' char to '_'
    field = re.sub(r'_+', r'_', field)
    # lower all char
    field = field.lower()
    # field = regex2.sub('#', field)
    field_list.append(field)
df.columns = tuple(field_list)
# transform all columns to str type, str map to db text
# df[:] = df[:].astype(str)
# df.dtypes
# df[field_list]
# create conn
conn_string = 'mysql+pymysql://root:123456@localhost:3306/test'
conn = create_engine(conn_string, encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                               dtype={col_name: VARCHAR(500) for col_name in df}
                              )
print(insert_rows)

參考:
[1] Pandas to_sql將列類型從varchar更改為text
[2] pandas to_sql all columns as nvarchar (可行方案)

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容