Working with Python Pandas and XlsxWriter

Python Pandas是一個(gè)Python數(shù)據(jù)分析庫(kù)。它可以讀取,過(guò)濾和重新排列大小數(shù)據(jù)集,并以包括Excel在內(nèi)的多種格式輸出它們。

Pandas使用Xlwt模塊(用于xls文件)和Openpyxl或XlsxWriter模塊(用于xlsx文件)寫(xiě)入Excel 文件。

將XlsxWriter與Pandas一起使用

要將XlsxWriter與Pandas一起使用,請(qǐng)將其指定為Excel writer 引擎

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.save()

其輸出如下所示:

_images / pandas_simple.png

請(qǐng)參閱示例:Pandas Excel示例中的完整示例。

從熊貓?jiān)L問(wèn)XlsxWriter

為了將圖表,條件格式設(shè)置和列格式設(shè)置等XlsxWriter功能應(yīng)用于Pandas輸出,我們需要訪問(wèn)基礎(chǔ) 工作簿工作表對(duì)象。之后,我們可以將它們視為普通的XlsxWriter對(duì)象。

從上面的示例繼續(xù),我們執(zhí)行以下操作:

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

單獨(dú)使用XlsxWriter時(shí),這等效于以下代碼:

workbook  = xlsxwriter.Workbook('filename.xlsx')
worksheet = workbook.add_worksheet()

然后可以使用Workbook和Worksheet對(duì)象訪問(wèn)其他XlsxWriter功能,請(qǐng)參見(jiàn)下文。

將圖表添加到數(shù)據(jù)框輸出

如上一節(jié)所示,一旦有了Workbook和Worksheet對(duì)象,我們就可以使用它們來(lái)應(yīng)用其他功能,例如添加圖表:

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Create a chart object.
chart = workbook.add_chart({'type': 'column'})

# Configure the series of the chart from the dataframe data.
chart.add_series({'values': '=Sheet1!$B$2:$B$8'})

# Insert the chart into the worksheet.
worksheet.insert_chart('D2', chart)

輸出如下所示:

_images / pandas_chart.png

請(qǐng)參閱示例:帶有圖表的Pandas Excel輸出中的完整示例。

向數(shù)據(jù)框輸出添加條件格式

另一種選擇是應(yīng)用這樣的條件格式:

# Apply a conditional format to the cell range.
worksheet.conditional_format('B2:B8', {'type': '3_color_scale'})

這將給:

_images / pandas_conditional.png

請(qǐng)參閱示例:條件格式的Pandas Excel輸出中的完整示例。

數(shù)據(jù)框輸出的格式

XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.

If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.

For example it is possible to set the default date and datetime formats via the Pandas interface:

writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        datetime_format='mmm d yyyy hh:mm:ss',
                        date_format='mmmm dd yyyy')

Which would give:

_images / pandas_datetime.png

See the full example at Example: Pandas Excel output with datetimes.

It is possible to format any other, non date/datetime column data using set_column():

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
_images / pandas_column_formats.png

Note: This feature requires Pandas >= 0.16.

See the full example at Example: Pandas Excel output with column formatting.

Formatting of the Dataframe headers

Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:

# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})

# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
_images / pandas_header_format.png

See the full example at Example: Pandas Excel output with user defined header format.

Handling multiple Pandas Dataframes

It is possible to write more than one dataframe to a worksheet or to several worksheets. For example to write multiple dataframes to multiple worksheets:

# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')

See the full example at Example: Pandas Excel with multiple dataframes.

It is also possible to position multiple dataframes within the same worksheet:

# Position the dataframes in the worksheet.
df1.to_excel(writer, sheet_name='Sheet1')  # Default position, cell A1.
df2.to_excel(writer, sheet_name='Sheet1', startcol=3)
df3.to_excel(writer, sheet_name='Sheet1', startrow=6)

# Write the dataframe without the header and index.
df4.to_excel(writer, sheet_name='Sheet1',
             startrow=7, startcol=4, header=False, index=False)
_images / pandas_positioning.png

See the full example at Example: Pandas Excel dataframe positioning.

Passing XlsxWriter constructor options to Pandas

XlsxWriter supports several Workbook() constructor options such as strings_to_urls(). These can also be applied to the Workbook object created by Pandas as follows:

writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_urls': False})

Saving the Dataframe output to a string

It is also possible to write the Pandas XlsxWriter DataFrame output to a byte array:

import pandas as pd
import io

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

output = io.BytesIO()

# Use the BytesIO object as the filehandle.
writer = pd.ExcelWriter(output, engine='xlsxwriter')

# Write the data frame to the BytesIO object.
df.to_excel(writer, sheet_name='Sheet1')

writer.save()
xlsx_data = output.getvalue()

# Do something with the data...

Note: This feature requires Pandas >= 0.17.

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容