Python Pandas是一個(gè)Python數(shù)據(jù)分析庫(kù)。它可以讀取,過(guò)濾和重新排列大小數(shù)據(jù)集,并以包括Excel在內(nèi)的多種格式輸出它們。
Pandas使用Xlwt模塊(用于xls文件)和Openpyxl或XlsxWriter模塊(用于xlsx文件)寫(xiě)入Excel 文件。
將XlsxWriter與Pandas一起使用
要將XlsxWriter與Pandas一起使用,請(qǐng)將其指定為Excel writer 引擎:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Close the Pandas Excel writer and output the Excel file. writer.save()
其輸出如下所示:
請(qǐng)參閱示例:Pandas Excel示例中的完整示例。
從熊貓?jiān)L問(wèn)XlsxWriter
為了將圖表,條件格式設(shè)置和列格式設(shè)置等XlsxWriter功能應(yīng)用于Pandas輸出,我們需要訪問(wèn)基礎(chǔ) 工作簿和工作表對(duì)象。之后,我們可以將它們視為普通的XlsxWriter對(duì)象。
從上面的示例繼續(xù),我們執(zhí)行以下操作:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1']
單獨(dú)使用XlsxWriter時(shí),這等效于以下代碼:
workbook = xlsxwriter.Workbook('filename.xlsx') worksheet = workbook.add_worksheet()
然后可以使用Workbook和Worksheet對(duì)象訪問(wèn)其他XlsxWriter功能,請(qǐng)參見(jiàn)下文。
將圖表添加到數(shù)據(jù)框輸出
如上一節(jié)所示,一旦有了Workbook和Worksheet對(duì)象,我們就可以使用它們來(lái)應(yīng)用其他功能,例如添加圖表:
# Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Create a chart object. chart = workbook.add_chart({'type': 'column'}) # Configure the series of the chart from the dataframe data. chart.add_series({'values': '=Sheet1!$B$2:$B$8'}) # Insert the chart into the worksheet. worksheet.insert_chart('D2', chart)
輸出如下所示:
請(qǐng)參閱示例:帶有圖表的Pandas Excel輸出中的完整示例。
向數(shù)據(jù)框輸出添加條件格式
另一種選擇是應(yīng)用這樣的條件格式:
# Apply a conditional format to the cell range. worksheet.conditional_format('B2:B8', {'type': '3_color_scale'})
這將給:
請(qǐng)參閱示例:條件格式的Pandas Excel輸出中的完整示例。
數(shù)據(jù)框輸出的格式
XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.
For example it is possible to set the default date and datetime formats via the Pandas interface:
writer = pd.ExcelWriter("pandas_datetime.xlsx", engine='xlsxwriter', datetime_format='mmm d yyyy hh:mm:ss', date_format='mmmm dd yyyy')
Which would give:
See the full example at Example: Pandas Excel output with datetimes.
It is possible to format any other, non date/datetime column data using
set_column():
# Add some cell formats. format1 = workbook.add_format({'num_format': '#,##0.00'}) format2 = workbook.add_format({'num_format': '0%'}) # Set the column width and format. worksheet.set_column('B:B', 18, format1) # Set the format but not the column width. worksheet.set_column('C:C', None, format2)
Note: This feature requires Pandas >= 0.16.
See the full example at Example: Pandas Excel output with column formatting.
Formatting of the Dataframe headers
Pandas writes the dataframe header with a default cell format. Since it is a
cell format it cannot be overridden using set_row(). If you wish to
use your own format for the headings then the best approach is to turn off the
automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a # user defined header. df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False) # Get the xlsxwriter workbook and worksheet objects. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Add a header format. header_format = workbook.add_format({ 'bold': True, 'text_wrap': True, 'valign': 'top', 'fg_color': '#D7E4BC', 'border': 1}) # Write the column headers with the defined format. for col_num, value in enumerate(df.columns.values): worksheet.write(0, col_num + 1, value, header_format)
See the full example at Example: Pandas Excel output with user defined header format.
Handling multiple Pandas Dataframes
It is possible to write more than one dataframe to a worksheet or to several worksheets. For example to write multiple dataframes to multiple worksheets:
# Write each dataframe to a different worksheet. df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3')
See the full example at Example: Pandas Excel with multiple dataframes.
It is also possible to position multiple dataframes within the same worksheet:
# Position the dataframes in the worksheet. df1.to_excel(writer, sheet_name='Sheet1') # Default position, cell A1. df2.to_excel(writer, sheet_name='Sheet1', startcol=3) df3.to_excel(writer, sheet_name='Sheet1', startrow=6) # Write the dataframe without the header and index. df4.to_excel(writer, sheet_name='Sheet1', startrow=7, startcol=4, header=False, index=False)
See the full example at Example: Pandas Excel dataframe positioning.
Passing XlsxWriter constructor options to Pandas
XlsxWriter supports several Workbook() constructor options such as
strings_to_urls(). These can also be applied to the Workbook object
created by Pandas as follows:
writer = pd.ExcelWriter('pandas_example.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})
Saving the Dataframe output to a string
It is also possible to write the Pandas XlsxWriter DataFrame output to a byte array:
import pandas as pd import io # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) output = io.BytesIO() # Use the BytesIO object as the filehandle. writer = pd.ExcelWriter(output, engine='xlsxwriter') # Write the data frame to the BytesIO object. df.to_excel(writer, sheet_name='Sheet1') writer.save() xlsx_data = output.getvalue() # Do something with the data...
Note: This feature requires Pandas >= 0.17.