Data structures
- DataFrame
- Panel
- Series
DataFrame
- Attributes
- shape: (number of rows, number of columns)
- values: the data as a two-dimensional NumPy array
- index: row labels
- columns: column labels
- T: transpose
- Methods
- head(): first five rows by default
- tail(): last five rows by default
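A minimal sketch of these attributes and methods. The 6x3 frame and its row and column labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x3 frame; labels are illustrative only.
df = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=list("abcdef"),
    columns=["x", "y", "z"],
)

print(df.shape)          # (rows, columns)
print(type(df.values))   # the underlying data as a NumPy array
print(list(df.index))    # row labels
print(list(df.columns))  # column labels
print(df.T.shape)        # transposing swaps rows and columns
print(df.head())         # first five rows unless n is given
print(df.tail(2))        # last n rows when n is given
```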
Panel: a container of DataFrames
Deprecated since pandas 0.20 and removed in 0.25; use a DataFrame with a MultiIndex instead.
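One way to hold several same-shaped tables without Panel is a single DataFrame with a two-level row index. This is only a sketch; the level names and labels ("item", "row", "item1", and so on) are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical labels: two "items", each a 3x2 table stacked vertically.
index = pd.MultiIndex.from_product(
    [["item1", "item2"], ["r0", "r1", "r2"]],
    names=["item", "row"],
)
stacked = pd.DataFrame(
    np.arange(12).reshape(6, 2), index=index, columns=["a", "b"]
)

# Selecting on the outer level gives back one ordinary 2-D DataFrame.
item1 = stacked.loc["item1"]
print(item1)

# A full (item, row) key plus a column name reaches a single cell.
cell = stacked.loc[("item2", "r1"), "b"]
print(cell)
```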
Series
- Attributes
- index
- values
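A short sketch of the two Series attributes, with made-up data. It also shows that a single DataFrame column is itself a Series.

```python
import pandas as pd

# Illustrative Series with explicit labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(list(s.index))  # the labels
print(s.values)       # the data as a 1-D NumPy array

# Selecting one column of a DataFrame yields a Series.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
col = df["x"]
print(type(col))
```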
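Indexing
- Direct indexing: df['column']['row'] (column first, then row)
- Label-based indexing: df.loc['row', 'column'] or df.loc['row']['column']
- Position-based indexing: df.iloc[0, 1]
- Mixed indexing: df.ix[:4, ['column1', 'column2']] (.ix was deprecated in pandas 0.20 and removed in 1.0; use .loc or .iloc instead)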
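The indexing styles can be compared on one small frame. This is a sketch with hypothetical labels; since .ix is unavailable in current pandas, the last two lines show two possible ways to mix row positions with column labels.

```python
import pandas as pd

# Hypothetical 3x3 frame.
df = pd.DataFrame(
    {"column1": [1, 2, 3], "column2": [4, 5, 6], "column3": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

a = df["column2"]["row1"]      # direct: column first, then row
b = df.loc["row1", "column2"]  # label-based: row, then column
c = df.iloc[0, 1]              # position-based: same cell

# Mixing positions and labels without .ix:
d = df.iloc[:2][["column1", "column2"]]           # slice rows by position, then pick columns
e = df.loc[df.index[:2], ["column1", "column2"]]  # convert positions to labels first
print(a, b, c)
print(d)
```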
Sorting
- Sort by values: df.sort_values(by=, ascending=)
- Sort by index: df.sort_index()
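A minimal sketch of both kinds of sorting. The column names "price" and "volume" are assumptions for the example.

```python
import pandas as pd

# Illustrative frame with an unordered index.
df = pd.DataFrame(
    {"price": [3, 1, 2], "volume": [10, 30, 20]},
    index=["b", "c", "a"],
)

# Sort rows by the values of one column, largest first.
by_price = df.sort_values(by="price", ascending=False)

# by= also accepts a list; later columns break ties in earlier ones.
by_two = df.sort_values(by=["price", "volume"])

# Sort rows by their index labels.
by_index = df.sort_index()
print(by_price)
print(by_index)
```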
Operations
Arithmetic operations
- add +
- sub -
- mul *
- div /
- floordiv //
- mod %
- pow **
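Each method mirrors an operator. A possible reason to prefer the method form is that it accepts extra arguments such as fill_value. The sketch below uses made-up Series.

```python
import pandas as pd

s = pd.Series([7, 8, 9])

plus_one = s.add(1)       # same as s + 1
remainder = s.mod(4)      # same as s % 4
quotient = s.floordiv(4)  # same as s // 4
squared = s.pow(2)        # same as s ** 2

# Operands are aligned on their labels; a label missing on one side gives NaN
# with the operator, while the method can substitute a default value.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10], index=["x"])
with_nan = s1 + s2
filled = s1.add(s2, fill_value=0)
print(with_nan)
print(filled)
```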
Logical operations
- Operators: <, >, &, |
- Boolean mask: df[df['column'] > 2]
- Query string: df.query('column > 2')
- Membership test: df['column'].isin([1, 2])
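A sketch of the filtering forms on a small frame. The second column, "other", is an assumption added for the example. When conditions are combined with & or |, each comparison needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3, 4], "other": [10, 20, 30, 40]})

masked = df[df["column"] > 2]                         # boolean mask
queried = df.query("column > 2")                      # same filter as a query string
both = df[(df["column"] > 1) & (df["other"] < 40)]    # element-wise AND
member = df[df["column"].isin([1, 2])]                # keep rows whose value is in the list
print(masked)
print(both)
```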
Statistical operations
- sum
- mean
- mode
- median
- min
- max
- abs
- prod
- std
- var
- idxmax
- idxmin
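A few of these methods on an illustrative frame. By default they reduce down each column (axis=0); passing axis=1 reduces across each row instead. Note that idxmax and idxmin return labels rather than values.

```python
import pandas as pd

# Hypothetical data with labelled rows.
df = pd.DataFrame(
    {"a": [1, -2, 3], "b": [4, 5, -6]},
    index=["r0", "r1", "r2"],
)

col_sums = df.sum()            # one value per column
row_means = df.mean(axis=1)    # one value per row
absolute = df.abs()            # element-wise, keeps the frame's shape
max_labels = df.idxmax()       # row label of each column's maximum
min_labels = df.idxmin()       # row label of each column's minimum
sample_var = df["a"].var()     # sample variance (ddof=1 by default)
print(col_sums)
print(max_labels)
```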
Custom operations
apply(func, axis=0): axis=0 applies func to each column, axis=1 to each row
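A sketch of apply with a user-defined function. value_range is a hypothetical helper written for this example; it receives one column (axis=0) or one row (axis=1) as a Series.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [10, 2, 8]})

def value_range(s):
    """Hypothetical helper: spread between the largest and smallest value."""
    return s.max() - s.min()

per_column = df.apply(value_range, axis=0)  # func is called once per column
per_row = df.apply(value_range, axis=1)     # func is called once per row
print(per_column)
print(per_row)
```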
Plotting
dataframe.plot(x=None, y=None, kind='line')
kind:
- line
- bar
- barh
- hist
- pie
- scatter
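A minimal plotting sketch, assuming matplotlib is installed (pandas uses it as its default plotting backend). The "day" and "price" columns are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"day": [1, 2, 3, 4], "price": [10.0, 10.5, 9.8, 11.2]})

# x and y name the columns to plot; kind selects the chart type.
ax_line = df.plot(x="day", y="price", kind="line")
ax_barh = df.plot(x="day", y="price", kind="barh")  # horizontal bars
```

Each call returns a matplotlib Axes object, which can be used for further customisation such as titles or axis limits.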
File I/O
| Format Type | Data Description | Reader | Writer |
|---|---|---|---|
| text | CSV | read_csv | to_csv |
| text | JSON | read_json | to_json |
| text | HTML | read_html | to_html |
| text | Local clipboard | read_clipboard | to_clipboard |
| binary | MS Excel | read_excel | to_excel |
| binary | OpenDocument | read_excel | |
| binary | HDF5 Format | read_hdf | to_hdf |
| binary | Feather Format | read_feather | to_feather |
| binary | Parquet Format | read_parquet | to_parquet |
| binary | Msgpack | read_msgpack | to_msgpack |
| binary | Stata | read_stata | to_stata |
| binary | SAS | read_sas | |
| binary | Python Pickle Format | read_pickle | to_pickle |
| SQL | SQL | read_sql | to_sql |
| SQL | Google Big Query | read_gbq | to_gbq |
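read_csv(path, usecols=[], names=[])
- usecols: read only the listed columns
- names: column names to assign when the file has no header row; without it, the first line of the file is used as the header
to_csv(path, index=False, header=False, mode='a')
- index: whether to write the row index; False omits it
- header: whether to write the column names as a header row; False omits it
- mode: 'a' appends to an existing file instead of overwriting it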
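A round-trip sketch that ties these parameters together. The file name and column names are assumptions for the example, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

first = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5], "volume": [100, 200]})
first.to_csv(path, index=False, header=False)  # data rows only

# mode='a' appends; header=False avoids a header line in the middle of the file.
more = pd.DataFrame({"open": [3.0], "close": [3.5], "volume": [300]})
more.to_csv(path, index=False, header=False, mode="a")

# The file has no header row, so names supplies the column names;
# usecols then keeps only the listed columns.
loaded = pd.read_csv(
    path,
    names=["open", "close", "volume"],
    usecols=["open", "volume"],
)
print(loaded)
```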