DataFrame

A DataFrame represents a rectangular table of data and has both a row index and a column index.

Construction


In [43]: data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    ...:         'year' : [2000, 2001, 2002, 2001, 2001, 2003],
    ...:         'pop'  : [1.5, 1.7,  3.6, 2.4, 2.9, 3.2]}

In [44]: frame = pd.DataFrame(data)

In [45]: frame
Out[45]:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2001  2.9
5  Nevada  2003  3.2

For a large DataFrame, the head method selects only the first five rows.

In [46]: frame.head()
Out[46]:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2001  2.9
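
head also accepts a row count, and tail is its counterpart for the end of the table. A minimal sketch as a standalone script, rebuilding the same frame as above:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2001, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

first_two = frame.head(2)  # first two rows instead of the default five
last_two = frame.tail(2)   # last two rows
print(first_two)
print(last_two)
```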

Passing columns arranges the DataFrame's columns in the specified order.

In [47]: pd.DataFrame(data, columns=['year', 'state', 'pop'])
Out[47]:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2001  Nevada  2.9
5  2003  Nevada  3.2

If a passed column is not in the dict, it appears in the result filled with missing values.

In [49]: frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
    ...:                                  index=['one', 'two', 'three', 'four', 'five', 'six'])

In [50]: frame2
Out[50]:
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2001  Nevada  2.9  NaN
six    2003  Nevada  3.2  NaN
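
One way to spot such placeholder columns afterwards is to look for columns that contain only missing values. This is a small sketch; the name all_missing is introduced here for illustration only:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2001, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data,
                      columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# Collect the columns in which every entry is missing
all_missing = [col for col in frame2.columns if frame2[col].isna().all()]
print(all_missing)
```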

A column can be retrieved as a Series either by dict-like notation or by attribute.

In [51]: frame2['state']
Out[51]:
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
six      Nevada
Name: state, dtype: object

In [52]: frame2.year
Out[52]:
one      2000
two      2001
three    2002
four     2001
five     2001
six      2003
Name: year, dtype: int64

Rows can also be selected by label with the special loc attribute (integer positions use iloc instead).

In [53]: frame2.loc['three']
Out[53]:
year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object
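
The difference between label-based and position-based row selection can be sketched as follows. The frame is rebuilt here without the debt column so the script stands alone:

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002],
                       'state': ['Ohio', 'Ohio', 'Ohio'],
                       'pop': [1.5, 1.7, 3.6]},
                      index=['one', 'two', 'three'])

row_by_label = frame2.loc['three']  # select by index label
row_by_pos = frame2.iloc[2]         # select by integer position (third row)
print(row_by_label)
print(row_by_pos.name)
```

Both calls return the same row as a Series whose name is the row label.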

Columns can be modified by assignment, for example with a scalar value or an array of values.

In [54]: frame2['debt'] = 16.5

In [55]: frame2
Out[55]:
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2001  Nevada  2.9  16.5
six    2003  Nevada  3.2  16.5

When a list or array is assigned to a column, its length must match the length of the DataFrame.

In [56]: frame2['debt'] = np.arange(6.)

In [57]: frame2
Out[57]:
       year   state  pop  debt
one    2000    Ohio  1.5   0.0
two    2001    Ohio  1.7   1.0
three  2002    Ohio  3.6   2.0
four   2001  Nevada  2.4   3.0
five   2001  Nevada  2.9   4.0
six    2003  Nevada  3.2   5.0
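
What happens when the lengths do not match can be checked with a short sketch. It assumes a small three-row frame; the variable error is used only to capture the exception:

```python
import numpy as np
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002]},
                      index=['one', 'two', 'three'])

error = None
try:
    frame2['debt'] = np.arange(2.)  # two values for three rows
except ValueError as exc:
    error = exc
    print('ValueError:', exc)

frame2['debt'] = np.arange(3.)      # a matching length is accepted
print(frame2)
```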

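When a Series is assigned to a column, its labels are realigned to the DataFrame's index, and missing values are inserted for any index labels it does not contain.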

In [58]: val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

In [59]: frame2['debt'] = val

In [60]: frame2
Out[60]:
       year   state  pop  debt
one    2000    Ohio  1.5   NaN
two    2001    Ohio  1.7  -1.2
three  2002    Ohio  3.6   NaN
four   2001  Nevada  2.4  -1.5
five   2001  Nevada  2.9  -1.7
six    2003  Nevada  3.2   NaN
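
Alignment also works in the other direction: labels in the Series that do not occur in the DataFrame's index are dropped. A small sketch, where the label 'seven' is a made-up example that matches no row:

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002]},
                      index=['one', 'two', 'three'])

# 'two' matches a row; 'seven' matches nothing and is ignored
val = pd.Series([-1.2, -9.9], index=['two', 'seven'])
frame2['debt'] = val
print(frame2)
```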

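The del keyword deletes a column. As an example, first add a new boolean column that is True where the state column equals 'Ohio'.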

In [61]: frame2['eastern'] = frame2.state == 'Ohio'

In [62]: frame2
Out[62]:
       year   state  pop  debt  eastern
one    2000    Ohio  1.5   NaN     True
two    2001    Ohio  1.7  -1.2     True
three  2002    Ohio  3.6   NaN     True
four   2001  Nevada  2.4  -1.5    False
five   2001  Nevada  2.9  -1.7    False
six    2003  Nevada  3.2   NaN    False

In [63]: del frame2['eastern']

In [64]: frame2.columns
Out[64]: Index(['year', 'state', 'pop', 'debt'], dtype='object')
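
del modifies the DataFrame in place. If the original should stay untouched, the drop method returns a new DataFrame without the column. A minimal sketch on a two-row frame, with trimmed as an illustrative name:

```python
import pandas as pd

frame2 = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]},
                      index=['one', 'two'])
frame2['eastern'] = frame2['state'] == 'Ohio'

# drop returns a new object; frame2 keeps its 'eastern' column
trimmed = frame2.drop(columns=['eastern'])
print(trimmed.columns)
print(frame2.columns)
```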

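In the pandas version used here, the column returned by indexing a DataFrame is a view on the underlying data rather than a copy, so in-place modifications to that Series are reflected in the DataFrame. (With Copy-on-Write enabled in newer pandas releases, such modifications no longer propagate.) To get an independent copy, explicitly call the Series's copy method.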
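
The copy method can be sketched as follows. The example only demonstrates the copy side, since that behaves the same regardless of pandas version:

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002]},
                      index=['one', 'two', 'three'])

year_copy = frame2['year'].copy()  # independent copy of the column
year_copy['one'] = 1999            # changes only the copy

print(year_copy['one'])
print(frame2.loc['one', 'year'])
```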

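Another common form of data is a nested dict of dicts. pandas interprets the outer dict keys as the columns and the inner keys as the row index.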

In [65]: pop = {'Nevada': {2001: 2.4, 2002: 2.9},
    ...:        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

In [66]: frame3 = pd.DataFrame(pop)

In [67]: frame3
Out[67]:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6

The T attribute transposes the DataFrame, swapping rows and columns.

In [68]: frame3.T
Out[68]:
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5   1.7   3.6

If an index is specified explicitly, the keys of the inner dicts are not combined and sorted; the given index is used as-is.

In [69]: pd.DataFrame(pop, index=[2001, 2002, 2003])
Out[69]:
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2003     NaN   NaN
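
The two behaviors can be compared in a standalone script. Because the row order produced from inner dict keys may differ between pandas versions, this sketch calls sort_index explicitly rather than relying on it:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Inner keys are combined into the row index; sort explicitly to be safe
frame3 = pd.DataFrame(pop).sort_index()

# With an explicit index, only the requested labels appear, in that order
explicit = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(frame3)
print(explicit)
```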

A dict of Series can also be used to construct a DataFrame.

In [70]: pdata = {'Ohio': frame3['Ohio'][: -1],
    ...:          'Nevada': frame3['Nevada'][: 2]}

In [71]: pd.DataFrame(pdata)
Out[71]:
      Ohio  Nevada
2000   1.5     NaN
2001   1.7     2.4

If a DataFrame's index and columns have their name attributes set, these are displayed as well.

In [72]: frame3.index.name = 'year'

In [73]: frame3.columns.name = 'state'

In [74]: frame3
Out[74]:
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

The values attribute returns the data contained in the DataFrame as a two-dimensional ndarray.

In [75]: frame3.values
Out[75]:
array([[nan, 1.5],
       [2.4, 1.7],
       [2.9, 3.6]])

If the columns have different dtypes, the dtype of the values array is chosen to accommodate all of the columns.

In [77]: frame2.values
Out[77]:
array([[2000, 'Ohio', 1.5, nan],
       [2001, 'Ohio', 1.7, -1.2],
       [2002, 'Ohio', 3.6, nan],
       [2001, 'Nevada', 2.4, -1.5],
       [2001, 'Nevada', 2.9, -1.7],
       [2003, 'Nevada', 3.2, nan]], dtype=object)
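
The dtype selection can be checked directly. A short sketch, with numeric and mixed as illustrative names for an all-float frame and a frame that mixes integers and strings:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({'Nevada': [np.nan, 2.4, 2.9],
                        'Ohio': [1.5, 1.7, 3.6]},
                       index=[2000, 2001, 2002])
mixed = pd.DataFrame({'year': [2000, 2001],
                      'state': ['Ohio', 'Nevada']})

print(numeric.values.dtype)  # all columns are floats
print(mixed.values.dtype)    # ints and strings only share the object dtype
print(numeric.values.shape)
```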

Index objects

Index objects hold the axis labels and other metadata. Any array or other sequence of labels used when constructing a Series or DataFrame is internally converted to an Index.

In [78]: obj = pd.Series(range(3), index=['a', 'b', 'c'])

In [79]: index = obj.index

In [80]: index
Out[80]: Index(['a', 'b', 'c'], dtype='object')

In [81]: index[1:]
Out[81]: Index(['b', 'c'], dtype='object')

Index objects are immutable, so they cannot be modified by the user:

In [82]: index[1] = 'd'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-82-a452e55ce13b> in <module>
----> 1 index[1] = 'd'

c:\users\a\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   3881
   3882     def __setitem__(self, key, value):
-> 3883         raise TypeError("Index does not support mutable operations")
   3884
   3885     def __getitem__(self, key):

TypeError: Index does not support mutable operations
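
In a script, the same failure can be caught, and a changed label is obtained by building a new Index instead. This sketch uses the Index methods delete and insert, which return new objects; the name new_index is illustrative:

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

caught = False
try:
    index[1] = 'd'  # item assignment is not supported
except TypeError as exc:
    caught = True
    print('TypeError:', exc)

# Build a new Index with the label at position 1 replaced
new_index = index.delete(1).insert(1, 'd')
print(list(index))
print(list(new_index))
```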

In [83]: labels = pd.Index(np.arange(3))

In [84]: labels
Out[84]: Int64Index([0, 1, 2], dtype='int64')

In [85]: obj2 = pd.Series([1.5, -2.5, 0], index=labels)

In [86]: obj2
Out[86]:
0    1.5
1   -2.5
2    0.0
dtype: float64

In [87]: obj2.index is labels
Out[87]: True

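Because an Index is immutable, it is safe to share among data structures, as the obj2.index is labels check above shows. An Index also behaves like a fixed-size set, so membership can be tested with in.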

In [88]: frame3
Out[88]:
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

In [89]: frame3.columns
Out[89]: Index(['Nevada', 'Ohio'], dtype='object', name='state')

In [90]: 'Ohio' in frame3.columns
Out[90]: True

In [91]: 2003 in frame3.columns
Out[91]: False

Unlike Python sets, a pandas Index can contain duplicate labels.

In [92]: dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

In [93]: dup_labels
Out[93]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
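
Selecting with a duplicated label returns every occurrence of that label. A minimal sketch, where the Series s and its values are made up for illustration:

```python
import pandas as pd

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
s = pd.Series([1, 2, 3, 4], index=dup_labels)

print(dup_labels.is_unique)         # False, because labels repeat
print(s.loc['foo'])                 # both 'foo' entries come back as a Series
print(list(dup_labels.unique()))    # distinct labels in order of appearance
```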