Python for Data Analysis: NumPy Study Notes (Part 2)

NumPy

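This post is mainly about indexing. Before going through how each kind of index is used, I'll first ramble through some background. None of it is the main point, but you do need to know it. It comes in no particular order.

All English quotations below come from the Indexing page of the official documentation.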

ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are three kinds of indexing available: field access, basic slicing, advanced indexing. Which one occurs depends on obj.

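The form is x[obj], where x is the array and obj is the selection. There are three kinds of indexing: field access, basic slicing, and advanced indexing. This is the official documentation's classification, and it differs a little from the terms we usually use.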

In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.

In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]. In addition, the book Python for Data Analysis says that "x[1][2] is equivalent to x[1,2]."
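A minimal sketch of these equivalences, using a small 3×3 array made up for illustration. The chained form gives the same value here, but it goes through an intermediate row first.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# An explicit tuple and the comma form are the same index.
same_as_tuple = x[(1, 2)] == x[1, 2]

# Chained indexing gives the same value, but goes through an
# intermediate row view (x[1]) before picking element 2.
row = x[1]
chained = row[2]

print(same_as_tuple, chained, x[1, 2])
```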

All arrays generated by basic slicing are always views of the original array.

Arrays produced by basic slicing are views of the original array.
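One way to check the view behavior, sketched with a throwaway array; np.shares_memory and the .base attribute are standard NumPy, the variable names are mine.

```python
import numpy as np

x = np.arange(10)
view = x[2:5]                            # basic slicing

shares = np.shares_memory(x, view)       # the slice uses x's buffer
base_is_x = view.base is x               # and records x as its base

view[0] = 99                             # writing through the view changes x
print(shares, base_is_x, x[2])
```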

Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated application of slicing using a single non-: entry, where the non-: entries are successively taken (with all other non-: entries replaced by :). Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.

Warning:

The above is not true for advanced indexing.

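With more than one non-: entry in the slicing tuple, basic slicing behaves like applying the non-: entries one at a time, in order, so x[ind1,...,ind2,:] is equivalent to x[ind1][...,ind2,:]. The chained form only works when the non-: entry comes first: in my tests it failed when : came first, because x[:][ind] is just x[ind], not x[:, ind].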
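A small sketch of both cases, assuming a (2, 3, 4) array chosen only so the shapes are easy to tell apart.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Non-: entry first: one index tuple and the chained form agree.
a = x[1, ..., 2, :]
b = x[1][..., 2, :]
print(np.array_equal(a, b))

# Leading ':' : x[:][1] is just x[1], which is not x[:, 1].
c = x[:, 1]      # row 1 of every block
d = x[:][1]      # the whole block 1
print(c.shape, d.shape)
```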

You may use slicing to set values in the array, but (unlike lists) you can never grow the array. The size of the value to be set in x[obj] = value must be (broadcastable) to the same shape as x[obj].

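When assigning through a slice, the shape of value must match the shape of x[obj]. If it differs, it must be broadcastable to that shape, and the shape of the array still cannot change after the assignment.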
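A sketch of what broadcastable means in practice, with made-up values. The failing assignment is wrapped in try/except so the script runs to the end.

```python
import numpy as np

x = np.zeros((3, 4), dtype=int)

x[1:, :] = [1, 2, 3, 4]      # shape (4,) broadcasts to (2, 4)
x[:, 0] = 7                  # a scalar broadcasts to any shape

try:
    x[1:, :] = [1, 2, 3]     # shape (3,) cannot broadcast to (2, 4)
    failed = False
except ValueError:
    failed = True

print(x)
print(x.shape, failed)
```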

Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

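This is the definition of advanced indexing, which is what we usually call fancy indexing. It is triggered when obj in x[obj] is a non-tuple sequence object, an ndarray of integer or boolean dtype, or a tuple that contains at least one sequence object or one ndarray of integer or boolean dtype.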

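The translation really is a mouthful; if I got it wrong, please correct me. The "non-tuple" at the start should mean that obj cannot be a tuple of plain numbers, e.g. (2,3,4). A tuple is also a sequence object, but x[(2,3,4)] is the same as x[2,3,4], which is basic indexing.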
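To make the trigger conditions concrete, here is a hedged sketch on a small array of my own. It uses an explicit ndarray for the advanced case and also checks the copy behavior quoted above.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

basic = x[(1, 2)]                 # tuple of ints: same as x[1, 2]
adv = x[np.array([1, 2])]         # integer ndarray: picks rows 1 and 2
adv_tuple = x[[0, 2], 1]          # tuple containing a list: advanced

print(basic, adv.shape, adv_tuple)

# Advanced indexing returns a copy, so writing to it leaves x alone.
adv[0, 0] = -1
print(np.shares_memory(x, adv), x[1, 0])
```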

Integer array indexing

Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.

Integer array indexing lets you pick arbitrary elements by their N-dimensional index. Each integer array holds a number of indexes into one particular dimension.

Combining advanced and basic indexing

When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element.

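When an advanced index is combined with a basic index such as a slice, the result is like concatenating the results of indexing with each element of the advanced index. The sentence in parentheses puzzled me at first; as far as I can tell it covers the case where fewer advanced indexes are given than the array has dimensions, so the remaining axes are implicitly sliced with :.

There is an example below. It really is complicated, and I mostly had to guess.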

The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. Two cases of index combination need to be distinguished:

  • The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
  • The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.

In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).

Example

Suppose x.shape is (10,20,30) and ind is a (2,3,4)-shaped indexing intp array, then result = x[...,ind,:] has shape (10,2,3,4,30) because the (20,)-shaped subspace has been replaced with a (2,3,4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2,3,4)-shaped subspace then result[...,i,j,k,:] = x[...,ind[i,j,k],:]. This example produces the same result as x.take(ind, axis=-2).
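The two cases and the quoted example can be checked by looking only at result shapes. This is a sketch; arr1, arr2 and the contents of ind are values I picked for illustration.

```python
import numpy as np

x = np.arange(10 * 20 * 30).reshape(10, 20, 30)
arr1 = np.array([0, 1])          # shape (2,)
arr2 = np.array([3, 4])          # shape (2,)

# Adjacent advanced indexes: their broadcast shape (2,) stays in place.
adjacent = x[:, arr1, arr2]
print(adjacent.shape)

# Separated by a slice: the advanced dimension moves to the front.
separated = x[arr1, :, arr2]
print(separated.shape)

# The quoted example: a (2, 3, 4) index array replaces the length-20 axis.
ind = np.arange(24).reshape(2, 3, 4) % 20
result = x[..., ind, :]
print(result.shape)
print(np.array_equal(result, x.take(ind, axis=-2)))
```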

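I can't say I fully follow this example, but it does explain a strange problem I ran into earlier: a 3×3 array indexed by a 2×2 array became a 2×2×3 array. Indexing with a hand-built nested list of the same structure did not do this; the result was the same as indexing with two separate arrays.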

In [62]: array
Out[62]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [63]: aa
Out[63]:
array([[1, 0],
       [0, 1]])

In [64]: array[aa]
Out[64]:
array([[[4, 5, 6],
        [1, 2, 3]],

       [[1, 2, 3],
        [4, 5, 6]]])

In [65]: array[[1,0],[0,1]]
Out[65]: array([4, 2])

In [66]: array[[[1,0],[0,1]]]        # older NumPy only; recent releases treat this nested list as one index array, as in In [64]
Out[66]: array([4, 2])

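If there has to be an explanation, it is probably this: every element of an ndarray index, at whatever depth, points into the same axis (axis 0) of the array being indexed. That is unlike the nested list, where the first sub-list indexes axis 0 and the second indexes axis 1. The index elements are not applied recursively; they are peers, and the sub-arrays they select are peers as well. In shape terms the length-3 axis 0 is replaced by the 2×2 index shape, but each position holds a whole row rather than a single element.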
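The same session can be reproduced as a script. To avoid depending on how a given NumPy version interprets a bare nested list, this sketch passes the index either as an explicit ndarray or as two separate sequences.

```python
import numpy as np

array = np.arange(1, 10).reshape(3, 3)
aa = np.array([[1, 0], [0, 1]])

# One index array: every element of aa picks a whole row along axis 0.
rows = array[aa]
print(rows.shape)
print(np.array_equal(rows[0, 0], array[1]))

# Two index sequences: paired up as (row, column) coordinates.
pairs = array[[1, 0], [0, 1]]
print(pairs)
```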

Boolean array indexing

This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators.

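Boolean indexing happens when obj is a boolean array, such as one returned by a comparison operator.

Boolean indexing is in fact similar to advanced indexing with integer lists.

Right, enough rambling; it's tiring. From here on I'll go through each kind of indexing properly, entirely by example. Everything I could think of that needs attention is written into the examples.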
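A short sketch of that similarity, with a made-up array: converting the mask to integer index arrays with nonzero() selects the same elements.

```python
import numpy as np

x = np.array([[1, -2, 3], [-4, 5, -6]])
mask = x < 0

# nonzero() turns the mask into one integer index array per axis.
rows, cols = mask.nonzero()

by_mask = x[mask]
by_ints = x[rows, cols]
print(by_mask)
print(by_ints)
```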

  1. Basic indexing

    In [4]: arr
    Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    In [5]: arr[5]
    Out[5]: 5
    
    In [7]: arr2
    Out[7]:
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    
    In [8]: arr2[1][2]
    Out[8]: 5
    
    In [9]: arr2[1,2]
    Out[9]: 5
    

    x[a][b] == x[a,b]

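    The list of indexes is applied one axis at a time: a indexes elements along axis 0 (the outermost dimension), and b indexes elements along axis 1 (the next one in).

  2. Slice indexing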

    In [10]: arr
    Out[10]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    In [11]: arr[2:5]
    Out[11]: array([2, 3, 4])
    
    In [12]: arr[2::2]
    Out[12]: array([2, 4, 6, 8])
    
    In [13]: arr2
    Out[13]:
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    
    In [14]: arr2[1:]
    Out[14]:
    array([[3, 4, 5],
           [6, 7, 8]])
    
    In [15]: arr2[1:,1:]
    Out[15]:
    array([[4, 5],
           [7, 8]])
    
    In [16]: arr2[:,:1]
    Out[16]:
    array([[0],
           [3],
           [6]])
    
    In [55]: arr2
    Out[55]:
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    
    In [56]: temp = arr2[2:]
    
    In [57]: temp
    Out[57]: array([[6, 7, 8]])
    
    In [58]: temp = 9
    
    In [59]: temp
    Out[59]: 9
    
    In [60]: arr2
    Out[60]:
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])        # unchanged: temp = 9 only rebinds the name temp to a new object
    
    In [61]: temp = arr2[2:]
    
    In [62]: temp
    Out[62]: array([[6, 7, 8]])
    
    In [63]: temp[:] = 9
    
    In [64]: temp
    Out[64]: array([[9, 9, 9]])
    
    In [65]: arr2
    Out[65]:
    array([[0, 1, 2],
           [3, 4, 5],
           [9, 9, 9]])        # assigning through the slice temp[:] changes arr2's data as well
    

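    A slice selects a range of peer elements along an axis and keeps that axis, so the result retains the dimensionality of the original data. For example, slicing out the first column of a 3×3 array gives a (3,1) array, whereas an integer (basic) index gives a (3,) array.

    Slice indexes and integer indexes can be used together in one index.

    Arrays produced by basic indexing (item 1) and by slice indexing are views of the original array, so changing the view changes the original data.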
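The three remarks above can be sketched as follows, assuming the same 3×3 arr2 as in the session.

```python
import numpy as np

arr2 = np.arange(9).reshape(3, 3)

col_slice = arr2[:, :1]     # a slice keeps the axis
col_int = arr2[:, 0]        # an integer index drops it
print(col_slice.shape, col_int.shape)

mixed = arr2[1, 1:]         # integer and slice in the same index
print(mixed)

# Both results are views: writing to one changes arr2.
col_int[:] = -1
print(arr2[:, 0])
```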

  3. Advanced indexing

    1. Integer array indexing

      In [10]: arr
      Out[10]:
      array([[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11],
             [12, 13, 14, 15],
             [16, 17, 18, 19],
             [20, 21, 22, 23]])
      
      In [11]: arr[[5,2,1,0]]
      Out[11]:
      array([[20, 21, 22, 23],
             [ 8,  9, 10, 11],
             [ 4,  5,  6,  7],
             [ 0,  1,  2,  3]])
      
      In [12]: arr[[5,2,1,0],[2,2,1,2]]        # the two lists pair up into (row, column) indexes
      Out[12]: array([22, 10,  5,  2])
      
      In [13]: arr[[[0,0],[5,5]],[[0,3],[0,3]]]        # select the four corner elements, method 1
      Out[13]:
      array([[ 0,  3],
             [20, 23]])
      
      In [14]: arr[[[0],[5]],[0,3]]        # select the four corner elements, method 2 (broadcasting)
      Out[14]:
      array([[ 0,  3],
             [20, 23]])
      

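      Integer array indexing selects elements by matching the index arrays element by element into index pairs, with each array standing for a different axis. If the arrays have different shapes, that is still fine as long as they can be broadcast against each other to form the pairs.

      If you really need to select a rectangular region, you can combine an advanced index with a slice, or use the np.ix_ function, which takes 1-D sequences, one per axis. What np.ix_ returns is simply a tuple of arrays; look at their shapes and you will see how np.ix_ works.

    2. Boolean array indexing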
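A sketch of both options on the same 6×4 arr as in the session; the row and column choices are arbitrary.

```python
import numpy as np

arr = np.arange(24).reshape(6, 4)

# np.ix_ reshapes the inputs so that they broadcast into a grid.
rows, cols = np.ix_([0, 5], [0, 3])
print(rows.shape, cols.shape)

corners = arr[rows, cols]         # same as arr[np.ix_([0, 5], [0, 3])]
print(corners)

# Advanced index plus slice: rows 5 and 2, columns 1 onward.
block = arr[[5, 2], 1:]
print(block)
```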

      In [22]: arr
      Out[22]:
      array([[ 1.12105851,  0.27287448,  0.07762638, -0.26287726],
             [ 0.78763995, -0.48796014,  0.3238146 ,  0.22576988],
             [ 0.86004933,  1.79189963, -0.88055021, -0.1065679 ]])
      
      In [23]: arr[np.array([False,True,False])]
      Out[23]: array([[ 0.78763995, -0.48796014,  0.3238146 ,  0.22576988]])
      
      In [24]: arr[arr < 0]
      Out[24]: array([-0.26287726, -0.48796014, -0.88055021, -0.1065679 ])
      
      In [25]: arr[arr < 0] = 0        # set values through a boolean array
      
      In [26]: arr
      Out[26]:
      array([[ 1.12105851,  0.27287448,  0.07762638,  0.        ],
             [ 0.78763995,  0.        ,  0.3238146 ,  0.22576988],
             [ 0.86004933,  1.79189963,  0.        ,  0.        ]])
      

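      Selecting data with a boolean array always creates a copy of the data, because boolean array indexing is also a kind of advanced indexing. Assigning through a boolean index, as in In [25], still writes into the original array.

    3. ndarray indexing

      Indexing with an ndarray was covered above, so I won't repeat it here.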
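The difference between selecting with a mask and assigning through a mask, as a small sketch with made-up values:

```python
import numpy as np

arr = np.array([[1.0, -2.0], [-3.0, 4.0]])

picked = arr[arr < 0]          # a new array holding the selected values
picked[:] = 0                  # changes only the copy
after_copy_write = arr.copy()  # snapshot: arr still has its negatives
print(after_copy_write)

arr[arr < 0] = 0               # assignment through the mask writes into arr
print(arr)
```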


    As before: if anything here is inaccurate or misunderstood, corrections are welcome.
