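Understanding axis
The dimensions of a NumPy array are called axes, and the number of axes is called the rank (available as ndim). A one-dimensional array has rank 1 and a two-dimensional array has rank 2.
Stackoverflow series (1): the ambiguity of the axis parameter in Python Pandas and NumPy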
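As a small illustration of axes and rank, the sketch below builds a 2-D array and reduces it along each axis. The array a and the names col_sums and row_sums are made up for this example.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)   # number of axes (the rank): 2
print(a.shape)  # length of each axis: (2, 3)

# axis=0 runs down the rows, so summing over it leaves one value per column
col_sums = a.sum(axis=0)
# axis=1 runs across the columns, so summing over it leaves one value per row
row_sums = a.sum(axis=1)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

The axis argument names the axis that disappears from the result, which is usually the easiest way to remember its meaning.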
Selecting array elements
x = np.array([2, 4, 0, 3, 5])
# everything except the last element
x[:-1]
[2,4,0,3]
x=np.array([[1,2,3],[4,5,6],[7,8,9]])
# 2-D array: the parts before and after the comma select rows and columns; : takes everything, and 0:2 takes columns 0 and 1 but not column 2
print(x[:,0:2])
[[1 2]
[4 5]
[7 8]]
Selecting a single column in the first form below returns a one-dimensional array; wrap the index in an extra [] to keep the result two-dimensional.
print(x[:,-1])
[3 6 9]
print(x[:,[-1]])
[[3]
[6]
[9]]
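Sorting
Sorting is ascending by default.
import numpy as np
list1 = [[1,3,2], [3,5,4]]
array = np.array(list1)
array = np.sort(array, axis=1) # sort along axis 1 (within each row), ascending
#array = np.sort(array, axis=0) # sort along axis 0 (within each column)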
print(array)
[[1 2 3]
[3 4 5]]
Descending sort:
array = -np.sort(-array, axis=1) # descending
[[3 2 1]
[5 4 3]]
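The negation trick above relies on the values being signed numbers. An alternative sketch, assuming the same list1 data, sorts ascending and then reverses each row with a slice; the names a and desc are illustrative.

```python
import numpy as np

a = np.array([[1, 3, 2], [3, 5, 4]])

# Sort each row ascending, then reverse the order along axis 1
desc = np.sort(a, axis=1)[:, ::-1]
print(desc)
# [[3 2 1]
#  [5 4 3]]
```

Reversing with a slice does not negate anything, so it also works for unsigned integers, where negation would wrap around.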
References
Operations, indexing, slicing
http://blog.csdn.net/liangzuojiayi/article/details/51534164
Kinds of matrix multiplication
dot product
import time
import numpy as np
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
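dot computes the dot product only when both arguments are one-dimensional (plain lists or 1-D arrays). When the arguments are two-dimensional arrays, dot performs matrix multiplication instead. That is,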
x1 = np.array([[1,2,3]])
x2 = np.array([[1,2,3]])
np.dot(x1,x2)
raises an error:
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
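The error comes from the shapes, not from using np.array. A minimal sketch, with illustrative names v1, v2, r1, r2: 1-D arrays give the dot product directly, and the 2-D row vectors work once the second one is transposed.

```python
import numpy as np

# 1-D arrays: np.dot is the ordinary dot product
v1 = np.array([1, 2, 3])
v2 = np.array([1, 2, 3])
inner = np.dot(v1, v2)
print(inner)  # 14

# 2-D row vectors of shape (1, 3): transpose the second operand so the
# inner dimensions match, (1, 3) x (3, 1) -> (1, 1)
r1 = np.array([[1, 2, 3]])
r2 = np.array([[1, 2, 3]])
prod = np.dot(r1, r2.T)
print(prod)  # [[14]]
```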
outer product
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED OUTER PRODUCT ###
outer = np.outer(x1,x2)
element-wise multiplication
mul = np.multiply(x1,x2)
general dot product (matrix multiplication)
W = np.random.rand(3,len(x1))
dot = np.dot(W,x1)
As this shows, dot can compute a dot product and can also perform matrix multiplication.
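To compare the four products side by side, the sketch below reuses x1, x2 and W from the snippets above and prints the shape of each result. The result names are illustrative.

```python
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
W = np.random.rand(3, len(x1))

dot = np.dot(x1, x2)        # scalar: sum of the element-wise products
outer = np.outer(x1, x2)    # every pair x1[i] * x2[j]
mul = np.multiply(x1, x2)   # x1[i] * x2[i], same length as the inputs
gdot = np.dot(W, x1)        # matrix times vector

print(np.shape(dot))   # ()
print(outer.shape)     # (15, 15)
print(mul.shape)       # (15,)
print(gdot.shape)      # (3,)
```

The dot product equals both the sum of the element-wise product and the trace of the outer product, which is a quick way to sanity-check the three calls against each other.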
Broadcasting
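Broadcasting describes how NumPy handles arithmetic between arrays of different shapes. The smaller array is "broadcast" across the larger one so that the two have compatible shapes. Broadcasting provides a way to vectorize array operations so that the looping happens in C rather than in Python. It does this without making needless copies of the data and is usually very efficient. There are, however, cases where broadcasting is a bad idea because it wastes memory and slows the computation down.
NumPy operations are usually performed on pairs of arrays, element by element. In the simplest case the two arrays have the same shape, for example: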
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])
NumPy's broadcasting rule relaxes this constraint on array shapes. The simplest case is multiplying an array by a scalar:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
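The two results are the same. We can think of the scalar b as being stretched during the computation into an array with the same shape as a, where every element of the stretched b is a copy of the original scalar. The stretching analogy is only conceptual: NumPy does not actually make these copies, so broadcasting is as efficient as possible in both memory and computation.
The second example is more efficient than the first because broadcasting uses less memory during the multiplication.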
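Broadcasting also applies between arrays of different dimensions. The sketch below, using made-up arrays m, row and col, shows a 1-D array being stretched across a 2-D array, and what happens when the shapes cannot be matched.

```python
import numpy as np

m = np.array([[0.0, 0.0, 0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0]])   # shape (3, 3)
row = np.array([1.0, 2.0, 3.0])      # shape (3,)

# row is treated as shape (1, 3) and repeated for every row of m
by_row = m + row
print(by_row)

# A (3, 1) column is repeated for every column of m instead
col = row.reshape(3, 1)
by_col = m + col
print(by_col)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    m + np.array([1.0, 2.0])
except ValueError as err:
    print("cannot broadcast:", err)
```

Shapes are compared from the last axis backwards; two axes are compatible when they are equal or when one of them is 1.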
exp
exp is applied to every element of the array.
x = np.array([1,2,3])
np.exp(x)
sum
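sum reduces along an axis. With keepdims=True the reduced axis is kept with length 1, so the result broadcasts against the original array, as in this softmax:
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    s = x_exp/x_sum
    return s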
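A short usage sketch for the softmax above, with an illustrative scores array. It shows why keepdims=True matters: the row sums have shape (2, 1), which broadcasts cleanly against the (2, 3) exponentials.

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    # keepdims=True keeps shape (rows, 1) so the division broadcasts per row
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    return x_exp / x_sum

scores = np.array([[1.0, 2.0, 3.0],
                   [1.0, 1.0, 1.0]])
probs = softmax(scores)

print(probs.shape)        # (2, 3)
print(probs.sum(axis=1))  # each row sums to 1
```

Without keepdims=True the sums would have shape (2,), which does not broadcast against (2, 3); for a square input it would broadcast along the wrong axis without raising an error.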
matrix
Converting an array to a matrix
s = np.array([5,5,0,0,0,5])
np.matrix(s)
Loading data
loadtxt
numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
Parameters
fname : file, str, or pathlib.Path
File, filename, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators should return byte strings for Python 3k.
dtype : data-type, optional
Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.
comments : str or sequence, optional
The characters or list of characters used to indicate the start of a comment; default: ‘#’.
delimiter : str, optional
The string used to separate values. By default, this is any whitespace.
converters : dict, optional
A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}. Converters can also be used to provide a default value for missing data (but see also genfromtxt): converters = {3: lambda s: float(s.strip() or 0)}. Default: None.
skiprows : int, optional
Skip the first skiprows lines; default: 0.
usecols : int or sequence, optional
Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.
New in version 1.11.0.
Also when a single column has to be read it is possible to use an integer instead of a tuple. E.g. usecols = 3 reads the fourth column the same way as usecols = (3,) would.
unpack : bool, optional
If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.
ndmin : int, optional
The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.
New in version 1.6.0.
Returns
out : ndarray
Data read from the text file.
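A minimal sketch of how delimiter, comments, usecols and unpack work together. It reads from an in-memory io.StringIO buffer instead of a file so that it runs as written; the sample text and variable names are made up for illustration.

```python
import io
import numpy as np

# Stand-in for a small CSV file; the first line is a comment and is skipped
text = io.StringIO("# x,y,z\n1,2,3\n4,5,6\n")

data = np.loadtxt(text, delimiter=",")
print(data.shape)  # (2, 3)

# Read only columns 0 and 2, transposed so each column unpacks to a name
text.seek(0)
x, z = np.loadtxt(text, delimiter=",", usecols=(0, 2), unpack=True)
print(x)  # [1. 4.]
print(z)  # [3. 6.]
```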
genfromtxt
import numpy as np
nfl = np.genfromtxt("data.csv", delimiter=",")
# U75 reads every value as a Unicode string of up to 75 characters
world_alcohol = np.genfromtxt('world_alcohol.csv', dtype='U75', skip_header=1, delimiter=',')
data = np.genfromtxt('/Users/david/david/code/00project/carthage/scripts/adult.data', delimiter=', ', dtype=str)
# take the column at index 14
labels = data[:,14]
# keep every column except the last one
data = data[:,:-1]
Converting a matrix to an array
np.argsort(y_score, kind="mergesort")[::-1]
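The argsort line above returns the indices that order y_score from largest to smallest. For the conversion named in the heading, one possible sketch uses np.asarray and ravel; this is an assumption about what was intended, and the matrix m is made up for illustration.

```python
import numpy as np

m = np.matrix([[5, 5, 0], [0, 0, 5]])

# np.asarray returns a plain ndarray with the same 2-D shape
as_2d = np.asarray(m)
# ravel flattens it to one dimension, which a matrix itself can never be
as_1d = np.asarray(m).ravel()

print(type(as_2d).__name__, as_2d.shape)  # ndarray (2, 3)
print(as_1d.shape)                        # (6,)
```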
Matrix of random numbers
import numpy as np
numpy_matrix = np.random.randint(10, size=[5,2])
'''
array([[1, 0],
       [8, 4],
       [0, 5],
       [2, 9],
       [9, 9]])
'''
Getting the indices that would sort the data
import numpy as np
dd = np.asmatrix([4,5,1]) # np.mat was an alias of asmatrix and is gone in NumPy 2.0
dd1 = dd.argsort()
print(dd)
print(dd1) # [[2 0 1]]
squeeze
Removes single-dimensional entries from the shape of an array, i.e. drops every axis whose length is 1.
x = np.array([[[0], [1], [2]]])
np.squeeze(x)
array([0, 1, 2])
If the array has shape (1,1) to begin with, the result is a scalar (a zero-dimensional array):
cost = np.array([[1]])
cost = np.squeeze(cost)
This gives 1, and the shape of cost becomes ().
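The sketch below prints the shapes before and after squeeze for the two arrays above, and also shows the axis argument, which drops only the chosen length-1 axis. The name only_first is illustrative.

```python
import numpy as np

x = np.array([[[0], [1], [2]]])
print(x.shape)                      # (1, 3, 1)
print(np.squeeze(x).shape)          # (3,)

# axis=0 removes only the leading length-1 axis
only_first = np.squeeze(x, axis=0)
print(only_first.shape)             # (3, 1)

cost = np.squeeze(np.array([[1]]))
print(cost.shape, cost.ndim)        # () 0
print(float(cost))                  # 1.0
```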
Selecting the rows and columns that match a condition
Given data such as
1,1,1,0,0,0
0,1,1,1,1,0
1,0,0,1,1,0
0,0,0,1,1,0
The first column is used as y_train and the remaining columns as x_train; we want the rows of x_train for which y_train is 1.
pos_rows = (y_train == 1)
x_train[pos_rows,:]
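Putting the steps together as a runnable sketch on the four sample rows above. How y_train and x_train are split from the data is an assumption based on the description, and pos_x is an illustrative name.

```python
import numpy as np

data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 1, 1, 1, 1, 0],
                 [1, 0, 0, 1, 1, 0],
                 [0, 0, 0, 1, 1, 0]])

y_train = data[:, 0]    # first column is the label
x_train = data[:, 1:]   # remaining columns are the features

# Boolean mask with one entry per row: True where the label is 1
pos_rows = (y_train == 1)
print(pos_rows)         # [ True False  True False]

# The mask selects rows 0 and 2 of x_train
pos_x = x_train[pos_rows, :]
print(pos_x)
# [[1 1 0 0 0]
#  [0 0 1 1 0]]
```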
Another example
vector = np.array([5, 10, 15, 20])
vector == 10
[False, True, False, False]
matrix = np.array([
[5, 10, 15],
[20, 25, 30],
[35, 40, 45]
])
matrix == 25
[
[False, False, False],
[False, True, False],
[False, False, False]
]
For example, to find the row whose second column is 25
matrix = np.array([
[5, 10, 15],
[20, 25, 30],
[35, 40, 45]
])
second_column_25 = (matrix[:,1] == 25)
# equivalent to print(matrix[second_column_25])
print(matrix[second_column_25, :])
[
[20, 25, 30]
]
Comparisons with multiple conditions
vector = np.array([5, 10, 15, 20])
equal_to_ten_and_five = (vector == 10) & (vector == 5)
[False, False, False, False]
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
[True, True, False, False]
The result of a comparison can also be used to change values
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
vector[equal_to_ten_or_five] = 50
print(vector)
Every element where the mask is True becomes 50
[50, 50, 15, 20]
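Assigning through a mask changes vector in place. When the original should be left untouched, one option is np.where, sketched below with the same data; the names mask and replaced are illustrative.

```python
import numpy as np

vector = np.array([5, 10, 15, 20])
mask = (vector == 10) | (vector == 5)

# np.where builds a new array: 50 where mask is True, the old value elsewhere
replaced = np.where(mask, 50, vector)

print(replaced)  # [50 50 15 20]
print(vector)    # [ 5 10 15 20]
```

The parentheses around each comparison are required, because & and | bind more tightly than == in Python.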