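Because of the GIL, multithreaded Python code can normally use only one processor core, but numpy is an exception:
http://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html This article covers parallel computing with numpy. My understanding of it is summarized below: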
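numpy's own array operations can bypass the GIL
Because numpy is implemented in C internally and does not go through the Python interpreter, its array operations can run on multiple cores when they are issued from several threads. It also uses BLAS (the Basic Linear Algebra Subprograms) internally, which speeds up the computation further.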
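Threads: numpy array operations, like IO, release the GIL
As I understand it, numpy keeps running after it releases the interpreter, because it does not depend on the interpreter. Other threads can use the interpreter in the meantime, and if those threads also contain numpy code, that numpy code can release the interpreter in the same way.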
while a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do (A and B are both numpy arrays):
>>> A = B + C
>>> print A
During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
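To make the point above concrete, here is a minimal sketch (my own, not from the cookbook) that splits one large `np.exp` call across plain Python threads. The helper names `exp_chunk` and `threaded_exp` are hypothetical, and any speedup depends on the array being large enough and on the machine having spare cores.

```python
import threading

import numpy as np


def exp_chunk(src, dst, start, stop):
    """Compute exp over one slice; numpy releases the GIL inside np.exp."""
    np.exp(src[start:stop], out=dst[start:stop])


def threaded_exp(values, n_threads=2):
    """Split `values` into contiguous chunks and process each in its own thread."""
    values = np.asarray(values, dtype=np.float64)
    result = np.empty_like(values)
    # Chunk boundaries, e.g. [0, 5, 10] for 10 elements and 2 threads.
    bounds = np.linspace(0, len(values), n_threads + 1, dtype=int)
    threads = [
        threading.Thread(
            target=exp_chunk,
            args=(values, result, bounds[i], bounds[i + 1]),
        )
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    x = np.ones(1_000_000)
    y = threaded_exp(x, n_threads=2)
    print(np.allclose(y, np.exp(x)))
```

Each thread writes into its own slice of `result`, so no locking is needed. The threads only overlap while they are inside `np.exp`; the surrounding Python code still runs under the GIL.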
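Processes naturally solve the parallelism problem even better
numpy arrays can also be shared between processes. How exactly to share them is a topic for another time.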
It is possible to share memory between processes, including numpy arrays
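One possible way to share an array, sketched here under the assumption of Python 3.8 or later, is the standard-library `multiprocessing.shared_memory` module. This is not the method the cookbook describes; the function names `double_in_place` and `run_demo` are hypothetical.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory  # Python 3.8+

import numpy as np


def double_in_place(shm_name, shape, dtype):
    """Attach to an existing shared block by name and double every element."""
    shm = SharedMemory(name=shm_name)
    try:
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        view *= 2  # writes go straight to shared memory, nothing is copied back
        del view   # drop the array before closing the mapping
    finally:
        shm.close()


def run_demo(n=5):
    """Create a shared array, let a worker process modify it, return the result."""
    source = np.arange(n, dtype=np.float64)
    shm = SharedMemory(create=True, size=source.nbytes)
    try:
        shared = np.ndarray(source.shape, dtype=source.dtype, buffer=shm.buf)
        shared[:] = source
        worker = Process(
            target=double_in_place,
            args=(shm.name, source.shape, source.dtype),
        )
        worker.start()
        worker.join()
        result = shared.copy()  # private copy that outlives the shared block
        del shared
        return result
    finally:
        shm.close()
        shm.unlink()  # only the creating process should unlink


if __name__ == "__main__":
    print(run_demo())
```

Only the block name, shape and dtype are sent to the worker; the array data itself stays in one place. The creating process is responsible for calling `unlink()` once everyone is done.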
This last example is especially good:
Comparison
Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).
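# timings.py (Python 2 syntax, as in the cookbook)
import numpy as np
import math

def f(x):
    # Pure-Python loop: does not release the GIL
    print x
    y = [1]*10000000
    [math.exp(i) for i in y]

def g(x):
    # numpy array operation: releases the GIL
    print x
    y = np.ones(10000000)
    np.exp(y)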
# IPython session; `time` is IPython's timing magic
from handythread import foreach
from processing import Pool
from timings import f,g

def fornorm(f,l):
    # Plain serial loop, used as the baseline
    for i in l:
        f(i)

time fornorm(g,range(100))
time fornorm(f,range(10))
time foreach(g,range(100),threads=2)
time foreach(f,range(10),threads=2)
p = Pool(2)
time p.map(g,range(100))
time p.map(f,range(10))
| | 100 * g() | 10 * f() |
|---|---|---|
| normal | 43.5s | 48s |
| 2 threads | 31s | 71.5s |
| 2 processes | 27s | 31.23s |
For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g() which uses numpy and releases the GIL, both threads and processes provide a significant speed up, although multiprocesses is slightly faster.
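For anyone who wants to repeat the comparison on Python 3, here is a rough sketch that swaps in the standard-library `concurrent.futures` executors for `handythread.foreach` and `processing.Pool`. That substitution, the helper names `run_serial` and `run_with`, and the much smaller problem size `N` are my own assumptions, so the timings will not match the table above.

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np

N = 200_000  # far smaller than the 10,000,000 used above, so this runs quickly


def f(x):
    """Pure-Python loop: holds the GIL for the whole call."""
    return sum(math.exp(i) for i in [1] * N)


def g(x):
    """numpy version: the GIL is released while np.exp runs."""
    return float(np.exp(np.ones(N)).sum())


def run_serial(func, jobs):
    """Baseline: call func on every job in a plain loop and report wall time."""
    start = time.perf_counter()
    results = [func(j) for j in jobs]
    print(f"{func.__name__} / serial: {time.perf_counter() - start:.3f}s")
    return results


def run_with(executor_cls, func, jobs, workers=2):
    """Map func over jobs with the given executor class and report wall time."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(func, jobs))
    elapsed = time.perf_counter() - start
    print(f"{func.__name__} / {executor_cls.__name__}: {elapsed:.3f}s")
    return results


if __name__ == "__main__":
    for func, jobs in ((g, range(20)), (f, range(4))):
        run_serial(func, jobs)
        run_with(ThreadPoolExecutor, func, jobs)
        run_with(ProcessPoolExecutor, func, jobs)
```

The process timings include the cost of starting the worker processes, which matters at this small problem size. Raising `N` toward the original value should make the pattern described above easier to see.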
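I wrote my own example modeled on this one; it can be run directly (python3.6): https://gist.github.com/miniyk2012/4a2edf98493d91c60af06232b6c69582
Note:
This article assumes that numpy itself cannot use multiple cores, so Python threads are needed to get numpy running on several cores.
In fact numpy itself can also use multiple cores; see this article: https://roman-kh.github.io/numpy-multicore/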
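A quick way to check this on your own machine is sketched below. It assumes numpy was built against a multithreaded BLAS, which varies between installations; the helper name `time_matmul` is hypothetical. Watching CPU usage in a system monitor while it runs shows whether a single matrix product spreads over several cores.

```python
import time

import numpy as np


def time_matmul(n=500, repeats=3):
    """Multiply two n x n matrices and return the product and the best wall time."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    product = None
    for _ in range(repeats):
        start = time.perf_counter()
        product = a @ b  # dispatched to BLAS for float64 matrices
        best = min(best, time.perf_counter() - start)
    return product, best


if __name__ == "__main__":
    np.show_config()  # lists the BLAS/LAPACK libraries this numpy build uses
    _, seconds = time_matmul(n=2000)
    print(f"2000 x 2000 matmul: {seconds:.3f}s")
```

With common BLAS builds, the thread count can usually be limited through environment variables such as `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` or `MKL_NUM_THREADS`, set before Python starts. Comparing a run with the value 1 against the default gives a rough idea of how much the BLAS threads contribute.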