How numpy computes in parallel

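Because of the GIL, multithreaded Python code can only use one processor at a time, but numpy is an exception:
The article at http://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html covers parallel computation with numpy. Here is a summary of my understanding: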

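numpy's own array operations can bypass the GIL

Because numpy's internals are written in C and do not go through the Python interpreter, its array operations are not held back by the GIL and can make use of multiple cores. It also uses BLAS (the Basic Linear Algebra Subroutines) internally, which can speed up the computation further.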

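Threads: numpy array operations release the GIL, just like IO

As I understand it, numpy keeps running even after it releases the interpreter, because it does not depend on the interpreter. Meanwhile other threads can use the interpreter, and if those threads also run numpy code, that code releases the interpreter in the same way.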

While a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do the following (where A, B and C are numpy arrays):

>>> A = B + C
>>> print(A)

During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
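
The point above can be tried with a minimal sketch that splits one array across two threads. The helper name `threaded_exp` and the use of `concurrent.futures` are my own choices for illustration, not something the cookbook prescribes, and whether it actually runs faster depends on the machine and the array size.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def threaded_exp(values, n_threads=2):
    """Apply np.exp to `values` by splitting it into one chunk per thread.

    threaded_exp is a hypothetical helper written for this illustration.
    """
    chunks = np.array_split(values, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each np.exp call is a single array operation, so the worker
        # threads are not serialized by the GIL while it runs.
        parts = list(pool.map(np.exp, chunks))
    return np.concatenate(parts)

x = np.linspace(0.0, 1.0, 1_000_000)
result = threaded_exp(x)
# The threaded result matches the plain single-call result.
assert np.allclose(result, np.exp(x))
```

The work is expressed as a few large array operations per thread, which is the "numpy style" the quoted passage refers to. A per-element Python loop inside each thread would hold the GIL and lose the benefit.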

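Processes naturally solve the parallelism problem even better

numpy arrays can also be shared between processes; exactly how to share them is a topic for another time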

It is possible to share memory between processes, including numpy arrays
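
One possible way to do the sharing, sketched here under assumptions: it uses the standard library's `multiprocessing.shared_memory` module, which needs Python 3.8 or later and is not necessarily the mechanism the cookbook has in mind. The function names `double_in_place` and `run_demo` are hypothetical.

```python
import numpy as np
from multiprocessing import Process, shared_memory

def double_in_place(shm_name, shape, dtype):
    """Attach to an existing shared block by name and double its contents."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        view *= 2
        del view  # drop the array's reference to the buffer before closing
    finally:
        shm.close()

def run_demo():
    """Create a shared array, let a child process modify it, return a copy."""
    data = np.arange(6, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    try:
        shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        shared[:] = data
        # Only the block's name, shape and dtype are sent to the child,
        # not the array data itself.
        child = Process(target=double_in_place,
                        args=(shm.name, data.shape, data.dtype.str))
        child.start()
        child.join()
        result = shared.copy()
        del shared
    finally:
        shm.close()
        shm.unlink()  # free the block once no process needs it
    return result
```

`run_demo()` should be called from under an `if __name__ == "__main__":` guard in a script, since some platforms start child processes by re-importing the main module.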

This last example is especially good:

Comparison

Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).

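# timings.py
import math

import numpy as np

def f(x):
    # Pure-Python loop: the GIL is held the whole time
    print(x)
    y = [1]*10000000
    [math.exp(i) for i in y]

def g(x):
    # One numpy call over the whole array: the GIL is released inside np.exp
    print(x)
    y = np.ones(10000000)
    np.exp(y)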

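# IPython session. handythread is the helper module from the SciPy Cookbook,
# and timings.py holds f and g from above.
from handythread import foreach
from multiprocessing import Pool  # the original used the older "processing" package
from timings import f, g

def fornorm(f, l):
    # Plain serial loop, used as the baseline
    for i in l:
        f(i)

%time fornorm(g, range(100))
%time fornorm(f, range(10))
%time foreach(g, range(100), threads=2)
%time foreach(f, range(10), threads=2)
p = Pool(2)
%time p.map(g, range(100))
%time p.map(f, range(10))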

               100 * g()    10 * f()
normal         43.5s        48s
2 threads      31s          71.5s
2 processes    27s          31.23s

For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g() which uses numpy and releases the GIL, both threads and processes provide a significant speed up, although multiprocesses is slightly faster.

I wrote my own example modeled on this one. It runs as-is (python3.6): https://gist.github.com/miniyk2012/4a2edf98493d91c60af06232b6c69582
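
For readers who want to try the comparison without the cookbook's `handythread` helper, here is a rough Python 3 sketch built only on the standard library and numpy. It is not the code from the gist above. The names `timed` and `compare` are hypothetical, and the problem size `N` is reduced from the 10000000 used in the original, so the timings will not match the table.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

import numpy as np

N = 1_000_000  # assumption: smaller than the original to keep runs short

def f(x):
    """Pure-Python loop: holds the GIL for the whole computation."""
    y = [1] * N
    return len([math.exp(i) for i in y])

def g(x):
    """Single numpy call: the GIL is released inside np.exp."""
    y = np.ones(N)
    return np.exp(y).size

def timed(label, run):
    """Call a zero-argument callable and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    print(f"{label:<20} {elapsed:6.2f}s")
    return elapsed

def compare(func, jobs, workers=2):
    """Time `func` over `jobs` serially, with threads, and with processes."""
    name = func.__name__
    timed(f"{name} serial", lambda: [func(j) for j in jobs])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        timed(f"{name} {workers} threads", lambda: list(pool.map(func, jobs)))
    with Pool(workers) as pool:
        timed(f"{name} {workers} processes", lambda: pool.map(func, jobs))
```

To run it, call `compare(g, range(20))` and `compare(f, range(4))` from under an `if __name__ == "__main__":` guard, which the process pool needs on platforms that start workers by re-importing the script.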

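Note:

The article above assumes that numpy by itself cannot use multiple cores, so multithreaded Python code is needed to make numpy run on several cores.
In fact numpy itself can also use multiple cores; see this article: https://roman-kh.github.io/numpy-multicore/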
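
As a small illustration of numpy using several cores on its own: matrix products are handed to the BLAS library numpy was built against, and many BLAS builds run them on multiple threads. The sketch below assumes such a build. Which environment variable is honored depends on the BLAS in use (`OMP_NUM_THREADS` is read by OpenMP-based builds, `OPENBLAS_NUM_THREADS` by OpenBLAS, `MKL_NUM_THREADS` by MKL), and the value 2 is just an example.

```python
import os

# These variables must be set before numpy is first imported, because the
# BLAS library reads them when it is loaded. Which one takes effect
# depends on the BLAS build; setting all three is harmless.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "2")

import numpy as np

rng = np.random.default_rng(0)
a = rng.random((500, 500))
b = rng.random((500, 500))

# A matrix product is dispatched to BLAS; with a multithreaded BLAS this
# single call can occupy several cores without any Python threads.
product = a @ b
print(product.shape)
```

`np.show_config()` prints which BLAS/LAPACK libraries a given numpy installation is linked against, which helps in deciding which variable matters.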
