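Troubleshooting memory leaks in Python programs & the gc mechanism

Agenda

  • Memory leak troubleshooting
  • Python gc mechanism
  • Postscript

All analysis is based on CPython.

Memory leak troubleshooting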

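Memory leaks are usually hard to track down; tools and on/off experiments help. The main problem is locating the leaking part: it may be your own code or a library you call, it may be caused by an overlooked reference in the py program, or by a native .so (third-party or your own). I was once responsible for investigating a 7x24 online service whose memory grew irregularly with incoming requests after startup, from a little over 1 GB at startup to as much as 14 GB within a day.
The approach was: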

  watch_result = watch py heap memory diff via pympler
  if watch_result is continuously increasing:
    focus on py module
  else:
    focus on native module

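After observing for a while (scanning the whole py heap with pympler is very expensive, so be careful when sampling on a production service), the py heap went up and down. This varies with the complexity of the business scenario: the service I was responsible for kept a number of images (numpy.array) per unit of time, which made memory unstable over short windows, and within such a window memory did grow rapidly. Memory changes therefore have to be observed over one full cycle of the business scenario.
At that point the suspects were narrowed down to the native libraries: tensorflow, cv2 and numpy. Since the leak was more pronounced at peak hours, tensorflow was the first suspect. I wrote a quick FG experiment that ran each model concurrently, saw no change in memory, and ruled tensorflow out. After a long detour, the final solid repro showed that it was a tensorflow memory leak after all; the initial experiment had not strictly followed the business scenario. That is hindsight, of course. Since this was my first time tracking down a py memory leak, I was not very confident in the pympler results, and I felt that a framework as widely used as tensorflow should not have this problem, which was a mistake of subjective judgement.
So the initial suspicion fell on the py code.
Looking back, the approach in the py pseudocode above is OK: the memory change over one cycle is enough to tell roughly whether the py program leaks. The bad case is when both py and native code leak.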
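
The "watch" step of the pseudocode above can be sketched as follows. This is only a minimal illustration: it swaps in the standard-library tracemalloc module for the heap scanner used in the investigation, and the helper names (sample_traced_memory, python_heap_is_growing), the two toy workloads and the 1024-byte threshold are all hypothetical.

```python
import tracemalloc


def sample_traced_memory(workload, rounds=5):
    """Run `workload` several times and record the Python-level heap size after each run."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(rounds):
            workload()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def python_heap_is_growing(samples, min_growth_bytes=1024):
    """True if every sample exceeds the previous one by at least min_growth_bytes."""
    return all(b - a >= min_growth_bytes for a, b in zip(samples, samples[1:]))


# Hypothetical workloads: one keeps a reference around, the other does not.
leaked = []


def leaky_workload():
    leaked.append(bytearray(100_000))   # stays referenced from `leaked`


def clean_workload():
    tmp = bytearray(100_000)            # released when the function returns
    del tmp


for name, workload in (("leaky", leaky_workload), ("clean", clean_workload)):
    growing = python_heap_is_growing(sample_traced_memory(workload))
    print(name, "-> focus on py module" if growing else "-> focus on native module")
```

tracemalloc only sees allocations made through Python's allocators, so memory allocated directly by a native library does not show up in these samples, which is exactly why a flat result points towards the native side.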

Python gc mechanism

To investigate Python memory problems you have to understand Python's memory mechanism.

reference counting

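Garbage collection in Python is based on reference counting. Its advantage is that it is simple and efficient; its drawback is that it cannot deal with mutual references on its own (although there is cycle reference detection). Quoting the official documentation: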

The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.

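At the interpreter level, Py_INCREF(x) increments the reference count and Py_DECREF(x) decrements it. In CPython the object is deallocated as soon as the count reaches 0; the gc module is only needed for objects kept alive by reference cycles.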

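Py_INCREF typically happens on:

  • assignment
  • passing an argument
  • putting a variable into a list, dict or tuple

Py_DECREF(x) typically happens when:

  • a variable goes out of scope
  • del is called explicitly
    If, in some particular scenario, the ref count of an object that is no longer needed never reaches 0 (Py_DECREF is never called), the result is a memory leak.

    The reference count of a variable can be obtained with sys.getrefcount(x), and all objects that refer to it with gc.get_referrers(x).

Note that sys.getrefcount itself takes a reference to its argument, so for a variable a with a single reference the reported count is 2.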
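
A minimal sketch of the points above, using only sys.getrefcount and gc.get_referrers. The variable names (alias, holder) are made up for illustration, and the absolute numbers printed can differ between CPython versions, so the sketch only relies on how the count changes.

```python
import gc
import sys

a = []                                   # the name `a` holds the only reference
count_initial = sys.getrefcount(a)       # includes the temporary reference held by the call itself

alias = a                                # assignment adds a reference
count_after_assign = sys.getrefcount(a)

holder = {"payload": a}                  # storing the object in a dict adds a reference
count_after_store = sys.getrefcount(a)

del alias                                # explicit del removes a reference
count_after_del = sys.getrefcount(a)

# gc.get_referrers lists the objects that refer to `a`; the dict is one of them.
referrers = gc.get_referrers(a)
holder_is_referrer = any(r is holder for r in referrers)

print(count_initial, count_after_assign, count_after_store, count_after_del)
print(holder_is_referrer)
```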

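The python heap is divided into three generations; the longer an object survives, the further back it is moved. Each generation has its own threshold and keeps a counter (for the youngest generation, the number of objects allocated minus the number freed since the last gc); when the counter exceeds the threshold, a gc is triggered. Python gc is strictly stop-the-world. The default threshold is (700, 10, 10).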
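
The thresholds and counters can be inspected and changed through the gc module. The sketch below uses only gc.get_threshold, gc.get_count and gc.set_threshold; the value 1000 is an arbitrary illustration, and the defaults reported on a given interpreter may differ from the ones quoted above.

```python
import gc

original = gc.get_threshold()     # tuple of three thresholds, one per generation
counts = gc.get_count()           # tuple of three current per-generation counters
print("thresholds:", original)
print("counts:", counts)

# Raise the youngest-generation threshold so automatic collections run less often.
gc.set_threshold(1000, original[1], original[2])
updated = gc.get_threshold()
print("updated thresholds:", updated)

# Restore the interpreter's original settings.
gc.set_threshold(*original)
restored = gc.get_threshold()
```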

ownership rules

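Python distinguishes mutable objects from immutable objects (the ideas behind this are the same as in Java). For immutable objects there is a separate mechanism behind the py gc; see "Memory management in Python" for details. Note that Python's memory management keeps something like a memory pool as a cache, and some of that memory is not released back to the OS.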

Unlike many other languages, Python does not necessarily release the memory back to the Operating System. Instead, it has a specialized object allocator for small objects (smaller or equal to 512 bytes), which keeps some chunks of already allocated memory for further use in future. The amount of memory that Python holds depending on the usage patterns, in some cases all allocated memory is never released.
Therefore, if a long-running Python process takes more memory over time, it does not necessarily mean that you have memory leaks

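Note: PyInt_FromLong/Py_BuildValue return a reference together with its ownership. You have to distinguish whether a call returns a reference whose ownership is transferred or a borrowed reference (PyImport_AddModule() also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in sys.modules).
When Python calls into C, the arguments are passed as borrowed references, and their lifetime is guaranteed until the function returns. If the C code needs to keep an object, it has to call Py_INCREF to make sure the object is not freed. The return value is different: a C function called from Python must return an owned reference, and ownership is transferred to the Python side, which releases it at the appropriate time.
For problems caused by borrowed references, see section 1.10.3 Thin Ice.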

cycle reference detect & hazard

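The inherent flaw of ref counting is circular references (which is why Java's gc uses reachability analysis over reference chains). Python has long been criticized for this, but since reference counting took root in the py internals very early, it is still used today. Strictly speaking, python can detect cycle references, but a collection pass only detects cycles whose members all belong to the generation(s) being collected; a cycle that reaches into an older generation stays in memory until that generation is collected. A code example: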

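import gc

gc.disable()          # turn off automatic collection; only the explicit collect() below runs
gc.set_threshold(0)   # a threshold0 of 0 also disables automatic collection
a = []
b = []
a.append(b)           # a -> b
b.append(a)           # b -> a: the two lists now form a reference cycle
print(gc.get_count())

del a                 # the cycle keeps both refcounts above 0
del b
print('gc collect = ' + str(gc.collect()))   # collect() returns the number of unreachable objects found
print(gc.get_count())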

Output:

(541, 8, 0)
gc collect = 2
(1, 0, 0)

As shown, the gc detected the cycle and freed it.
Python's circular reference detection is implemented in gcmodule.c. The flow, as pseudocode with comments:

/* This is the main function.  Read this to understand how the collection process works. */
static Py_ssize_t collect(int generation, Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable, int nofail){
      // update_refs() copies the true refcount into gc_refs for each object in the
      // generation being collected, decoupling the real ref count from gc_refs.
      update_refs(young);
      // subtract_refs() then adjusts gc_refs so that it equals the number of times an
      // object is referenced directly from outside the generation being collected.
      // Only outer refs matter: gc_refs != 0  ===>  some outer ref exists.
      subtract_refs(young);
      // Objects with gc_refs == 0 are moved into a temporary unreachable list and
      // marked GC_TENTATIVELY_UNREACHABLE.
      gc_list_init(&unreachable);
      // Walk the objects left in young (gc_refs != 0); any unreachable item they link to
      // is brought back, and the newly restored items are walked in turn.
      move_unreachable(young, &unreachable);
}
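
The three steps in the pseudocode above can be mimicked in a few lines of Python. This is a toy model, not the CPython implementation: the function find_unreachable, the dictionary-based object graph and the edges between 'link1' to 'link4' are assumptions made up for illustration.

```python
def find_unreachable(objects, external_refs):
    """Toy model of update_refs / subtract_refs / move_unreachable.

    objects: dict mapping an object name to the names it references (inside the generation).
    external_refs: dict mapping an object name to its number of references from outside.
    Returns the set of names that would be freed.
    """
    # update_refs: gc_refs starts as a copy of the true refcount
    # (outside references plus references from inside the generation).
    gc_refs = {name: external_refs.get(name, 0) for name in objects}
    for targets in objects.values():
        for target in targets:
            gc_refs[target] += 1

    # subtract_refs: remove every reference that comes from inside the generation,
    # leaving only the count of outside references.
    for targets in objects.values():
        for target in targets:
            gc_refs[target] -= 1

    # move_unreachable: objects with gc_refs > 0 are roots; everything they can
    # reach is brought back, the rest stays unreachable.
    reachable = set()
    stack = [name for name, refs in gc_refs.items() if refs > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(objects[name])
    return set(objects) - reachable


# Hypothetical graph: link1 is referenced from outside and leads to link2 and link3;
# link4 only references itself.
graph = {
    "link1": ["link2"],
    "link2": ["link3"],
    "link3": [],
    "link4": ["link4"],
}
unreachable = find_unreachable(graph, external_refs={"link1": 1})
print(unreachable)
```

In this model only 'link4' is left unreachable, because nothing with an outside reference leads to it.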

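The process, step by step:

  • Initial state.
  • After subtract_refs, the internal references have been removed.
  • gc_refs initialization is complete.
  • The unreachable list is built and its items are marked GC_TENTATIVELY_UNREACHABLE.
  • Once 'link3' and 'link4' have been processed and 'link2' is being processed, 'link2' will also enter the unreachable list, while 'link1' is reachable.
  • link2 and link3 are brought back through link1.
  • In the end only link4 remains in the unreachable list; it really cannot be reached, so it is freed.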
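It follows that an isolated circular reference lying outside the generation being collected is not reclaimed by that pass and keeps occupying memory until its own generation is collected.
One more point: the cost of a full collection is proportional to the total number of long-lived objects, so running a full collection at a fixed allocation interval would turn the total gc cost from O(N) into O(N^2), and gc stop-the-world time would grow non-linearly.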

performance concern & tips

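From the analysis above, py heap STW time is not constant: a full collection has to walk every long-lived object, so it gets more expensive as the heap grows. Python therefore adds extra protection for full gc: a full gc runs only when the number of objects that survived non-full collections since the last full gc exceeds 25% of the number of objects that survived the last full gc (long_lived_pending / long_lived_total > 25% in gcmodule.c). The comment in the gc implementation reads: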

Using the above ratio, instead, yields amortized linear performance in the total number of objects (the effect of which can be summarized thusly: "each full garbage collection is more and more costly as the number of objects grows, but we do fewer and fewer of them").

Because full gc in python gets slower and slower, a program without circular references can disable the gc and clean up garbage manually at points that suit its business scenario.
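
One possible way to do this is a small context manager that pauses automatic collection around a hot section and runs one manual collection afterwards. The name gc_paused and the toy workload are hypothetical; only gc.isenabled, gc.disable, gc.enable and gc.collect are real gc module calls.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic gc inside the block, then collect once at a point we choose."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.collect()              # manual clean-up after the hot section
        if was_enabled:
            gc.enable()           # restore the previous setting


enabled_before = gc.isenabled()
with gc_paused():
    enabled_inside = gc.isenabled()
    batch = [[i] for i in range(1000)]   # allocation-heavy work without gc pauses
enabled_after = gc.isenabled()
print(enabled_before, enabled_inside, enabled_after)
```

Reference counting still frees acyclic objects immediately while the gc is disabled, so this only postpones the reclamation of reference cycles.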

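Tips summarized by others:

  • Avoid finalizers
  • If you have to use a finalizer, upgrade to python 3.4 or later
  • Use weak references where the scenario fits
  • Disable the gc for your business scenario and clean up manually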
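
As an illustration of the weak reference tip, the sketch below stores a child's back-pointer to its parent as a weakref.ref, so parent and child never form a reference cycle and plain reference counting can free the parent. The Node class and its attribute names are made up for this example.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None       # will hold a weak reference, not a strong one

    @property
    def parent(self):
        # Dereference the weak reference; returns None once the parent is gone.
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        self.children.append(child)          # strong reference parent -> child
        child._parent = weakref.ref(self)    # weak reference child -> parent


was_enabled = gc.isenabled()
gc.disable()                      # make sure only reference counting is at work
try:
    root = Node("root")
    leaf = Node("leaf")
    root.add_child(leaf)
    parent_seen = leaf.parent is root

    probe = weakref.ref(root)     # lets us observe whether root has been freed
    del root
    freed_without_gc = probe() is None
    parent_after = leaf.parent
finally:
    if was_enabled:
        gc.enable()

print(parent_seen, freed_without_gc, parent_after)
```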

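    Finalizers are a nightmare for every gc: after a finalizer has run, an object that was 'dead' may have come back to life, which invalidates the reference relationships the collector has computed. Before py3.4 the gc could not tell whether a finalizer had changed the reference relationships, so cycles containing objects with finalizers were left uncollected (in gc.garbage), which caused memory leaks. Since 3.4, objects that come back to life are called 'revived objects', and gc collect performs one more check for them at the end.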
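
A revived object can be reproduced with a finalizer that stores self somewhere reachable. The sketch assumes Python 3.4 or later; the class name Phoenix and the survivors list are hypothetical.

```python
import gc

survivors = []


class Phoenix:
    def __del__(self):
        # The finalizer makes the dying object reachable again.
        survivors.append(self)


p = Phoenix()
p.partner = p          # self-reference: only the cycle collector can reclaim it
del p

gc.collect()           # runs the finalizer, then notices the object was revived
print(len(survivors))
print(type(survivors[0]).__name__)
```

After the collection the object is still alive in survivors, which is the situation the extra check at the end of gc collect has to detect.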

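Postscript

After studying the py gc mechanism and tracking the problem down, the leak turned out not to be on the py side. Still, the gc's default behaviour is a concern for python as a server-side language. When I have time I will write up tensorflow's memory mechanism in GPU mode and its leak problem.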

ref link:
https://docs.python.org/2.0/ext/refcounts.html
https://docs.python.org/2.0/ext/refcountsInPython.html
https://docs.python.org/2.0/ext/ownershipRules.html
https://docs.python.org/2.0/ext/thinIce.html
https://rushter.com/blog/python-garbage-collector/
https://rushter.com/blog/python-memory-managment/
https://pythoninternal.wordpress.com/2014/08/04/the-garbage-collector/
https://hg.python.org/cpython/file/eafe4007c999/Modules/gcmodule.c#l1023
https://docs.python.org/2.7/library/gc.html
https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons
