
agenda
- memory leak troubleshooting
- python gc mechanism
- postscript
All of the analysis here is based on CPython.
memory leak troubleshooting
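Memory leaks are usually hard to track down; tools and on/off experiments help. The main problem is locating the leaking part: the leak may be in your own code or in a library you call, and it may come from an overlooked reference in the py program or from a native .so, whether third-party or your own. I was once responsible for investigating a 7x24 online service: after startup its memory grew irregularly with incoming requests, from a little over 1 GB at startup to as much as 14 GB within a day.
The approach was: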
watch_result = watch py heap memory diff via pympler
if watch_result is continuously increasing:
    focus on py module
else:
    focus on native module
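The decision rule above can be sketched in runnable form. This is only an illustration under assumptions: it uses the standard library's tracemalloc in place of pympler, and heap_samples, keeps_growing, leaky_request and clean_request are hypothetical names, not part of the original service.

```python
import tracemalloc


def heap_samples(workload, rounds=5):
    """Run `workload` repeatedly and record the traced Python heap size after each round."""
    tracemalloc.start()
    sizes = []
    try:
        for _ in range(rounds):
            workload()
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


def keeps_growing(sizes, min_step=1024):
    """True if every sample is at least `min_step` bytes larger than the previous one."""
    return all(b - a >= min_step for a, b in zip(sizes, sizes[1:]))


_leaked = []


def leaky_request():
    # Hypothetical handler that forgets to release what it allocated.
    _leaked.append(bytearray(100_000))


def clean_request():
    # The buffer is released as soon as the function returns.
    data = bytearray(100_000)
    return len(data)


for handler in (leaky_request, clean_request):
    growing = keeps_growing(heap_samples(handler))
    verdict = "focus on py module" if growing else "focus on native module"
    print(handler.__name__, "->", verdict)
```

In a real service the samples would be taken over a full business cycle rather than a handful of calls, since short windows can grow for legitimate reasons.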
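After observing for a while (having pympler scan the whole py heap is very expensive, so be careful when sampling in production), the py heap turned out to rise and fall. How much it fluctuates depends on the complexity of the business scenario: the service I was responsible for keeps some images (numpy.array) per unit of time, which makes memory unstable over short windows, and within such a window it really does grow quickly. So you need to watch the memory change over a full cycle of the business workload.
At that point I narrowed it down to the native libraries: tensorflow, cv2 and numpy. Because the leak was more pronounced at peak times, tensorflow was the first suspect. I wrote a simple FG experiment that ran each model concurrently, saw no change in memory, and ruled tensorflow out. After a long detour, the final solid repro showed that it was a tensorflow memory leak after all: the initial experiment had not strictly followed the business scenario. That came later, though. As this was my first time chasing a py memory leak, I was not very confident in the pympler results, and I felt that a framework as widely used as tensorflow should not have this kind of problem, which was a mistake of subjective judgment.
So my initial suspicion fell on the py code.
Looking back, the approach in the pseudocode above is sound: the memory change over a unit cycle is enough to roughly tell whether the py program is leaking. The nasty case is when both py and native code leak.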
python gc mechanism
To investigate Python memory problems you have to understand how Python manages memory.
reference counting
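Garbage collection in Python is based on reference counting. Its advantage is that it is simple and efficient; its drawback is that it cannot fully cope with objects that refer to each other (even though there is cycle reference detection). Quoting the official documentation: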
The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.
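At the interpreter level, Py_INCREF(x) increments the count and Py_DECREF(x) decrements it. When the count drops to 0 the object is deallocated right away; only objects kept alive by reference cycles are left for the gc.
Py_INCREF typically happens on:
- assignment
- passing an argument
- putting a variable into a list, dict or tuple
Py_DECREF(x) typically happens when:
- a variable goes out of scope
- del is called explicitly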
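The reference count of an object can be read with sys.getrefcount(x), and all objects referring to it can be listed with gc.get_referrers(x).
A memory leak arises when, in some scenario, the ref count of an object that is no longer needed never reaches 0 (Py_DECREF is never called).
Note that sys.getrefcount itself takes a reference to its argument, so for a variable a bound to a freshly created object it typically reports 2 rather than 1.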
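A minimal sketch of how the count moves with assignment, container storage and del. The names a, b and holder are only illustrative, and the example compares counts relative to a baseline because the absolute value reported by sys.getrefcount can differ between CPython versions.

```python
import gc
import sys

a = []
base = sys.getrefcount(a)    # includes the temporary reference held by the call

b = a                        # assignment: one more reference
holder = [a]                 # storing in a container: one more reference
after_add = sys.getrefcount(a)

# gc.get_referrers lists the objects that point at `a`; the list `holder` is one of them.
held_by_container = any(r is holder for r in gc.get_referrers(a))

del b                        # explicit del: one reference fewer
holder.clear()               # removing it from the container: one reference fewer
after_del = sys.getrefcount(a)

print(after_add - base, after_del - base, held_by_container)
```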

ownership rules
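Python distinguishes mutable objects from immutable objects (the ideas behind this are the same as in Java), and immutable objects are handled by a separate collection mechanism behind the gc; see Memory management in Python for details. Note that Python's memory management caches memory in something like a memory pool, so some memory is not released back to the OS.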
Unlike many other languages, Python does not necessarily release the memory back to the Operating System. Instead, it has a specialized object allocator for small objects (smaller or equal to 512 bytes), which keeps some chunks of already allocated memory for further use in future. The amount of memory that Python holds depending on the usage patterns, in some cases all allocated memory is never released.
Therefore, if a long-running Python process takes more memory over time, it does not necessarily mean that you have memory leaks
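Note that functions such as PyInt_FromLong and Py_BuildValue return a reference together with its ownership. You have to distinguish whether a call returns a reference whose ownership is transferred or a borrowed reference (PyImport_AddModule() also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in sys.modules).
When Python calls into C, the arguments are passed as borrowed refs, and their lifetime is guaranteed until the function returns. If the C code needs to keep an object around, it must call Py_INCREF to make sure the object is not freed. The return value goes the other way: a C function called from Python must return an owned reference, whose ownership is transferred to the Python side, and the Python program releases it at the appropriate time.
For problems caused by borrowed refs, see 1.10.3 Thin Ice.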
cycle reference detect & hazard
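The inherent weakness of ref counting is circular references (which is why Java's gc uses reachability analysis over reference chains instead), and Python has long been criticized for it. Because ref counting took root in Python's internals very early, it is still there today. Strictly speaking, Python can detect cycle references, but only when all members of the cycle are within the generations being collected; a cycle that reaches into an older generation survives that pass and behaves like a memory leak until the older generation is collected. A code example: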
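import gc
# switch automatic collection off so that only the explicit collect() below runs
gc.disable()
gc.set_threshold(0)
# build a reference cycle: a -> b -> a
a = []
b = []
a.append(b)
b.append(a)
print(gc.get_count())
# drop the names; each refcount stays at 1 because of the cycle
del a
del b
# gc.collect() returns the number of unreachable objects it found
print('gc collect = ' + str(gc.collect()))
print(gc.get_count())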
The output is:
(541, 8, 0)
gc collect = 2
(1, 0, 0)
As you can see, the gc detected the cycle and freed it.
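The same behaviour can be checked without relying on counters, by watching a weak reference. This is a small sketch: Node is a hypothetical class used only for illustration, and the exact number returned by gc.collect() varies between CPython versions.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


gc.disable()                     # keep the automatic collector out of the way
try:
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <-> b reference cycle
    probe = weakref.ref(a)       # a weak reference does not keep `a` alive

    del a, b
    alive_before = probe() is not None   # refcounts never reached 0
    found = gc.collect()                 # the cycle detector runs here
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after, found)
```

Before the explicit collection the object is still alive even though no name refers to it; after it, the weak reference is dead.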
Python's circular reference detection is implemented in gcmodule.c. The flow, as pseudocode with comments:
/* This is the main function. Read this to understand how the collection process works. */
static Py_ssize_t collect(int generation, Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable, int nofail) {
    // update_refs() copies the true refcount to gc_refs, for each object in the generation being collected.
    // This decouples the real ref count from gc_refs.
    update_refs(young);
    // subtract_refs() then adjusts gc_refs so that it equals the number of times an object is referenced directly from outside the generation being collected.
    // Only outside references matter: gc_refs != 0 means some outside reference exists.
    subtract_refs(young);
    // Objects with gc_refs == 0 are moved into a temporary unreachable list and marked GC_TENTATIVELY_UNREACHABLE.
    gc_list_init(&unreachable);
    // Walk young: for each remaining object with an outside reference (gc_refs != 0), bring back any object it links to in the unreachable list, then examine the newly restored object the same way.
    move_unreachable(young, &unreachable);
}
Illustrated step by step:

After subtract_refs, the internal references have been subtracted.


In the figure above, 'link3' and 'link4' have already been processed and 'link2' is being processed; 'link2' will also go into the unreachable list, while 'link1' is reachable.

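In the end only link4 is left in the unreachable list; it really cannot be reached, so it is freed.
This shows that an isolated circular reference that lies partly outside the generation being collected is not reclaimed by this pass and behaves like a memory leak.
Another point: with circular reference detection added, the cost of gc collect goes from O(N) to O(N^2), so the gc stop-the-world time grows non-linearly.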
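The update_refs / subtract_refs / move_unreachable steps described above can be modelled in a few lines of Python. This is a toy sketch, not CPython's implementation: find_unreachable and the object names a, b, c, d are hypothetical, and the graph is passed in as plain dictionaries.

```python
def find_unreachable(refcounts, edges):
    """Toy model of cycle detection for one generation.

    refcounts: {name: true reference count} for every object in the generation
    edges:     {name: [names it references]}
    """
    # update_refs: copy the real refcount into a scratch gc_refs value.
    gc_refs = dict(refcounts)

    # subtract_refs: remove every reference that comes from inside the generation.
    for targets in edges.values():
        for dst in targets:
            if dst in gc_refs:
                gc_refs[dst] -= 1

    # move_unreachable: objects with gc_refs > 0 are referenced from outside,
    # and everything reachable from them is alive as well.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(t for t in edges.get(node, ()) if t in gc_refs)

    return set(gc_refs) - reachable


# `a` is held by an outside variable and points to `b`; `c` and `d` only point at each other.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

Here b starts with gc_refs == 0 after the subtraction but is brought back because a, which has an outside reference, links to it; only the isolated pair c and d stays unreachable.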
performance concern & tips
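As the analysis above shows, because of circular references the STW time on the py heap is not linear. Python therefore adds extra protection around full gc: a full gc only runs when last_non_full_gc_survived_obj_counts > 25% * last_full_gc_survived_obj_counts (these counters are long_lived_pending and long_lived_total in gcmodule.c). The comment in the gc implementation reads: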
Using the above ratio, instead, yields amortized linear performance in the total number of objects (the effect of which can be summarized thusly: "each full garbage collection is more and more costly as the number of objects grows, but we do fewer and fewer of them").
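Because full gc in Python gets slower and slower, if the program has no circular references you can disable the gc and clear out unused memory yourself, according to your business scenario.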
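One way to do this is sketched below, assuming the service can name a quiet point at which a collection is acceptable. gc_paused and handle_batch are hypothetical helpers, and the loop body is a stand-in for real request handling.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic collection inside the block and restore the previous state afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def handle_batch(requests, collect_every=1000):
    """Hypothetical request loop that collects only at points the service chooses."""
    handled = 0
    with gc_paused():
        for i, req in enumerate(requests, 1):
            handled += len(req)          # stand-in for the real work
            if i % collect_every == 0:
                gc.collect()             # explicit full collection at a quiet point
    return handled


print(handle_batch(["ab", "c"]))
```

Reference counting still frees acyclic garbage while the gc is paused; only cycles wait for the explicit gc.collect().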
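Tips collected from others:
- avoid finalizers
- if you really have to use a finalizer, upgrade to Python 3.4 or later
- use weak references where the scenario fits
- disable the gc for specific business scenarios and clean up manually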
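As an illustration of the weak reference tip, a back-reference held through weakref.ref avoids creating a cycle in the first place, so plain reference counting can free the objects. Parent and Child are hypothetical classes for this sketch.

```python
import gc
import weakref


class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)    # weak back-reference: no cycle is formed


class Parent:
    def __init__(self):
        self.children = [Child(self)]


gc.disable()                                 # show that no cycle collection is needed
try:
    p = Parent()
    probe = weakref.ref(p)
    child = p.children[0]
    parent_seen = child.parent() is p        # calling the weakref returns the live object

    del p                                    # the last strong reference goes away
    freed_without_gc = probe() is None       # reference counting alone reclaimed it
    dangling = child.parent() is None        # the weak back-reference is now dead
finally:
    gc.enable()

print(parent_seen, freed_without_gc, dangling)
```

The cost is that code using child.parent() has to handle the case where it returns None.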
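Finalizers are the nightmare of every gc, because after a finalizer has run, an object that was 'dead' may come back to life, which invalidates the reference relationships the gc had worked out. Before py3.4, if a finalizer changed the reference relationships, the gc could not see it, and the result was a memory leak. Since 3.4 these resurrected objects are called 'revived objects', and the gc performs one more check at the end of gc collect.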
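Resurrection can be observed from Python code. The sketch below assumes Python 3.4 or later; Phoenix and graveyard are hypothetical names, and the self-reference exists only so that the object has to be reclaimed by the gc rather than by reference counting.

```python
import gc

graveyard = []


class Phoenix:
    calls = 0

    def __del__(self):
        Phoenix.calls += 1
        graveyard.append(self)       # the finalizer makes the object reachable again


p = Phoenix()
p.self_ref = p                       # a cycle, so only the gc can reclaim the object
del p

gc.collect()                         # the finalizer runs and the object is revived
revived = len(graveyard) == 1

graveyard.clear()                    # drop the strong reference again
gc.collect()                         # the finalizer is not run a second time
finalizer_calls = Phoenix.calls
stuck_in_garbage = any(isinstance(o, Phoenix) for o in gc.garbage)

print(revived, finalizer_calls, stuck_in_garbage)
```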
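postscript
After studying the py gc mechanism and troubleshooting, the problem was located outside the py side. Even so, the gc's default behaviour is a concern for Python as a server-side language. When I have time I will write up tensorflow's memory mechanism and leak problems in GPU mode.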
ref link:
https://docs.python.org/2.0/ext/refcounts.html
https://docs.python.org/2.0/ext/refcountsInPython.html
https://docs.python.org/2.0/ext/ownershipRules.html
https://docs.python.org/2.0/ext/thinIce.html
https://rushter.com/blog/python-garbage-collector/
https://rushter.com/blog/python-memory-managment/
https://pythoninternal.wordpress.com/2014/08/04/the-garbage-collector/
https://hg.python.org/cpython/file/eafe4007c999/Modules/gcmodule.c#l1023
https://docs.python.org/2.7/library/gc.html
https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons

