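Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, the object writes its current state to temporary or persistent storage. Later, the object can be recreated by reading its state back from that storage, which is called deserialization.
In scrapy_redis, a Request object first passes through the DupeFilter for deduplication and is then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object, and Redis cannot store such an object directly, so the request has to be serialized before it is stored.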
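The flow described above can be sketched without a Redis server. This is a minimal sketch and not the scrapy_redis implementation: `FakeRequest`, `to_dict`, `from_dict`, and the plain list standing in for a Redis queue are hypothetical names chosen for illustration. It shows the general idea of reducing a request to plain data, pickling it to bytes, and rebuilding it later.

```python
import pickle


class FakeRequest:
    """Hypothetical stand-in for a crawler request (not scrapy.Request)."""

    def __init__(self, url, method="GET", meta=None):
        self.url = url
        self.method = method
        self.meta = meta or {}

    def to_dict(self):
        # Reduce the object to built-in types, which pickle handles directly.
        return {"url": self.url, "method": self.method, "meta": self.meta}

    @classmethod
    def from_dict(cls, data):
        return cls(data["url"], data["method"], data["meta"])


# A plain list stands in for a Redis list (assumption for this sketch);
# a real queue would store the same bytes through a Redis client.
fake_queue = []

request = FakeRequest("https://example.com/page/1", meta={"depth": 2})
payload = pickle.dumps(request.to_dict(), protocol=-1)  # bytes, storable as a value
fake_queue.append(payload)

# Later, possibly in another process: pop the bytes and rebuild the request.
restored = FakeRequest.from_dict(pickle.loads(fake_queue.pop(0)))
print(restored.url, restored.method, restored.meta)
```

Pickling a dictionary of built-in types rather than the object itself keeps the stored bytes independent of where the request class is defined.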
The serialization module used by scrapy_redis is shown below:
    from scrapy_redis import picklecompat

    """A pickle wrapper module with protocol=-1 by default."""

    try:
        import cPickle as pickle  # PY2
    except ImportError:
        import pickle


    def loads(s):
        return pickle.loads(s)


    def dumps(obj):
        return pickle.dumps(obj, protocol=-1)
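Python 3 uses the pickle module directly, since cPickle no longer exists there. The module's two most important functions are the ones shown above: dumps for serialization and loads for deserialization. A serialized object can be stored in a database or a file and restored quickly later.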
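As a quick illustration of the two functions, here is a minimal round trip. The wrappers are redefined locally so the snippet runs without scrapy_redis installed, and the `state` dictionary is made-up sample data.

```python
import pickle


def dumps(obj):
    # Same behaviour as the wrapper shown above: always use the newest protocol.
    return pickle.dumps(obj, protocol=-1)


def loads(s):
    return pickle.loads(s)


state = {"url": "https://example.com", "retries": 3, "tags": ("a", "b")}
blob = dumps(state)   # a bytes object in Python 3
copy = loads(blob)    # an equal but distinct object

print(type(blob).__name__, copy == state, copy is state)  # prints: bytes True False
```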
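The Memento pattern from the design patterns catalogue works well when implemented this way (see "Python Design Patterns (19): The Memento Pattern"). The following objects and data types can be pickled:
- None, True, False
- integers, long integers, floating-point numbers, complex numbers
- normal strings and Unicode strings
- tuples, lists, sets, and dictionaries that contain only picklable objects
- functions defined at the top level of a module
- built-in functions defined at the top level of a module
- classes defined at the top level of a module
- instances of such classes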
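The list above can be checked with a short script. This is a sketch using only standard-library objects; the sample values are arbitrary.

```python
import collections
import datetime
import json
import pickle

samples = [
    None, True, False,
    42, 3.14, 1 + 2j,
    "text", b"raw bytes",
    (1, 2), [1, 2], {1, 2}, {"key": [1, 2]},
    datetime.date(2020, 1, 31),  # instance of a class defined at module top level
]

# Every sample should come back equal to the original after a round trip.
all_equal = all(pickle.loads(pickle.dumps(value)) == value for value in samples)

# Functions and classes are pickled by reference (module name plus qualified
# name), so loading them returns the very same object.
same_builtin = pickle.loads(pickle.dumps(len)) is len
same_function = pickle.loads(pickle.dumps(json.dumps)) is json.dumps
same_class = pickle.loads(pickle.dumps(collections.OrderedDict)) is collections.OrderedDict

print(all_equal, same_builtin, same_function, same_class)
```

Because functions and classes are stored by name only, the process that loads the data must be able to import the same module.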
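Attempting to pickle an object that cannot be pickled raises a PicklingError. When this happens, an unspecified number of bytes may already have been written to the underlying file. Attempting to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised (in Python 3.5 and later this is RecursionError, a subclass of RuntimeError).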
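Both failure modes can be reproduced in a few lines. This is a sketch assuming CPython 3; the exact exception raised for the lambda depends on the Python version and on where the lambda is defined, so the handler catches both candidates.

```python
import pickle

# A lambda has no importable name, so pickle cannot store a reference to it.
# Depending on the Python version and on where the lambda is defined, the
# error is pickle.PicklingError or AttributeError, so both are caught here.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc

# Build a list nested 100,000 levels deep, far beyond the recursion limit.
nested = []
for _ in range(100_000):
    nested = [nested]

try:
    pickle.dumps(nested)
    recursion_error = None
except RecursionError as exc:  # RecursionError is a subclass of RuntimeError
    recursion_error = exc

print(type(lambda_error).__name__, type(recursion_error).__name__)
```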
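Module API (the descriptions below follow the Python 2 documentation; in Python 3 the pickled data is a bytes object rather than a string)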
pickle.dump(obj, file[, protocol])
- Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: Introduced the protocol parameter.
  file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.
pickle.load(file)
- Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().
  file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.
  This function automatically determines whether the data stream was written in binary mode or not.
pickle.dumps(obj[, protocol])
- Return the pickled representation of the object as a string, instead of writing it to a file.
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: The protocol parameter was added.
pickle.loads(string)
- Read a pickled object hierarchy from a string. Characters in the string past the pickled object’s representation are ignored.
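The four functions can be exercised together as follows. This sketch assumes Python 3, where the file object must be opened in binary mode and the pickled data is bytes; `record` is made-up sample data.

```python
import io
import pickle

record = {"id": 7, "tags": ["a", "b"]}

# dump/load work on file-like objects; io.BytesIO acts as an in-memory binary file.
buffer = io.BytesIO()
pickle.dump(record, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps/loads work on in-memory bytes. A negative protocol selects the highest one.
data = pickle.dumps(record, protocol=-1)
same_protocol = data == pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# Bytes after the end of the pickled object are ignored by loads.
from_bytes = pickle.loads(data + b"trailing bytes")

print(from_file == record, same_protocol, from_bytes == record)
```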
As for use cases, the most common ones are:
restoring the previous state when a program restarts, session storage, and transmitting objects over a network.
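The first use case, restoring state after a restart, might look like the sketch below. The helper names `save_state` and `load_state`, the file name, and the shape of the state dictionary are assumptions made for illustration only.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    # Write to a temporary file first, then rename it, so a crash mid-write
    # does not leave a truncated checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(state, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_state(path, default):
    # Only unpickle files this program wrote itself: loading untrusted
    # pickle data can execute arbitrary code.
    if not os.path.exists(path):
        return default
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "crawler_state.pkl")

    # First run: no checkpoint exists yet, so the default state is used.
    first_run = load_state(checkpoint, default={"seen": set(), "pending": []})
    first_run["seen"].add("https://example.com/")
    first_run["pending"].append("https://example.com/next")
    save_state(first_run, checkpoint)

    # Simulated restart: a fresh load returns the state saved above.
    after_restart = load_state(checkpoint, default={})

print(after_restart)
```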
