Serialization source code in scrapy_redis and its use in program design

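Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, an object writes its current state to temporary or persistent storage. Later, the object can be recreated by reading its state back from storage, that is, by deserializing it.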

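In scrapy_redis, a Request object is first deduplicated by the DupeFilter and then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object, and Redis cannot store such an object directly, so the request has to be serialized before it is stored.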
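A minimal sketch of this flow, under stated assumptions: `FakeRequest`, `enqueue`, and `dequeue` are hypothetical names, and a plain Python list stands in for a Redis list, so this is not scrapy_redis's actual implementation. It only illustrates why the object has to become bytes before it can be stored.

```python
import pickle


class FakeRequest:
    """Hypothetical stand-in for a Scrapy Request (illustration only)."""

    def __init__(self, url, method="GET", meta=None):
        self.url = url
        self.method = method
        self.meta = meta or {}

    def to_dict(self):
        # Reduce the object to plain data types that pickle can handle.
        return {"url": self.url, "method": self.method, "meta": self.meta}


fake_redis_queue = []  # a list standing in for a Redis list


def enqueue(request):
    # Serialize to bytes, the kind of value a Redis list can hold.
    fake_redis_queue.append(pickle.dumps(request.to_dict(), protocol=-1))


def dequeue():
    # Deserialize the oldest entry and rebuild the request object.
    data = pickle.loads(fake_redis_queue.pop(0))
    return FakeRequest(**data)


enqueue(FakeRequest("https://example.com/page/1", meta={"depth": 1}))
restored = dequeue()
print(restored.url, restored.method, restored.meta)
```

The request is flattened to a dictionary first so that only plain data types are pickled; the object is rebuilt from that dictionary on the way out.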

The serialization module in scrapy_redis is as follows:

# Source of scrapy_redis.picklecompat (import with: from scrapy_redis import picklecompat)

"""A pickle wrapper module with protocol=-1 by default."""

try:
    import cPickle as pickle  # PY2
except ImportError:
    import pickle

def loads(s):
    return pickle.loads(s)

def dumps(obj):
    return pickle.dumps(obj, protocol=-1)

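Python 3 uses the pickle module directly, since cPickle no longer exists. The module's two most important functions are the serialization and deserialization functions shown above. Once an object has been serialized, it can be stored in a database or a file and restored quickly.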
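As a quick check of what `protocol=-1` means, the sketch below re-declares the two wrapper functions locally (so it runs without scrapy_redis installed) and round-trips a sample dictionary. The sample data is made up for illustration.

```python
import pickle


def dumps(obj):
    # Same call as the wrapper above: -1 selects the highest protocol.
    return pickle.dumps(obj, protocol=-1)


def loads(s):
    return pickle.loads(s)


sample = {"url": "https://example.com", "priority": 0, "tags": ("a", "b")}
data = dumps(sample)

# In Python 3 the result is a bytes object, not text.
print(type(data).__name__)
# A negative protocol gives the same output as HIGHEST_PROTOCOL.
print(data == pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL))
# The round trip restores an equal object.
print(loads(data) == sample)
```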

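The Memento pattern from the design-pattern catalogue also works well when implemented this way (see "Python Design Patterns (19): The Memento Pattern"). The following objects and data types can be pickled: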

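  • None, True, and False
  • Integers, long integers, floating-point numbers, and complex numbers
  • Normal strings and Unicode strings
  • Tuples, lists, sets, and dictionaries that contain only picklable objects
  • Functions defined at the top level of a module
  • Built-in functions defined at the top level of a module
  • Classes defined at the top level of a module
  • Instances of such classes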
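The rules above can be probed with a small helper. `is_picklable` is a hypothetical name introduced here; it simply attempts `pickle.dumps` and reports whether the call succeeded. Standard-library objects are used as samples so the checks do not depend on where the snippet is run.

```python
import json
import pickle
from fractions import Fraction


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj, protocol=-1)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


checks = {
    "None/bool": is_picklable((None, True, False)),
    "numbers": is_picklable((1, 2.5, 3 + 4j)),
    "strings": is_picklable(("text", b"bytes")),
    "containers": is_picklable({"k": [1, (2, 3), {4, 5}]}),
    "top-level function": is_picklable(json.dumps),
    "built-in function": is_picklable(len),
    "top-level class": is_picklable(Fraction),
    "class instance": is_picklable(Fraction(1, 3)),
    # A lambda has no importable name, so it cannot be pickled.
    "lambda": is_picklable(lambda x: x),
    # Generators hold execution state that pickle cannot capture.
    "generator": is_picklable(i for i in range(3)),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Functions and classes are pickled by reference (module name plus qualified name), which is why they must be importable from the top level of a module.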

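Attempting to pickle an unpicklable object raises a PicklingError; when this happens, an unspecified number of bytes may already have been written to the underlying file. Attempting to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised (in Python 3.5 and later, its subclass RecursionError).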
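A hedged illustration of both failure modes. `try_pickle` is a hypothetical helper that returns the exception raised by `pickle.dumps`, or `None` on success; the nesting depth of 100,000 is an arbitrary value chosen to exceed typical recursion limits.

```python
import pickle


def try_pickle(obj):
    """Return the exception raised while pickling obj, or None (hypothetical helper)."""
    try:
        pickle.dumps(obj, protocol=-1)
    except Exception as exc:  # broad on purpose: we want to inspect the error
        return exc
    return None


# An object with no importable name cannot be pickled.
unpicklable_error = try_pickle(lambda x: x)
print(type(unpicklable_error).__name__)

# Build a list nested 100,000 levels deep: [[[[...]]]]
nested = []
for _ in range(100_000):
    nested = [nested]

recursion_error = try_pickle(nested)
print(type(recursion_error).__name__, isinstance(recursion_error, RuntimeError))
```

Catching `RuntimeError` covers both the older and the newer exception type, since `RecursionError` derives from it.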

Module API

The descriptions below come from the Python 2 documentation. In Python 3, a pickle is a bytes object rather than a string, the file must be opened in binary mode, and the default protocol is no longer 0.

  • pickle.dump(obj, file[, protocol])
    Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
    If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
    Changed in version 2.3: Introduced the protocol parameter.
    file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.
  • pickle.load(file)
    Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().
    file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.
    This function automatically determines whether the data stream was written in binary mode or not.
  • pickle.dumps(obj[, protocol])
    Return the pickled representation of the object as a string, instead of writing it to a file.
    If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
    Changed in version 2.3: The protocol parameter was added.
  • pickle.loads(string)
    Read a pickled object hierarchy from a string. Characters in the string past the pickled object's representation are ignored.
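The four functions can be exercised together in a short sketch. It targets Python 3, where the file-like object must be binary (`io.BytesIO` here rather than `StringIO`) and `dumps` returns `bytes`; the sample values are made up.

```python
import io
import pickle

buffer = io.BytesIO()  # in-memory binary file with write()/read()/readline()

# dump(): write two pickles back to back into the same file object.
pickle.dump({"id": 1}, buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickle.dump(["a", "b"], buffer, protocol=pickle.HIGHEST_PROTOCOL)

# load(): each call reads exactly one pickle from the current position.
buffer.seek(0)
first = pickle.load(buffer)
second = pickle.load(buffer)
print(first, second)

# dumps()/loads(): same thing, but through a bytes object instead of a file.
data = pickle.dumps((1, 2, 3), protocol=-1)
print(pickle.loads(data))

# Bytes after the end of the pickle are ignored by loads().
print(pickle.loads(data + b"trailing bytes"))
```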

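As for application scenarios, the most common ones are:

Restoring the previous state when a program restarts, session storage, and transmitting objects over a network.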
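The first scenario, restoring state after a restart, might look like the sketch below. `CHECKPOINT`, `save_state`, and `load_state` are hypothetical names, and the crawl-state dictionary is invented sample data; a temporary directory keeps the example self-contained.

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location inside a throwaway temp directory.
CHECKPOINT = os.path.join(tempfile.mkdtemp(), "crawler_state.pkl")


def save_state(state, path=CHECKPOINT):
    # Binary mode is required: pickles are bytes in Python 3.
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path=CHECKPOINT, default=None):
    # On first start there is no checkpoint yet, so fall back to a default.
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


# "Before shutdown": persist the current progress.
state = {"pending": ["https://example.com/2"], "seen": {"https://example.com/1"}}
save_state(state)

# "After restart": pick up where the program left off.
restored_state = load_state(default={"pending": [], "seen": set()})
print(restored_state == state)
```

For session storage and network transmission the same `dumps`/`loads` pair applies, but pickles should only be loaded from trusted sources, because unpickling can execute arbitrary code.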
