An Analysis of Python's JSON Decoder

JSON (JavaScript Object Notation) is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.

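JSON is a lightweight data interchange format. It stores and represents data in a text format that is completely independent of any programming language. Its concise, clearly layered structure makes JSON an ideal data interchange language: it is easy for people to read and write, easy for machines to parse and generate, and efficient to transmit over the network.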

Python's json library

Python ships with a json library that consists mainly of three parts: the Encoder, the Decoder, and the Scanner.

The simplest example

import json
s = {
    'a': 'a',
    'b': 'b'
}
print(json.dumps(s))
# {"a": "a", "b": "b"}
s = '{"a": "a", "b": "b"}'
print(json.loads(s))
# {'a': 'a', 'b': 'b'}

def loads

Function definition

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).

    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs.  The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders that rely on the
    order that the key and value pairs are decoded (for example,
    collections.OrderedDict will remember the order of insertion). If
    ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to use another datatype or parser
    for JSON floats (e.g. decimal.Decimal).

    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to use another datatype or parser
    for JSON integers (e.g. float).

    ``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
    This can be used to raise an exception if invalid JSON numbers
    are encountered.

    To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
    kwarg; otherwise ``JSONDecoder`` is used.

    The ``encoding`` argument is ignored and deprecated.

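Both object_hook and object_pairs_hook can be used to customize decoding. The difference is in what they receive: object_hook is called with the decoded dict, whereas object_pairs_hook is called with an ordered list of (key, value) tuples. When both are given, only object_pairs_hook is called.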

An example

import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x))
# <class 'dict'> {'a': 1, 'b': 2, 'c': 3}
json.loads(j, object_pairs_hook=lambda x: print(type(x), x))
# <class 'list'> [('a', 1), ('b', 2), ('c', 3)]
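
The priority rule and the difference between the two hooks can be checked with a small sketch. The helper name reject_duplicates is hypothetical and only illustrates one use of the ordered pair list; it is not part of the json library.

```python
import json
from collections import OrderedDict

doc = '{"a": 1, "b": 2, "a": 3}'

# Default decoding builds a dict, so the later duplicate key wins.
plain = json.loads(doc)

# object_pairs_hook sees every pair in document order, duplicates included.
pairs = json.loads(doc, object_pairs_hook=list)


def reject_duplicates(pairs):
    """Hypothetical hook: fail on repeated keys instead of overwriting."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in {keys}")
    return dict(pairs)


# When both hooks are given, only object_pairs_hook is called.
calls = []
both = json.loads(
    '{"x": 1}',
    object_hook=lambda d: calls.append("object_hook") or d,
    object_pairs_hook=lambda p: calls.append("object_pairs_hook") or OrderedDict(p),
)
```

Passing reject_duplicates as object_pairs_hook makes json.loads(doc, ...) raise ValueError for the document above, which a plain object_hook could not detect because the dict has already collapsed the duplicates.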

parse_float, parse_int, and parse_constant let you convert float values, int values, and constants such as NaN, respectively.

Another example

import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x), parse_int=str)
# <class 'dict'> {'a': '1', 'b': '2', 'c': '3'}
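
As a further sketch of the same parameters, parse_float can keep decimal precision and parse_constant can be used to refuse NaN and Infinity, as the docstring suggests. The helper reject_constant is a hypothetical name introduced here for illustration.

```python
import json
from decimal import Decimal

# parse_float receives the raw numeric text, so Decimal keeps it exact.
prices = json.loads('{"price": 0.1, "qty": 3}', parse_float=Decimal)


def reject_constant(name):
    """Hypothetical helper: refuse the non-standard NaN/Infinity literals."""
    raise ValueError(f"invalid JSON constant: {name}")


try:
    json.loads('{"x": NaN}', parse_constant=reject_constant)
    error = None
except ValueError as exc:
    error = str(exc)
```

Note that "qty" stays a plain int: parse_float is only called for numbers that have a fraction or an exponent.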

To use a custom decoder, create a subclass of JSONDecoder and pass it through the cls parameter.
The encoding parameter is deprecated and passing it has no effect. (It was removed entirely in Python 3.9.)
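
A minimal sketch of the cls route follows. DateDecoder and the {"__date__": ...} convention are assumptions made up for this example; the only library behavior relied on is that json.loads instantiates the class given as cls and calls its decode method.

```python
import json
from datetime import date


class DateDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns {"__date__": "YYYY-MM-DD"} into a date."""

    def __init__(self, **kwargs):
        # Install our own object_hook unless the caller supplied one.
        kwargs.setdefault("object_hook", self._decode_date)
        super().__init__(**kwargs)

    @staticmethod
    def _decode_date(obj):
        if set(obj) == {"__date__"}:
            return date.fromisoformat(obj["__date__"])
        return obj


result = json.loads(
    '{"when": {"__date__": "2020-01-02"}, "n": 1}', cls=DateDecoder
)
```

Because object_hook runs for every decoded object, innermost first, the nested marker object is replaced before the outer dict is built.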

JSONDecoder

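The loads function parses its input by calling JSONDecoder(**kw).decode() (or the decode() of the class passed as cls). In its constructor, JSONDecoder sets up the parse function for each value type as well as the scanner. decode then calls raw_decode, which starts scanning at the first non-whitespace character.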

def __init__(self, *, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, strict=True,
        object_pairs_hook=None):
    self.object_hook = object_hook
    self.parse_float = parse_float or float
    self.parse_int = parse_int or int
    self.parse_constant = parse_constant or _CONSTANTS.__getitem__
    self.strict = strict
    self.object_pairs_hook = object_pairs_hook
    self.parse_object = JSONObject
    self.parse_array = JSONArray
    self.parse_string = scanstring
    self.memo = {}
    self.scan_once = scanner.make_scanner(self)

def decode(self, s, _w=WHITESPACE.match):
    """Return the Python representation of ``s`` (a ``str`` instance
    containing a JSON document).

    """
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
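    # _w(s, start_idx).end() returns the index of the first non-whitespace
    # character at or after start_idx; in other words, it skips whitespace.
    # WHITESPACE is a compiled regular expression that we will not go into
    # here; _w has this same meaning everywhere below.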
    end = _w(s, end).end()
    if end != len(s):
        raise JSONDecodeError("Extra data", s, end)
    return obj

def raw_decode(self, s, idx=0):
    """Decode a JSON document from ``s`` (a ``str`` beginning with
    a JSON document) and return a 2-tuple of the Python
    representation and the index in ``s`` where the document ended.

    This can be used to decode a JSON document from a string that may
    have extraneous data at the end.

    """
    try:
        obj, end = self.scan_once(s, idx)
    except StopIteration as err:
        raise JSONDecodeError("Expecting value", s, err.value) from None
    return obj, end
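
The difference between decode and raw_decode shows up when a string holds more than one document. The following is a sketch only; iter_documents is a hypothetical helper, not something the json library provides.

```python
import json


def iter_documents(text):
    """Yield each JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while idx < end:
        # raw_decode does not skip leading whitespace itself, so do it here.
        while idx < end and text[idx] in " \t\n\r":
            idx += 1
        if idx == end:
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


docs = list(iter_documents('{"a": 1} [2, 3]\n"four" '))

# decode (and therefore json.loads) rejects anything after the first document.
try:
    json.loads('{"a": 1} [2, 3]')
    extra = None
except json.JSONDecodeError as exc:
    extra = (exc.msg, exc.pos)
```

The "Extra data" position is the index left after skipping the whitespace that follows the first document, which matches the end = _w(s, end).end() step in decode.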

scanner

scanner.make_scanner prefers the C-accelerated scanner when it is available. Here we only look at the pure-Python scanner, py_make_scanner.
py_make_scanner receives the object that calls it as its context. raw_decode calls the scan_once returned by py_make_scanner, and scan_once in turn calls _scan_once, which is the function that finally does the scanning.

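_scan_once first reads the character at idx, uses it to decide which data type it is looking at, and hands the string to the matching parse function. If the character matches none of the types, or the end of the string has been reached, it raises StopIteration.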

def _scan_once(string, idx):
    try:
        nextchar = string[idx]
    except IndexError:
        raise StopIteration(idx)

    if nextchar == '"':
        return parse_string(string, idx + 1, strict)
    elif nextchar == '{':
        return parse_object((string, idx + 1), strict,
            _scan_once, object_hook, object_pairs_hook, memo)
    elif nextchar == '[':
        return parse_array((string, idx + 1), _scan_once)
    elif nextchar == 'n' and string[idx:idx + 4] == 'null':
        return None, idx + 4
    elif nextchar == 't' and string[idx:idx + 4] == 'true':
        return True, idx + 4
    elif nextchar == 'f' and string[idx:idx + 5] == 'false':
        return False, idx + 5

    m = match_number(string, idx)
    if m is not None:
        integer, frac, exp = m.groups()
        if frac or exp:
            res = parse_float(integer + (frac or '') + (exp or ''))
        else:
            res = parse_int(integer)
        return res, m.end()
    elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
        return parse_constant('NaN'), idx + 3
    elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
        return parse_constant('Infinity'), idx + 8
    elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
        return parse_constant('-Infinity'), idx + 9
    else:
        raise StopIteration(idx)
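
The dispatch can be exercised directly. This sketch assumes the internal json.scanner.py_make_scanner helper shown above is importable under that name, which holds for the CPython versions I am aware of but is not a documented API.

```python
import json
from json.scanner import py_make_scanner

# Build the pure-Python scanner; a JSONDecoder instance serves as the context.
scan_once = py_make_scanner(json.JSONDecoder())

text = '[1, 2.5, "x", null]'
# The first character is '[', so the call is dispatched to parse_array.
value, end = scan_once(text, 0)

# A leading space matches no branch, so StopIteration carries the failing index.
try:
    scan_once(" 1", 0)
    failed_at = None
except StopIteration as err:
    failed_at = err.value
```

This also shows why decode skips whitespace before calling raw_decode: the scanner itself does not tolerate leading whitespace.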

parse_object

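parse_object first checks whether the next character is } or the start of a "xxxx": pair, and raises an exception if it is neither. For a "xxxx": pair, it runs scan_once on the text after the colon and appends the result to the pairs list. It then looks at the next character: a comma means it loops and parses another pair, while } ends parsing, after which object_pairs_hook (if given) or object_hook is called for custom processing.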

def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
               memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    pairs = []
    pairs_append = pairs.append
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'
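    # To avoid calling the regex too often, plain `if` checks handle the
    # common cases (fewer than two whitespace characters) first, and the regex
    # is only used to skip whitespace beyond that. The same idea recurs below.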
    if nextchar != '"':  
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end)
    end += 1
    while True:
        key, end = scanstring(s, end, strict)
        key = memo_get(key, key)
        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise JSONDecodeError("Expecting ':' delimiter", s, end)
        end += 1

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        pairs_append((key, value))
        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1

        if nextchar == '}':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        end = _w(s, end).end()
        nextchar = s[end:end + 1]
        end += 1
        if nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end - 1)
    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end
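
Each raise in JSONObject can be reached with a small malformed input. The sketch below forces the pure-Python scanner so that the code above is what runs; describe_error is a hypothetical helper, and overriding scan_once on a decoder instance is an assumption about internals rather than a supported API.

```python
import json
from json.scanner import py_make_scanner

# Force the pure-Python path so the JSONObject code above is what runs.
decoder = json.JSONDecoder()
decoder.scan_once = py_make_scanner(decoder)


def describe_error(text):
    """Return (message, position) for malformed input, or None if it parses."""
    try:
        decoder.decode(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


samples = ['{a: 1}', '{"a" 1}', '{"a": 1 "b": 2}', '{"a": }', '{"a": 1}']
errors = [describe_error(t) for t in samples]
```

The reported position is always the index of the offending character, which is why the code raises with end - 1 after it has already advanced past that character.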

parse_array

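parse_array first checks whether the next character is ]; if so, it returns []. Otherwise it runs scan_once on the text after the [ and appends the result to a list. It then looks at the next character: a comma means it loops and calls scan_once again, while ] ends parsing.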

def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    _append = values.append
    while True:
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

    return values, end
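
JSONArray can also be called on its own to see the (value, end) contract. This is a sketch that assumes the internal names json.decoder.JSONArray and json.scanner.py_make_scanner are importable as shown above; they are not part of the documented API.

```python
import json
from json.decoder import JSONArray
from json.scanner import py_make_scanner

scan_once = py_make_scanner(json.JSONDecoder())

text = '[1, [2, 3], "x"] tail'
# JSONArray expects the index just past the opening '['.
values, end = JSONArray((text, 1), scan_once)
rest = text[end:]

# The look-ahead for an empty array also skips whitespace before ']'.
empty, empty_end = JSONArray(("[ ]", 1), scan_once)
```

The returned end index points just past the closing ], so whatever follows the array is left for the caller; the nested [2, 3] is handled by scan_once calling back into JSONArray.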