JSON (JavaScript Object Notation) is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.
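JSON is a lightweight data-interchange format. It stores and represents data as text that is completely independent of any programming language. Its concise, clearly layered structure makes JSON an ideal data-interchange language: it is easy for people to read and write, easy for machines to parse and generate, and compact enough to keep network transfers efficient.
Python's json library
Python ships with a json library made up of three main parts: the Encoder, the Decoder, and the Scanner.
The simplest example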
import json
s = {
    'a': 'a',
    'b': 'b'
}
print(json.dumps(s))
# {"a": "a", "b": "b"}
s = '{"a": "a", "b": "b"}'
print(json.loads(s))
# {'a': 'a', 'b': 'b'}
def loads
Function definition
def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.
``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).
``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).
``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
The ``encoding`` argument is ignored and deprecated.
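Both object_hook and object_pairs_hook let you customise decoding. The difference is the argument each one receives: object_hook is called with the decoded dict, while object_pairs_hook is called with an ordered list of (key, value) tuples. When both are given, only object_pairs_hook is called.
An example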
import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x))
# <class 'dict'> {'a': 1, 'b': 2, 'c': 3}
json.loads(j, object_pairs_hook=lambda x: print(type(x), x))
# <class 'list'> [('a', 1), ('b', 2), ('c', 3)]
parse_float, parse_int, and parse_constant convert float values, int values, and constants such as NaN, respectively.
Another example
import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x), parse_int=str)
# <class 'dict'> {'a': '1', 'b': '2', 'c': '3'}
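The same idea applies to parse_float and parse_constant. The sketch below is only an illustration: it keeps floats exact with decimal.Decimal and rejects the non-standard constants. The helper name reject_constant is made up for this example and is not part of the json library.

```python
import json
from decimal import Decimal


def reject_constant(name):
    # Hypothetical helper: refuse NaN, Infinity, and -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")


# parse_float receives the raw text of each float, so no precision is lost.
parsed = json.loads('{"price": 19.99, "count": 3}', parse_float=Decimal)
print(parsed)  # {'price': Decimal('19.99'), 'count': 3}

# parse_constant is called with the constant's name as a string.
try:
    json.loads('{"x": NaN}', parse_constant=reject_constant)
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)  # non-standard JSON constant: NaN
```

Integers are untouched here because only parse_float was overridden; parse_int still defaults to int.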
To use a custom decoder, create a subclass of JSONDecoder and pass it through the cls parameter.
The encoding parameter is deprecated, and passing it has no effect.
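A minimal sketch of the cls route, assuming a made-up PointDecoder subclass (the class and its _to_point helper are hypothetical and exist only for this example). It installs its own object_hook in the constructor and is then handed to json.loads:

```python
import json


class PointDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns {"x": ..., "y": ...} objects into tuples."""

    def __init__(self, **kwargs):
        # Install our hook unless the caller supplied one explicitly.
        kwargs.setdefault("object_hook", self._to_point)
        super().__init__(**kwargs)

    @staticmethod
    def _to_point(obj):
        if set(obj) == {"x", "y"}:
            return (obj["x"], obj["y"])
        return obj


result = json.loads('{"origin": {"x": 0, "y": 0}, "name": "grid"}',
                    cls=PointDecoder)
print(result)  # {'origin': (0, 0), 'name': 'grid'}
```

Subclassing is mostly useful when the hooks should travel together as one reusable unit; for a one-off conversion, passing object_hook directly to json.loads is simpler.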
JSONDecoder
loads calls JSONDecoder(**kw).decode() to do the parsing. The JSONDecoder constructor sets up the parse function for each value type as well as the scanner. decode then calls raw_decode, starting the scan at the first non-whitespace character.
def __init__(self, *, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, strict=True,
        object_pairs_hook=None):
    self.object_hook = object_hook
    self.parse_float = parse_float or float
    self.parse_int = parse_int or int
    self.parse_constant = parse_constant or _CONSTANTS.__getitem__
    self.strict = strict
    self.object_pairs_hook = object_pairs_hook
    self.parse_object = JSONObject
    self.parse_array = JSONArray
    self.parse_string = scanstring
    self.memo = {}
    self.scan_once = scanner.make_scanner(self)

def decode(self, s, _w=WHITESPACE.match):
    """Return the Python representation of ``s`` (a ``str`` instance
    containing a JSON document).
    """
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    # _w(s, start_idx).end() returns the index of the first non-whitespace
    # character at or after start_idx; in other words, it skips whitespace.
    # WHITESPACE is a compiled regex that we will not look into here;
    # _w has this meaning everywhere below.
    end = _w(s, end).end()
    if end != len(s):
        raise JSONDecodeError("Extra data", s, end)
    return obj

def raw_decode(self, s, idx=0):
    """Decode a JSON document from ``s`` (a ``str`` beginning with
    a JSON document) and return a 2-tuple of the Python
    representation and the index in ``s`` where the document ended.
    This can be used to decode a JSON document from a string that may
    have extraneous data at the end.
    """
    try:
        obj, end = self.scan_once(s, idx)
    except StopIteration as err:
        raise JSONDecodeError("Expecting value", s, err.value) from None
    return obj, end
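Because raw_decode returns the index where the document ended instead of raising "Extra data", it can be used to walk through several JSON documents concatenated in one string. The following is a rough sketch; iter_documents is a name invented for this example.

```python
import json


def iter_documents(text):
    """Yield each JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace itself, so do it here.
        while idx < len(text) and text[idx] in " \t\n\r":
            idx += 1
        if idx >= len(text):
            return
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


docs = list(iter_documents('{"a": 1} [2, 3]\n"four"'))
print(docs)  # [{'a': 1}, [2, 3], 'four']

# decode (and therefore json.loads) rejects the same input as "Extra data".
try:
    json.loads('{"a": 1} [2, 3]')
    extra_data_rejected = False
except json.JSONDecodeError:
    extra_data_rejected = True
```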
scanner
scanner.make_scanner prefers the C implementation of the scanner when it is available. Here we only look at the pure-Python scanner, py_make_scanner.
py_make_scanner receives the object that calls it as context. raw_decode calls the scan_once returned by py_make_scanner, and scan_once in turn calls _scan_once, the function that does the actual scanning.
_scan_once first reads the character at idx, uses it to decide which data type is starting, and dispatches the string to the matching parse function. If no type matches, or the end of the string has already been reached, it raises StopIteration.
def _scan_once(string, idx):
    try:
        nextchar = string[idx]
    except IndexError:
        raise StopIteration(idx)

    if nextchar == '"':
        return parse_string(string, idx + 1, strict)
    elif nextchar == '{':
        return parse_object((string, idx + 1), strict,
            _scan_once, object_hook, object_pairs_hook, memo)
    elif nextchar == '[':
        return parse_array((string, idx + 1), _scan_once)
    elif nextchar == 'n' and string[idx:idx + 4] == 'null':
        return None, idx + 4
    elif nextchar == 't' and string[idx:idx + 4] == 'true':
        return True, idx + 4
    elif nextchar == 'f' and string[idx:idx + 5] == 'false':
        return False, idx + 5

    m = match_number(string, idx)
    if m is not None:
        integer, frac, exp = m.groups()
        if frac or exp:
            res = parse_float(integer + (frac or '') + (exp or ''))
        else:
            res = parse_int(integer)
        return res, m.end()
    elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
        return parse_constant('NaN'), idx + 3
    elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
        return parse_constant('Infinity'), idx + 8
    elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
        return parse_constant('-Infinity'), idx + 9
    else:
        raise StopIteration(idx)
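The dispatch above can be observed directly by building the pure-Python scanner by hand. This is only an exploratory sketch: py_make_scanner lives in json.scanner but is an internal helper rather than a documented API, so it should not be relied on in production code.

```python
import json
from json.scanner import py_make_scanner

# A JSONDecoder instance carries every attribute the scanner reads from
# its context (parse_object, parse_array, parse_float, strict, ...).
scan_once = py_make_scanner(json.JSONDecoder())

# Each call returns (value, index just past the value).
print(scan_once('true', 0))         # (True, 4)
print(scan_once('12.5e1', 0))       # (125.0, 6)
print(scan_once('[1, 2] tail', 0))  # ([1, 2], 6)

# The scanner does not skip leading whitespace: a space matches no type,
# so StopIteration is raised with the failing index as its value.
try:
    scan_once(' 1', 0)
    failed_at = None
except StopIteration as stop:
    failed_at = stop.value
print(failed_at)  # 0
```

This is why decode skips whitespace with _w before calling raw_decode, and why raw_decode converts StopIteration into a JSONDecodeError("Expecting value").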
parse_object
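parse_object first checks whether the next character is } or the start of a "xxxx": pair, and raises an exception for anything else. For a "xxxx": pair, it calls scan_once on the text after the colon and appends the result to the pairs list. It then looks at the next character: a comma means the loop continues with the next key and another scan_once call, while } ends the parse, after which object_hook (or object_pairs_hook) is called for custom processing.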
def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
               memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    pairs = []
    pairs_append = pairs.append
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'
    # To avoid calling the regex too often, plain if checks handle the
    # common case of little or no whitespace, and the regex is only used
    # to find the end of longer whitespace runs. The same trick recurs below.
    if nextchar != '"':
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end)
    end += 1
    while True:
        key, end = scanstring(s, end, strict)
        key = memo_get(key, key)
        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise JSONDecodeError("Expecting ':' delimiter", s, end)
        end += 1

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        pairs_append((key, value))
        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1

        if nextchar == '}':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        end = _w(s, end).end()
        nextchar = s[end:end + 1]
        end += 1
        if nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end - 1)
    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end
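One visible consequence of building the result with dict(pairs) is that a repeated key silently keeps its last value, whereas object_pairs_hook sees every pair before that collapse happens. A small sketch follows; reject_duplicates is a hypothetical hook written for this example.

```python
import json

doc = '{"a": 1, "b": 2, "a": 3}'

# Default path: dict(pairs) keeps the last value for a repeated key.
as_dict = json.loads(doc)
print(as_dict)   # {'a': 3, 'b': 2}

# object_pairs_hook receives the raw list of pairs, duplicates included.
as_pairs = json.loads(doc, object_pairs_hook=list)
print(as_pairs)  # [('a', 1), ('b', 2), ('a', 3)]


def reject_duplicates(pairs):
    # Hypothetical hook: fail loudly instead of dropping earlier values.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```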
parse_array
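parse_array first checks whether the next character is ]; if so, it returns []. Otherwise it calls scan_once on the text after the [ and appends the result to the list. It then looks at the next character: a comma means the loop continues with another scan_once call, and ] ends the parse.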
def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    _append = values.append
    while True:
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

    return values, end
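The array rules can be exercised through json.loads. The sketch below covers the empty-array shortcut, nesting through the recursive scan_once call, and a few malformed inputs. The exact error messages differ between Python versions, so they are printed rather than assumed.

```python
import json

# Whitespace before ']' still takes the "trivial empty array" path.
empty = json.loads('[ ]')
print(empty)   # []

# Nested arrays are handled by the recursive scan_once call.
nested = json.loads('[1, [2, 3], []]')
print(nested)  # [1, [2, 3], []]

# Missing comma, missing closing bracket, and trailing comma all fail.
errors = []
for bad in ('[1 2]', '[1, 2', '[1,]'):
    try:
        json.loads(bad)
    except json.JSONDecodeError as exc:
        errors.append((bad, exc.msg, exc.pos))
        print(repr(bad), '->', exc.msg, 'at position', exc.pos)
```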