問題描述
有時候因為業(yè)務(wù)的需要,有些字段不但是json格式,并且還是個json數(shù)組,比如下表 pay_infos:
| pay_id | infos |
|---|---|
| 1111 | [{"uid":123,"terminalFrom":0,"couponBatchId":1410115799,"cost":5},{"uid":123,"terminalFrom":0,"couponBatchId":1410116199,,"cost":7,}] |
| 1112 | [{"uid":124,"terminalFrom":1,"couponBatchId":1410115799,"cost":20}] |
為了統(tǒng)計的需要,需要解析出每個json的具體key和value,比如上表,我需要求出,所有couponBatchId分類的cost的值。
解決辦法
- explode
對于infos字段,首先應(yīng)該解決數(shù)組的行轉(zhuǎn)列問題,這個交給explode函數(shù)
select
explode(split(substring(infos,3,length(coupon_enties)-4),'\\},\\{')) as entity
from pay_infos
通過explode一次處理infos按照單個json的字符串轉(zhuǎn)換成多行,如下
| entity |
|---|
| "uid":123,"terminalFrom":0,"couponBatchId":1410115799,"cost":5 |
| "uid":123,"terminalFrom":0,"couponBatchId":1410116199,,"cost":7 |
| "uid":124,"terminalFrom":1,"couponBatchId":1410115799,"cost":20 |
如果需要保留其他字段信息,則使用lateral view
select pay_id,concat('{',entity,'}') as coupons
from
( select *
from pay_infos
) a
lateral view explode(split(substring(infos,3,length(infos)-4),'\\},\\{')) b as entity
結(jié)果如下:
| pay_id | coupons |
|---|---|
| 1111 | "uid":123,"terminalFrom":0,"couponBatchId":1410115799,"cost":5 |
| 1111 | "uid":123,"terminalFrom":0,"couponBatchId":1410116199,,"cost":7 |
| 1112 | "uid":124,"terminalFrom":1,"couponBatchId":1410115799,"cost":20 |
- json_turpe
對于key:value格式的信息,可以通過拼成完成的json之后使用,json_turpe,也就是行轉(zhuǎn)列,比如上面輸出的表稱為source,栗子SQL如下:
select info.*
from
(
select concat('{',entity,'}') as coupons
from source
lateral view json_tuple(coupons,'couponBatchId','terminalFrom','cost','uid') info
as couponBatchId,terminalFrom,cost,uid
| couponBatchId | terminalFrom | cost | uid |
|---|---|---|---|
| 1410115799 | 0 | 5 | 123 |
| 1410116199 | 0 | 7 | 123 |
| 1410115799 | 1 | 20 | 124 |
結(jié)果如下:
| couponBatchId | terminalFrom | cost | uid |
|---|---|---|---|
| 1410115799 | 0 | 5 | 123 |
| 1410116199 | 0 | 7 | 123 |
| 1410115799 | 1 | 20 | 124 |
完整SQL
select info.*
from
(
(select
pay_id,concat('{',entity,'}') as coupons
from
(
select * from pay_infos
) a
lateral view explode(split(substring(infos,3,length(infos)-4),'\\},\\{')) b as entity) source
lateral view json_tuple(coupons,'couponBatchId','terminalFrom','cost','uid') info
as couponBatchId,terminalFrom,cost,uid
說明
- concat用于補出完整json
- split時,分隔符需要轉(zhuǎn)義,如果sql寫在" "內(nèi)(如:shell腳本里調(diào)用hive -e "$sql"的情形),則需要4個' \ '轉(zhuǎn)義特殊字符,即:
split(infos,'\\\\},\\\\{')