數(shù)據(jù)處理中,一遇到j(luò)son就頭大,很長一段時間里,明知lateral view函數(shù)是個好東西,但就是很抗拒去學(xué),都是找數(shù)倉的同事先理好字段直接用,順便以菜雞的身份,同情和膜拜一下埋頭洗臟數(shù)的數(shù)倉同學(xué),大佬辛苦。。
前段時間加入到一個數(shù)據(jù)建設(shè)的項(xiàng)目中作為先鋒軍打頭陣,沒辦法遇到j(luò)son還是硬著頭皮終于學(xué)會了lateral view用法,感受:困難只是心中的一座大山!也不過如此!
我肯定我過不了幾天一定會忘掉(其實(shí)已經(jīng)忘掉一點(diǎn)了。。),沒有好記性拿起爛筆頭,記錄在這里吧。
基本語法:
select
*
from T t
lateral view json_tuple(t.json_txt,[],[],……) q as item1,item2,……
假設(shè)T表中有個json_txt字段取值格式如下:
{
"student_no":"0001",
"student_name":'zhangxiaoxiao',
"class":"高三(1)班",
"score_detail":{
"scoreList":[{"scores":[
{"course":"語文","score":100,"rank":2}
,{"course":"數(shù)學(xué)","score":120,"rank":9}
,{"course":"英語","score":110,"rank":6}
,{"course":"化學(xué)","score":90,"rank":4}
,{"course":"物理","score":90,"rank":3}
,{"course":"生物","score":90,"rank":2}
]
}]
},
"total_score":"600"
"overal_rank":"3",
}
如果我要得到每個學(xué)生的所有信息字段,則需要將json中的信息解析出來。
select
t.* ----表中其他原始字段保留
,q.student_no
,q.student_name,
,q.class,
,q.total_score,
,q.overal_rank,
,q.course,q.score,q.rank
from T t
lateral view json_tuple(t.json_txt,
"student_no",
"student_name",
"class",
"total_score",
"overal_rank",
"score_detail.scoreList.[*].scores.[*].course",
"score_detail.scoreList.[*].scores.[*].score"
"score_detail.scoreList.[*].scores.[*].rank"
) q as course,score,rank
得到結(jié)果如下:

result
但是score和rank是以數(shù)列形式存儲在同一行,不方便計(jì)算,用trans_array()函數(shù)可以解決啦:
select
trans_array(5,',',student_no,student_name,class,total_score,overal_rank,course,score,rank) as (student_no,student_name,class,total_score,overal_rank,course,score,rank)
from (
select
student_no,student_name,class,total_score,overal_rank
,regexp_replace(course,'(\\[)|(\\])|("))','') as course ---去掉[]"符號
,regexp_replace(score,'(\\[)|(\\])|("))','') as score ---去掉[]"符號
,regexp_replace(rank,'(\\[)|(\\])|("))','') as rank ---去掉[]"符號
from result
) t
得到的結(jié)果就是縱列的分?jǐn)?shù)明細(xì):

result2