Hive解析嵌套JSON数组
背景:
同时发生的埋点数据往往会在一个json字符串里发送,形式是[json,json,json]的埋点数组,需要把这些数据拉平文章来源地址https://www.toymoban.com/news/detail-523481.html
处理数据:
[
{
"responseData":[
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...}
],
"requestData":Object{...}
},
{
"responseData":[
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...},
Object{...}
],
"requestData":Object{...}
}
]
核心思想:
- 把最外层的"[“和”]"去除
- 把"},{“转换为”}|||{" ,使用split函数根据"|||"把string转为array,LATERAL view explode()把array转为列
- 第二步的时候发现,内部的json数组也有"},{",也会被处理一起处理
- 加上定制化的"responseData"使得替换具有唯一性,把},{"responseData"转换为}|||{“responseData”
代码实现
hive代码实现:
insert overwrite table ods.ods_shop_original_back_point_flat partition (pdate = '${pdate}')
select json_str
from ods.ods_shop_original_back_point
LATERAL view explode(split(REGEXP_REPLACE(regexp_extract(json , '^\\[(.+)\\]$'),'\\}\\,\\{\\"(responseData)\\"','\\}\\|\\|\\|\\{\\"responseData\\"'), '\\|\\|\\|')) str as json_str
where pdate = '${pdate}'
spark代码实现:
spark.sql(
s"""
|with aa as
|(select
| a.id,
| a.distinct_id,
| a.page_code,
| a.module_code,
| a.event_code,
| a.app_version,
| a.os,
| a.os_version,
| a.lib,
| a.lib_version,
| a.manufacturer,
| a.model,
| a.phone_type,
| a.device_id,
| a.product_firmware_version,
| a.seasoning_box_firmware_version,
| a.`time`,
| get_json_object(goods_list.goods_id,"$sc.product_id") goods_id,
| '' catalog,
| a.sn_code,
| a.pdate
|from dwd.dwd_event a
|lateral view explode(split(REGEXP_REPLACE(regexp_extract(get_json_object(get_json_object(a.param,"$sc.userBehavior"),"$sc.data.product_list") , '^\\\\[(.+)\\\\]$sc'),'\\\\}\\\\,\\\\{','\\\\}\\\\|\\\\|\\\\|\\\\{'), '\\\\|\\\\|\\\\|')) goods_list as goods_id
|where a.pdate >= '$dateString'
|and a.pdate <= '$enddateString'
|and a.event_code = 'prefab_food_score')
|
|select
| aa.id,
| d.id distinct_id,
| e.user_type,
| aa.page_code,
| aa.module_code,
| aa.event_code,
| aa.app_version,
| aa.os,
| aa.os_version,
| aa.lib,
| aa.lib_version,
| aa.manufacturer,
| aa.model,
| aa.phone_type,
| aa.device_id,
| aa.product_firmware_version,
| aa.seasoning_box_firmware_version,
| aa.`time`,
| aa.goods_id,
| b.menu_id,
| b.menu_name,
| aa.catalog,
| aa.sn_code,
| aa.pdate
|from aa
|join dim.dim_menu b
| on aa.pdate = b.pdate
| and b.goods_id is not null
| and b.menu_status = 1
| and b.bin_version = '3.0'
| and aa.goods_id = b.goods_id
|join dim.dim_prefab_food c
| on aa.pdate = c.pdate
| and aa.goods_id = c.goods_id
|join ods.ods_tl_gl_user_info d
| on aa.distinct_id = d.uc_uid
| and aa.pdate = d.pdate
|join dim.dim_user e
| on aa.pdate = e.pdate
| and d.id = e.user_id
|""".stripMargin).createTempView("food")
文章来源:https://www.toymoban.com/news/detail-523481.html
到了这里,关于Hive解析嵌套JSON数组的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!