一. explode单独使用。
1.1. 用于array类型的数据
- table_name 表名
- array_col 为数组类型的字段
- new_col array_col被explode之后对应的列
select explode(array_col) as new_col from table_name
1.2. 用于map类型的数据
由于map是kay-value结构的,所以它在转换的时候会转换成两列,一列是kay转换而成的,一列是value转换而成的。
- table_name 表名
- map_col 为map类型的字段
- may_key_col, may_value_col 分别对应map_col被拆分之后的map映射里的key 和 value
select explode(map_col) as (may_key_col, may_value_col) from table_name
二.explode结合lateral view使用
2.1 创建测试表,插入数据
CREATE table student_score(
stu_id string comment '学号',
stu_name string comment '姓名',
courses string comment '各个科目',
scores string comment '各个分数'
) comment '学生成绩表';
insert into student_score values
("1001", "张三","语文,数学,英语,历史,地理", "88,87,94,86,84"),
("1002", "李四", "语文,数学,英语,历史,地理", "78,89,75,79,68"),
("1003", "王五", "语文,数学,英语,历史,地理", "98,97,91,93,92"),
("1004", "朱六", "语文,数学,英语,历史,地理", "66,63,64,67,68");
2.2 测试explode 行转列
select a.stu_id, a.stu_name, table_view.course
from student_score a
lateral view explode(split(courses, ',')) table_view as `course`;
查询每个学生课程对应的分数,使用posexplode函数
先测试使用explode, 看看效果:
select a.stu_id, a.stu_name,
table_view1.course, table_view2.score
from student_score a
lateral view explode(split(courses, ',')) table_view1 as `course`
lateral view explode(split(scores, ',')) table_view2 as `score`;
出现这种情况是因为两个并列的explode的sql没办法识别每个科目对应的成绩是多少,对于多个数组的行转列可以使用posexplode函数。
例如使用如下查询语句:
select stu_id, stu_name, course, score
from student_score
lateral view posexplode(split(courses, ',')) table_view1 as a, course
lateral view posexplode(split(scores, ',')) table_view2 as b, score
where a = b;
三. 进阶使用
1、空格字符串函数:space
语法: space(int n)
返回值: string
说明:返回长度为n的空格字符串
举例:
hive> select space(10) from dual;
hive> select length(space(10)) from dual;
10
2、space函数与split函数结合,得到数组;
space函数与split函数结合,可以得到空格字符串数组
举例:
hive>select split(space(10), '');
[" "," "," "," "," "," "," "," "," "," ",""]
3、如何产生1-100的连续的数字?
结合space函数与split函数,posexplode函数,lateral view函数获得
实现方法一:
select
id_start+pos as id
from(
select
1 as id_start,
100 as id_end
) m lateral view posexplode(split(space(id_end-id_start), '')) t as pos, val
实现方法二:
select
row_number() over() as id
from
(select split(space(99), ' ') as x) t
lateral view
explode(x) ex;
备注:explode(x)和posexplode()均为炸裂函数,区别在于explode炸出一个值,posexplode不仅炸出一个值还附带索引号;
3. 如何产生开始日期到结束日期的连续的日期?
SELECT
DATE_ADD(START_DATE, pos)
FROM (
SELECT DISTINCT
"2023-03-13" AS START_DATE,
"2023-03-23" AS END_DATE
from order_detail
) s1 lateral VIEW posexplode(split(SPACE(DATEDIFF(END_DATE, START_DATE)), " ")) s2 AS pos, null_ele
四. explode和posexplode对比
数据表:
表名:default.class_info,分别是班级,姓名,成绩 3列
单列Explode:
需求:将student这一列中数据由一行变为多行(使用split和explode,并结合lateral view函数实现)
select
class,student_name
from
default.class_info
lateral view explode(split(student,',')) t as student_name
单列Posexplode
需求:想要给每班的每个同学按照顺序来一个编号(使用posexplode函数)
select
class,student_index + 1 as student_index,student_name
from
default.class_info
lateral view posexplode(split(student,',')) t as student_index,student_name;
注意:student_index+1 是因为index是从0开始的
多列Explode
需求:基于学生姓名和分数使其两两匹配,期望得到如下效果。
尝试: 先对两列进行explode
select
class,student_name,student_score
from
default.class_info
lateral view explode(split(student,',')) sn as student_name
lateral view explode(split(score,',')) sc as student_score
结果如下:
不太符合预期,如果对两列都进行explode的话,假设每列都有3个值,最终会变为3*3=9行,但我们实际只想要3行
解决办法:
我们进行两次posexplode,姓名和成绩都保留对应的序号,即便是变成了9行,通过where筛选只保留行号相同的index即可。
select
class,student_name,student_score
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc;
需求:假设我们又想对同学的成绩进行一下排名(借助rank( ) 函数 )
select
class,
student_name,
student_score,
rank() over(partition by class order by student_score desc) as student_rank
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc
order by class,student_rank;
补充:
若没有spilt函数,可能会用到array()
lateral view
posexplode (array()) as as f_keys,f_values
user_id bus_ bike_ taxi_ train_
1001 503 89 708 2054
1002 24 17 1008 500
1003 80 50 500 400文章来源:https://www.toymoban.com/news/detail-548698.html
user_id pay_type Amount
1001 1 503
1001 2 89
1001 3 708
1001 4 2054文章来源地址https://www.toymoban.com/news/detail-548698.html
select
user_id,
index_+1 as pay_type,
value as amount
from pay_record_log
lateral view
posexplode(ARRAY(bus_,bike_,taxi_,train_))t as index_,value
到了这里,关于大数据Hive篇:explode 和 posexplode的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!