1 需求
给定 mid,dt 的用户登录记录表,查找最近 7 天内连续 3 天活跃的用户 id
2 数据表
tmp_table.tmp_login_test
文章来源:https://www.toymoban.com/news/detail-519750.html
CREATE TABLE tmp_table.tmp_login_test (
mid string,
dt string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
insert into tmp_table.tmp_login_test values
("zhangsan","2021-08-01"),
("zhangsan","2021-08-02"),
("zhangsan","2021-08-04"),
("zhangsan","2021-08-05"),
("zhangsan","2021-08-06"),
("zhangsan","2021-08-08"),
("zhangsan","2021-08-09"),
("zhangsan","2021-08-10"),
("lisi","2021-08-01"),
("lisi","2021-08-02"),
("lisi","2021-08-04"),
("lisi","2021-08-05"),
("lisi","2021-08-08"),
("lisi","2021-08-10"),
("wangwu","2021-08-01"),
("wangwu","2021-08-04"),
("wangwu","2021-08-07"),
("zhaoliu","2021-08-01"),
("zhaoliu","2021-08-02"),
("zhaoliu","2021-08-03"),
("zhaoliu","2021-08-04"),
("zhaoliu","2021-08-05"),
("zhaoliu","2021-08-06"),
("zhaoliu","2021-08-07"),
("zhaoliu","2021-08-08"),
("zhaoliu","2021-08-09"),
("zhaoliu","2021-08-10");
3 Sql 实现
实现思路:获取最近 7 天的用户登录记录数据(在 where 中限定),对数据进行 rank 排序,计算登录日期与 rank 值之间的差值(使用 date_sub 函数)得到一个差值日期,如果登录日期是连续的那么计算得到的差值日期是同一个,在此基础上基于用户,差值日期分组,统计 dt 的去重数量,即可得到每个用户每次连续登录的天数。在本例中,需要统计 7 天内连续 3 天登录的用户,所以只需要取出连续登录天数大于等于 3 的 uid 即完成需求。文章来源地址https://www.toymoban.com/news/detail-519750.html
- 对用户的登录行为按 mid 分组,组内按登录日期进行排序
select
mid, dt,
rank() over(partition by mid order by dt) rank_mid_dt
from tmp_table.tmp_login_test
where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10';
+-----------+-------------+--------------+
| mid | dt | rank_mid_dt |
+-----------+-------------+--------------+
| lisi | 2021-08-04 | 1 |
| lisi | 2021-08-05 | 2 |
| lisi | 2021-08-08 | 3 |
| lisi | 2021-08-10 | 4 |
| zhangsan | 2021-08-04 | 1 |
| zhangsan | 2021-08-05 | 2 |
| zhangsan | 2021-08-06 | 3 |
| zhangsan | 2021-08-08 | 4 |
| zhangsan | 2021-08-09 | 5 |
| zhangsan | 2021-08-10 | 6 |
| zhaoliu | 2021-08-04 | 1 |
| zhaoliu | 2021-08-05 | 2 |
| zhaoliu | 2021-08-06 | 3 |
| zhaoliu | 2021-08-07 | 4 |
| zhaoliu | 2021-08-08 | 5 |
| zhaoliu | 2021-08-09 | 6 |
| zhaoliu | 2021-08-10 | 7 |
| wangwu | 2021-08-04 | 1 |
| wangwu | 2021-08-07 | 2 |
+-----------+-------------+--------------+
- 基于上表计算 dt 和 rank_mid_dt 日期差
select
mid, dt, rank_mid_dt,
date_sub(dt, rank_mid_dt) date_diff
from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1;
+-----------+-------------+--------------+-------------+
| mid | dt | rank_mid_dt | date_diff |
+-----------+-------------+--------------+-------------+
| lisi | 2021-08-04 | 1 | 2021-08-03 |
| lisi | 2021-08-05 | 2 | 2021-08-03 |
| lisi | 2021-08-08 | 3 | 2021-08-05 |
| lisi | 2021-08-10 | 4 | 2021-08-06 |
| zhangsan | 2021-08-04 | 1 | 2021-08-03 |
| zhangsan | 2021-08-05 | 2 | 2021-08-03 |
| zhangsan | 2021-08-06 | 3 | 2021-08-03 |
| zhangsan | 2021-08-08 | 4 | 2021-08-04 |
| zhangsan | 2021-08-09 | 5 | 2021-08-04 |
| zhangsan | 2021-08-10 | 6 | 2021-08-04 |
| zhaoliu | 2021-08-04 | 1 | 2021-08-03 |
| zhaoliu | 2021-08-05 | 2 | 2021-08-03 |
| zhaoliu | 2021-08-06 | 3 | 2021-08-03 |
| zhaoliu | 2021-08-07 | 4 | 2021-08-03 |
| zhaoliu | 2021-08-08 | 5 | 2021-08-03 |
| zhaoliu | 2021-08-09 | 6 | 2021-08-03 |
| zhaoliu | 2021-08-10 | 7 | 2021-08-03 |
| wangwu | 2021-08-04 | 1 | 2021-08-03 |
| wangwu | 2021-08-07 | 2 | 2021-08-05 |
+-----------+-------------+--------------+-------------+
- 基于 mid,date_diff 分组,统计 dt 的去重数量,并取出数量大于 3 的
select
mid, date_diff,
count(distinct dt) cnt
from (select mid, dt, rank_mid_dt, date_sub(dt, rank_mid_dt) date_diff from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1) t2
group by mid, date_diff having count(distinct dt) >= 3;
+-----------+-------------+------+
| mid | date_diff | cnt |
+-----------+-------------+------+
| zhaoliu | 2021-08-03 | 7 |
| zhangsan | 2021-08-03 | 3 |
| zhangsan | 2021-08-04 | 3 |
+-----------+-------------+------+
- 取出 mid
select
distinct mid
from (select mid, date_diff, count(distinct dt) cnt from (select mid, dt, rank_mid_dt, date_sub(dt, rank_mid_dt) date_diff from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1) t2
group by mid, date_diff having count(distinct dt) >= 3) t3;
+-----------+
| mid |
+-----------+
| zhangsan |
| zhaoliu |
+-----------+
到了这里,关于SQL 经典面试题:统计最近七天连续三天活跃的用户的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!