问题1:计算2021年里有播放记录的每个视频的完播率(结果保留三位小数),并按完播率降序排序
注:视频完播率是指完成播放次数占总播放次数的比例。简单起见,结束观看时间与开始播放时间的差>=视频时长时,视为完成播放。
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
video_id INT NOT NULL COMMENT '视频ID',
start_time datetime COMMENT '开始观看时间',
end_time datetime COMMENT '结束观看时间',
if_follow TINYINT COMMENT '是否关注',
if_like TINYINT COMMENT '是否点赞',
if_retweet TINYINT COMMENT '是否转发',
comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_video_info (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
video_id INT UNIQUE NOT NULL COMMENT '视频ID',
author INT NOT NULL COMMENT '创作者ID',
tag VARCHAR(16) NOT NULL COMMENT '类别标签',
duration INT NOT NULL COMMENT '视频时长(秒数)',
release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24', 0, 0, 1, null),
(103, 2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34', 0, 1, 0, 1732526),
(101, 2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42', 1, 0, 1, null),
(102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null);
INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
(2001, 901, '影视', 30, '2021-01-01 7:00:00'),
(2002, 901, '美食', 60, '2021-01-01 7:00:00'),
(2003, 902, '旅游', 90, '2021-01-01 7:00:00');
答案:
select a.video_id,round(sum(if(end_time-start_time>=duration,1,0))/count(start_time),3) as avg_comp_play_rate
from tb_user_video_log a left join tb_video_info b on a.video_id=b.video_id
where year(start_time)=2021
group by a.video_id
order by avg_comp_play_rate desc
问题2:计算各类视频的平均播放进度,将进度大于60%的类别输出。
注:
播放进度=播放时长÷视频时长*100%,当播放时长大于视频时长时,播放进度均记为100%。
结果保留两位小数,并按播放进度倒序排序。
SELECT tag,
concat(round(avg(case
WHEN timestampdiff(second,start_time,end_time) >= duration THEN
1
ELSE (timestampdiff(second,start_time,end_time)/duration) end)*100,2),'%') AS avg_play_progress
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id=b.video_id
GROUP BY tag
HAVING substring_index(avg_play_progress,'%',1) > 60
ORDER BY avg_play_progress DESC
问题3:统计在有用户互动的最近一个月(按包含当天在内的近30天算,比如10月31日的近30天为10.2~10.31之间的数据)中,每类视频的转发量和转发率(保留3位小数)。
注:转发率=转发量÷播放量。结果按转发率降序排序。
SELECT tag,
sum(if_retweet) AS retweet_cut,
round(sum(if_retweet)/count(*),
3) AS retweet_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id=b.video_id
WHERE datediff(date(
(SELECT max(start_time)
FROM tb_user_video_log)),date(start_time))<=29
GROUP BY tag
问题4:计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量
注:
涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
if_follow-是否关注为1表示用户观看视频中关注了视频创作者,为0表示此次互动前后关注状态未发生变化,为2表示本次观看过程中取消了关注。
SELECT author,
date_format(start_time,'%Y-%m') AS month ,
round(sum(case WHEN if_follow=2 THEN -1 ELSE if_follow end)/count(author),3) AS fans_growth_rate ,
sum(sum(case WHEN if_follow=2 THEN -1 ELSE if_follow end))over(partition by author ORDER BY date_format(start_time,'%Y-%m')) total_fans
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id=b.video_id
WHERE year(start_time)=2021
GROUP BY author,month
ORDER BY author,total_fans
*问题5:统计2021年国庆头3天每类视频
每天的近一周总点赞量和一周内最大单天转发量
,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。*
SELECT *
FROM
(SELECT tag,
dt,
sum(like_cnt)
OVER w AS sum_like_cnt_7d, max(retweet_cnt)
OVER w AS max_retweet_cnt_7d
FROM
(SELECT tag,
date(start_time) dt,
sum(if_like) like_cnt,
sum(if_retweet) retweet_cnt
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id=b.video_id
GROUP BY 1,2)t1
#窗口函数 窗口范围:当前行和后六行
WINDOW w AS (partition by tag ORDER BY dt DESC rows BETWEEN current row
AND 6 following)) t2
GROUP BY 1,2
HAVING dt
BETWEEN '2021-10-01'
AND '2021-10-03'
ORDER BY 1 desc, 2 asc
问题6:找出近一个月发布的视频中热度最高的top3视频。
注:
热度=(a视频完播率+b点赞数+c评论数+d转发数)*新鲜度;
新鲜度=1/(最近无播放天数+1);
当前配置的参数a,b,c,d分别为100、5、3、2。
最近播放日期以end_time-结束观看时间为准,假设为T,则最近一个月按[T-29, T]闭区间统计。
结果中热度保留为整数,并按热度降序排序。文章来源:https://www.toymoban.com/news/detail-421026.html
select
video_id,
round((100*avg_comp_play_rate+5*like_cnt+3*comment_cnt+2*retweet_cnt)*(1/(unfinished_day_cnt+1))) hot_index
from(
select video_id,
sum(case when timestampdiff(second,start_time,end_time)>=duration then 1 else 0 end)/count(start_time) as avg_comp_play_rate,
sum(if_like) like_cnt,
sum(if_retweet) retweet_cnt,
sum(if(comment_id is not null,1,0)) comment_cnt,
datediff(date((select max(end_time) from tb_user_video_log)),max(date(end_time))) unfinished_day_cnt
from tb_video_info a join tb_user_video_log b using(video_id)
where datediff(date((select max(end_time) from tb_user_video_log)),date(release_time))<=29
group by 1
) t1 group by 1 order by hot_index desc
limit 3
文章来源地址https://www.toymoban.com/news/detail-421026.html
到了这里,关于数仓经典案例-各个视频平均完播率、平均播放进度的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!