SQL Functions - Window Functions


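What is a window function?

A window function operates on a set of rows. Unlike an ordinary aggregate function, it does not need a GROUP BY clause to group the data, so a single output row can contain both the columns of the underlying row and the aggregated columns.

The syntax is: function + over(partition by <partitioning columns> order by <ordering columns>). The data set is split into partitions by the partitioning columns, the rows of each partition are sorted by the ordering columns, and the function is evaluated for every row over its partition. The order by inside over() controls the order in which the function sees the rows, not the order of the final result set; to sort the output, add a separate order by at the end of the query. Both partition by and order by are optional.

Note: window functions do not interfere with each other, so several of them can be used in the same query.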
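The contrast with GROUP BY can be tried out with a small script. This is only a sketch: it assumes SQLite 3.25 or later (reached through Python's built-in sqlite3 module) as a stand-in for the databases discussed here, and the three rows loaded into test11 are made up for illustration.

```python
import sqlite3

# In-memory database with made-up sample rows (an assumption, not the article's data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
conn.executemany(
    "INSERT INTO test11 VALUES (?, ?, ?)",
    [("g1", "math", 90), ("g1", "math", 80), ("g2", "math", 70)],
)

# GROUP BY collapses each group into a single row.
grouped = conn.execute(
    "SELECT grades, SUM(results) FROM test11 GROUP BY grades ORDER BY grades"
).fetchall()

# Window functions keep every base row and add computed columns next to it.
# Two independent windows are used in the same query.
windowed = conn.execute(
    """
    SELECT grades, results,
           SUM(results) OVER (PARTITION BY grades) AS grade_total,
           ROW_NUMBER() OVER (PARTITION BY grades ORDER BY results DESC) AS rn
    FROM test11
    ORDER BY grades, results DESC
    """
).fetchall()

print(grouped)   # one row per group
print(windowed)  # one row per base row
```

The grouped query returns two rows, while the windowed query returns all three base rows, each carrying its partition total and its row number.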

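Window functions are available in MySQL 8.0 and later, SQL Server, Hive, Oracle, and other databases.

Categories of window functions

Window functions fall roughly into the following categories:

1. Ranking window functions

① row_number() -- consecutive numbering; tied values still receive different numbers, e.g. 1, 2, 3, 4

② rank() -- tied values share a rank and the following ranks are skipped, e.g. 1, 2, 2, 4

③ dense_rank() -- tied values share a rank and the following ranks continue without gaps, e.g. 1, 2, 2, 3

④ ntile(n) -- splits the ordered rows of the partition into n buckets that are as even as possible and returns the bucket number 1, 2, ..., n

Example: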

select grades
,subjects
,results
,row_number() over(partition by grades,subjects order by results desc) as row_numbers
,rank() over(partition by grades,subjects order by results desc) as ranks
,dense_rank() over(partition by grades,subjects order by results desc) as dense_ranks
,ntile(3) over(partition by grades,subjects order by results desc) as ntiles
from test11

The data set is partitioned by grades and subjects, and each ranking function returns the position of the row within its partition, in descending order of results.
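To see how the four ranking functions treat ties, the query can be run against a handful of rows. The sketch below assumes SQLite 3.25 or later via Python's sqlite3 module; the scores 95, 90, 90, 80 are invented so that one tie occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores for a single (grades, subjects) partition, with one tie at 90.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(95,), (90,), (90,), (80,)],
)

rows = conn.execute(
    """
    SELECT results,
           ROW_NUMBER() OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS row_numbers,
           RANK()       OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS ranks,
           DENSE_RANK() OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS dense_ranks,
           NTILE(3)     OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS ntiles
    FROM test11
    ORDER BY row_numbers
    """
).fetchall()

for row in rows:
    print(row)

# Column by column:
#   row_numbers -> 1, 2, 3, 4   (ties still get distinct numbers)
#   ranks       -> 1, 2, 2, 4   (gap after the tie)
#   dense_ranks -> 1, 2, 2, 3   (no gap)
```

With four rows and ntile(3), the first bucket holds two rows and the other two hold one row each. Which of the two tied rows gets row number 2 is up to the database.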


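2. Aggregate window functions

① sum() -- sum over the window

② count() -- count over the window

③ min() -- minimum over the window

④ max() -- maximum over the window

⑤ avg() -- average over the window

Example: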

select grades
,subjects
,results
,sum(results) over(partition by grades,subjects order by results desc) as sum聚合1
,sum(results) over(partition by grades,subjects) as sum聚合2
,count(results) over(partition by grades,subjects order by results desc) as count聚合1
,count(results) over(partition by grades,subjects) as count聚合2
,min(results) over(partition by grades,subjects order by results desc) as min聚合1
,min(results) over(partition by grades,subjects) as min聚合2
,max(results) over(partition by grades,subjects order by results desc) as max聚合1
,max(results) over(partition by grades,subjects) as max聚合2
,avg(results) over(partition by grades,subjects order by results desc) as avg聚合1
,avg(results) over(partition by grades,subjects) as avg聚合2
from test11

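The 聚合1 columns partition the data by grades and subjects, sort each partition by results in descending order, and aggregate results cumulatively from the first row of the partition through the current row. As long as a partition contains no tied or NULL results, count(results) over(partition by grades,subjects order by results desc) therefore returns the same numbers as row_number() over(partition by grades,subjects order by results desc); rows with tied results are peers and all receive the same running count.

The 聚合2 columns partition the data by grades and subjects and aggregate results over the whole partition, so every row of a partition shows the same value.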
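The difference between the two variants, and the effect of ties on the running count, shows up on a three-row partition. This is a sketch under the assumption of SQLite 3.25 or later through Python's sqlite3 module, with invented scores 90, 90, 80.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores with a tie at 90.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(90,), (90,), (80,)],
)

rows = conn.execute(
    """
    SELECT results,
           SUM(results)   OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS running_sum,
           SUM(results)   OVER (PARTITION BY grades, subjects)                       AS total_sum,
           COUNT(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS running_count,
           ROW_NUMBER()   OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS rn
    FROM test11
    ORDER BY results DESC, rn
    """
).fetchall()

for row in rows:
    print(row)
# (90, 180, 260, 2, 1)
# (90, 180, 260, 2, 2)
# (80, 260, 260, 3, 3)
```

Both rows with 90 get a running sum of 180 and a running count of 2, because the default frame extends to the last peer of the current row; row_number() still numbers them 1 and 2.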


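Looking closer, the difference comes from the default frame. With order by and no explicit frame, the frame is range between unbounded preceding and current row, that is, the rows from the start of the partition through the current row (including any rows tied with the current row). Appending rows between unbounded preceding and unbounded following after the order by of the 聚合1 columns makes them cover every row of the partition, so they return the same results as the 聚合2 columns: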

select grades
,subjects
,results
,sum(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and unbounded following) as sum聚合1
,sum(results) over(partition by grades,subjects) as sum聚合2
,count(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and unbounded following) as count聚合1
,count(results) over(partition by grades,subjects) as count聚合2
,min(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and unbounded following) as min聚合1
,min(results) over(partition by grades,subjects) as min聚合2
,max(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and unbounded following) as max聚合1
,max(results) over(partition by grades,subjects) as max聚合2
,avg(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and unbounded following) as avg聚合1
,avg(results) over(partition by grades,subjects) as avg聚合2
from test11
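A quick check of that equivalence, again only as a sketch: it assumes SQLite 3.25 or later via Python's sqlite3 module and uses three invented scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores for one partition.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(90,), (85,), (80,)],
)

rows = conn.execute(
    """
    SELECT results,
           SUM(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sum_full_frame,
           SUM(results) OVER (PARTITION BY grades, subjects)                            AS sum_no_order,
           MIN(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS min_full_frame,
           MIN(results) OVER (PARTITION BY grades, subjects)                            AS min_no_order
    FROM test11
    ORDER BY results DESC
    """
).fetchall()

for row in rows:
    print(row)

# With the full frame, the ordered window matches the unordered one on every row.
same = all(r[1] == r[2] and r[3] == r[4] for r in rows)
print(same)
```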


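This is described in more detail in the section on window frames below.

3. Other window functions

① lag(column,n,0) -- offset function; returns the value of the column from the row n rows before the current row within the partition. n is the offset, and 0 is the value returned when the offset falls outside the partition; it can be replaced by another value, and it defaults to null when omitted

② lead(column,n,0) -- offset function, the opposite of lag(); returns the value of the column from the row n rows after the current row

③ first_value() -- the first value in the frame; with order by and the default frame, this is the first value of the sorted partition

④ last_value() -- the last value in the frame; with order by and the default frame, which ends at the current row, this is the value of the current row (or of its last peer when there are ties)

Example: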

-- with order by
select grades
,subjects
,results
,lag(results,1,0) over(partition by grades,subjects order by results desc) as lag移位1
,lead(results,1,0) over(partition by grades,subjects order by results desc) as lead移位1
,first_value(results) over(partition by grades,subjects order by results desc) as first_value排序1
,last_value(results) over(partition by grades,subjects order by results desc) as last_value排序1
from test11

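With order by, the data is partitioned by grades and subjects and each partition is sorted by results in descending order. lag() and lead() then read the previous and the next row in that order, while first_value() and last_value() are evaluated over the default frame, from the first row of the partition through the current row. Without order by, the functions are evaluated over the partition in whatever order the database happens to produce the rows, so the results of lag() and lead() are not deterministic; some databases (SQL Server, for example) reject lag() and lead() without order by.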
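The behaviour of last_value() under the default frame is a common surprise, so it is worth running once. The sketch below assumes SQLite 3.25 or later through Python's sqlite3 module; the scores are invented, and the last column adds an explicit full frame for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores for one partition, no ties.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(90,), (85,), (80,)],
)

rows = conn.execute(
    """
    SELECT results,
           LAG(results, 1, 0)  OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS lag1,
           LEAD(results, 1, 0) OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS lead1,
           FIRST_VALUE(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS first_v,
           LAST_VALUE(results)  OVER (PARTITION BY grades, subjects ORDER BY results DESC) AS last_default,
           LAST_VALUE(results)  OVER (PARTITION BY grades, subjects ORDER BY results DESC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full
    FROM test11
    ORDER BY results DESC
    """
).fetchall()

for row in rows:
    print(row)
# (90, 0, 85, 90, 90, 80)
# (85, 90, 80, 90, 85, 80)
# (80, 85, 0, 90, 80, 80)
```

With the default frame, last_value() simply echoes the current row; only the explicit full frame returns the last value of the partition.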


-- without order by
select grades
,subjects
,results
,lag(results,1,0) over(partition by grades,subjects) as lag移位2
,lead(results,1,0) over(partition by grades,subjects) as lead移位2
,first_value(results) over(partition by grades,subjects) as first_value排序2
,last_value(results) over(partition by grades,subjects) as last_value排序2
from test11


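⑤ ratio_to_report(column) over(partition by column) -- ratio function. It returns the value of the column in the current row divided by the sum of that column over the partition; if partition by is omitted, the ratio is taken against the whole data set. The result is a fraction rather than a percentage, so multiply it by 100 if a percentage is needed. This is an Oracle function and is not available in MySQL.

Example: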

select grades
,subjects
,results
,ratio_to_report(results) over()  百分比函数1
,ratio_to_report(results) over(partition by grades,subjects) 百分比函数2
from test11
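Where ratio_to_report() is not available, the same ratio can be written as the value divided by a windowed sum. The sketch below is not the Oracle function itself but this substitute, run on SQLite 3.25 or later through Python's sqlite3 module with invented rows; multiplying by 1.0 is there to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up sample rows: two in (g1, math), one in (g2, math).
conn.executemany(
    "INSERT INTO test11 VALUES (?, ?, ?)",
    [("g1", "math", 60), ("g1", "math", 40), ("g2", "math", 100)],
)

rows = conn.execute(
    """
    SELECT grades, results,
           results * 1.0 / SUM(results) OVER ()                             AS ratio_overall,
           results * 1.0 / SUM(results) OVER (PARTITION BY grades, subjects) AS ratio_in_partition
    FROM test11
    ORDER BY grades, results DESC
    """
).fetchall()

for grades, results, overall, in_partition in rows:
    print(grades, results, round(overall, 2), round(in_partition, 2))
# g1 60 0.3 0.6
# g1 40 0.2 0.4
# g2 100 0.5 1.0
```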

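Window frames

Besides the usual partition by <partitioning columns> order by <ordering columns>, a window definition can carry an optional frame clause after order by: range/rows between x and y. It further restricts the rows of the partition and determines the range over which the function is evaluated for each row.

range defines the frame by value, relative to the ordering value of the current row, whereas rows defines it by row position. If no frame is given after order by, the default is range between unbounded preceding and current row, which runs from the first row of the partition through the current row and includes any rows tied with the current row.

If the window has no order by, a frame is normally not written, and the function is evaluated over the whole partition.

The values that x and y can take in range/rows between x and y are listed in the table below: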

Value	Meaning
unbounded preceding	the first row of the partition, after partition by and order by are applied
unbounded following	the last row of the partition, after partition by and order by are applied
current row	the current row
n preceding	the row n rows before the current row
n following	the row n rows after the current row

Note: rows between 5 preceding and current row can be shortened to rows 5 preceding
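The shorthand can be checked with a two-row moving sum. This is a sketch that assumes SQLite 3.25 or later via Python's sqlite3 module, which accepts both spellings; the four scores are invented, and an offset of 1 is used instead of 5 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores for one partition.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(40,), (30,), (20,), (10,)],
)

rows = conn.execute(
    """
    SELECT results,
           SUM(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_sum_long,
           SUM(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              ROWS 1 PRECEDING)                         AS moving_sum_short
    FROM test11
    ORDER BY results DESC
    """
).fetchall()

for row in rows:
    print(row)
# (40, 40, 40)
# (30, 70, 70)
# (20, 50, 50)
# (10, 30, 30)
```

Each moving sum adds the current row to the one before it; the first row has no predecessor, so its frame contains only itself.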

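In SQL Server, range supports only unbounded preceding, unbounded following and current row; some other databases, such as MySQL 8.0 and Oracle, also accept numeric offsets with range.

Example: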

select grades
,subjects
,results
,sum(results) over(partition by grades,subjects order by results desc rows between unbounded preceding and current row) as 定位窗口求和1
,sum(results) over(partition by grades,subjects order by results desc range between unbounded preceding and current row) as 定位窗口求和2
from test11
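The two columns only differ when the ordering column contains ties. A sketch that makes the difference visible, assuming SQLite 3.25 or later through Python's sqlite3 module and invented scores 90, 90, 80:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (grades TEXT, subjects TEXT, results INTEGER)")
# Made-up scores with a tie at 90.
conn.executemany(
    "INSERT INTO test11 VALUES ('g1', 'math', ?)",
    [(90,), (90,), (80,)],
)

rows = conn.execute(
    """
    SELECT results,
           SUM(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_sum,
           SUM(results) OVER (PARTITION BY grades, subjects ORDER BY results DESC
                              RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM test11
    ORDER BY results DESC, rows_sum
    """
).fetchall()

for row in rows:
    print(row)
# (90, 90, 180)
# (90, 180, 180)
# (80, 260, 260)
```

rows stops at the physical current row, so the two tied rows get 90 and 180; range includes every row whose results equals that of the current row, so both tied rows get 180.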
