MySQL基础 开窗函数

目录

mysql语法

数据准备

create table emp (     empno numeric(4) not null,     ename varchar(10),     job varchar(9),     mgr numeric(4),     hiredate datetime,     sal numeric(7, 2),     comm numeric(7, 2),     deptno numeric(2) );  insert into emp values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, null, 20); insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30); insert into emp values (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30); insert into emp values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, null, 20); insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30); insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, null, 30); insert into emp values (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, null, 10); insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, null, 20); insert into emp values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000, null, 10); insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30); insert into emp values (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, null, 20); insert into emp values (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, null, 30); insert into emp values (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, null, 20); insert into emp values (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, null, 10); 

1.聚合函数(分组函数)

1.聚合统计逻辑

    聚合统计:         group by => 分组             xianyu,<1,a,xc,asd>             lxy,<as,zxf,zxf,qwr,ags>         聚合函数 => 指标             xianyu,4             lxy,5 

2.函数使用

group by =》 分组 聚合函数 =》 指标统计 sun avg max min count   需求:     统计每个部门有多少个人?      查什么?         维度:部门         指标:人数                  select         deptno,         count(1) as cnt         from emp         group by deptno;  解释:     count(1) 【1.代表 先放置一个假数,然后再查询】         【2.理解为按照第几个字段进行查数】     select         select + 函数 => 可以校验函数是否存在 

2.开窗函数

1.语法

窗口函数:     窗口 + 函数     窗口:函数运行时 计算数据集的范围     函数:运行时的函数         1.聚合函数             sun avg max min count         2.内置窗口函数              语法结构:         函数 over([partition by xxx,...] [order by xxx,...])         over() 是以谁进行开窗【table or 数据集】         partition by:以谁进行分组 【group by column】         order by:以谁进行排序【column】 

2.聚合函数:多行数据 按照一定规则 进行聚合 为一行

    sum avg max...     理论上:聚合后的行数 <= 聚合前的行数 【主要是看维度选取 group by 里面的字段】      需求:         既要显示 聚合前的数据 又要显示 聚合后的数据 ?          id name sal   dt        sal_all         1   zs  1000 2022-4     1000         2   ls  2000 2022-4     2000         3   wu  3000 2022-4     3000         4   zs  1000 2022-5     2000         5   ls  2000 2022-5     4000         6   wu  3000 2022-5     6000 

数据:
服务器 每天的启动 次数
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4

统计累计问题:     创建表         create table window01(             name varchar(50),             dt varchar(20),             cnt int         );     插入数据         insert into window01 values(linux01,2022-04-15,1);         insert into window01 values(linux01,2022-04-16,5);         insert into window01 values(linux01,2022-04-17,7);         insert into window01 values(linux01,2022-04-18,2);         insert into window01 values(linux01,2022-04-19,3);         insert into window01 values(linux01,2022-04-20,10);         insert into window01 values(linux01,2022-04-21,4);           insert into window01 values(linux02,2022-04-18,20);         insert into window01 values(linux02,2022-04-19,30);         insert into window01 values(linux02,2022-04-20,10);         insert into window01 values(linux02,2022-04-21,40);       需求:         每个服务器 每天 累积启动次数         select         name,         dt,         cnt,         sum(cnt) over(partition by name order by dt) as cut_all         from window01;          +---------+------------+------+---------+         | name    | dt         | cnt  | cut_all |         +---------+------------+------+---------+         | linux01 | 2022-04-15 |    1 |       1 |         | linux01 | 2022-04-16 |    5 |       6 |         | linux01 | 2022-04-17 |    7 |      13 |         | linux01 | 2022-04-18 |    2 |      15 |         | linux01 | 2022-04-19 |    3 |      18 |         | linux01 | 2022-04-20 |   10 |      28 |         | linux01 | 2022-04-21 |    4 |      32 |         | linux02 | 2022-04-18 |   20 |      20 |         | linux02 | 2022-04-19 |   30 |      50 |         | linux02 | 2022-04-20 |   10 |      60 |         | linux02 | 2022-04-21 |   40 |     100 |         +---------+------------+------+---------+          1 9 10 11 str 【字典序】         1 10 11 9           * 从1开始,1,2,3,4,5,6,7,8,9          * 从10开始,1,10…19,2,3,4,5,6,7,8,9           * 从20开始,1,10…19,2,20…29,3,4,5,6,7,8,9          * 以此类推,所有的10位数,都插入到与他们十位数位置上相等的个位数后面。 

3.内置窗口函数

窗口大小 xxx between xxx and xxx  参数 (ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING  select name, dt, cnt, sum(cnt) over(partition by name order by dt) as cut_all, -- 无边界 sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row) as cut_all2, -- 前三行 + 当前行 sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as cut_all3, -- 前三行 + 当前行 + 下一行 sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as cut_all4, -- 上面无边界 + 下面无边界 sum(cnt) over(partition by name order by dt rows between unbounded preceding and UNBOUNDED FOLLOWING) as cut_all5 from window01;  select name, dt, cnt, -- 常规分组排序求加和 sum(cnt) over(partition by name order by dt) as cut_all, -- 整张表对时间排序,然后加和,作用到整张表,理解为18号并列有两条数据 sum(cnt) over(order by dt) as cut_all2, -- 对整张表进行加和 sum(cnt) over() as cut_all3, -- 直接按照名字进分组 sum(cnt) over(partition by name) as cut_all4 from window01 order by dt;  1.partition by 不加 => 作用整张表   数仓顺序     ods不动     union all + group by select ifnull case when     join     group by     grouping sets 【维度组合】 

4.内置窗口函数

1.取值 串行

1.串行
            LAG 【窗口内 向上 第n行的值 当前行向上取一行】                 LAG(column [, N[, default]])                 column => 列名                 n => 取几行                 default => 取不到给默认值             LEAD 【窗口内 向下 第n行的值 当前行向下取一行】              select             name,             dt,             cnt,             sum(cnt) over(partition by name order by dt) as cut_all,             lead(dt,1,9999-99-99) over(partition by name order by dt) as lead_alias,             lead(dt,1,9999-99-99) over(partition by name order by dt) as lag_alias             from window01; 
2.取值
            FIRST_VALUE() : 取分组内排序后 截止到当前行 第一个值             LAST_VALUE():取分组内排序后 截止到当前行 最后一个值              select             name,             dt,             cnt,             first_value(cnt) over(partition by name order by dt) as f_value,             last_value(cnt) over(partition by name order by dt) as l_value             from window01; 

2.排序

分组
            ntile             需求:                 把数据按照某个字段进行排序,把数据分成n份ntile(n)                 如果不能平均分配 优先分配到编号小的里面             select             name,             dt,             cnt,             sum(cnt) over(partition by name order by dt) as cut_all,             -- 平均分成n份,不能平均分,优先把多余的放到最小的里面             ntile(2) over(partition by name order by dt) as n2,             ntile(3) over(partition by name order by dt) as n3             from window01             order by dt; 
排序
            rank : 从1开始,按照排序 相同会重复,名次会留下空位 生成组内的记录编号             row_number: 从1开始,按照排序 生成组内的记录编号             dense_rank:从1开始,按照排序 相同会重复,名次不会留下空位 生成组内的记录编号              select             name,             dt,             cnt,             sum(cnt) over(partition by name order by dt) as cut_all,             rank() over(partition by name order by cnt desc) as rk,             row_number() over(partition by name order by cnt desc) as rw,             dense_rank() over(partition by name order by cnt desc) as d_rk             from window01;