Hive Window Functions: Implementation Principles

In Hive, window functions are implemented mainly by a **Partitioned Table Function** (PTF) called **WindowingTableFunction**. The typical structure of a PTF is shown in the figure below. Its **input** can be a table, a subquery, or the output of another PTF, and its **output** is also a table.

![Typical structure of a Partitioned Table Function](https://cos.easydoc.net/17082933/files/keqyuozn.jpg)

```sql
select channel, month, sum(amount),
       dense_rank() over (partition by channel order by sum(amount) desc) as dr,
       rank() over (partition by channel order by sum(amount) desc) as r
from sales
group by channel, month;
```

Hive executes the query above in **two stages**:

- First, it computes every operation other than the window functions, such as `group by`, `join`, and `having`. For the query above, the first stage is: `select channel, month, sum(amount) as s from sales group by channel, month;`
- Then it feeds the output of the first stage into WindowingTableFunction, which computes the window function values. For the query above, the second stage is, in pseudo-SQL:

```sql
select channel, month, s, dr, r
from WindowingTableFunction(
  -- output of the previous stage
  <select channel, month, sum(amount) as s from sales group by channel, month>,
  -- partition list of the window functions
  partition by channel,
  -- order list of the window functions
  order by s desc,
  -- window function invocations
  [r:<rank()>, dr:<dense_rank()>]
)
```

---

Reposted from: [Hive Window Functions: Implementation Principles](https://zhuanlan.zhihu.com/p/97351442)
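The two-stage process for the `sales` query can be illustrated with a small Python simulation. This is a toy sketch, not Hive's actual implementation. The sample rows and the function names `stage1_group_by` and `windowing_table_function` are hypothetical.

Stage 1 performs the `group by`. Stage 2 plays the role of a windowing table function: it takes a whole table as input, partitions it by `channel`, orders each partition by `s desc`, and emits a table with `rank()` and `dense_rank()` columns appended. Hive runs this step inside its execution engine rather than sorting Python lists in memory.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows of the `sales` table: (channel, month, amount)
SALES = [
    ("web", 1, 100), ("web", 1, 50), ("web", 2, 150), ("web", 3, 80),
    ("store", 1, 200), ("store", 2, 120), ("store", 3, 120),
]


def stage1_group_by(rows):
    """Stage 1: everything except windowing.
    Equivalent to: select channel, month, sum(amount) as s ... group by channel, month."""
    sums = defaultdict(int)
    for channel, month, amount in rows:
        sums[(channel, month)] += amount
    return [(ch, m, s) for (ch, m), s in sums.items()]


def windowing_table_function(rows):
    """Stage 2 (toy stand-in for WindowingTableFunction): table in, table out.
    partition by channel, order by s desc, compute rank() and dense_rank().
    Output rows are (channel, month, s, dr, r)."""
    out = []
    # Sort by the partition key so each partition becomes a contiguous run.
    rows = sorted(rows, key=itemgetter(0))
    for _, part in groupby(rows, key=itemgetter(0)):
        # Order within the partition: s descending (stable for ties).
        part = sorted(part, key=itemgetter(2), reverse=True)
        r = dr = 0
        prev = None
        for pos, (ch, m, s) in enumerate(part, start=1):
            if s != prev:
                r = pos   # rank(): jumps to the row position after ties (gaps)
                dr += 1   # dense_rank(): increases by 1 (no gaps)
                prev = s
            out.append((ch, m, s, dr, r))
    return out


if __name__ == "__main__":
    for row in windowing_table_function(stage1_group_by(SALES)):
        print(row)
```

In the `web` partition, months 1 and 2 both sum to 150. Both get `r = 1` and `dr = 1`, and month 3 then gets `r = 3` but `dr = 2`. This shows how `rank()` leaves gaps after ties while `dense_rank()` does not.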