What is window frame clause?
The Window Frame Clause in SQL allows you to define a specific subset of rows within a window partition on which a window function operates. It provides fine-grained control over which rows are included in the calculation for each row of the result set.
Understanding the Window Frame Clause
When you define a window using the OVER clause in SQL, you can further refine the set of rows considered by a window function (like SUM, AVG, COUNT, etc.) using a Window Frame Clause. This clause determines the 'frame' or 'sliding window' relative to the current row within its partition. It is specified using ROWS, RANGE, or GROUPS.
Frame Units: ROWS, RANGE, GROUPS
1. ROWS: This is the most common and intuitive frame unit. It defines the frame based on a fixed number of rows preceding or following the current row. It's ideal for calculating things like rolling averages over a specific number of previous rows.
SELECT
order_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY YEAR(order_date)
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_last_3_orders
FROM
Orders;
In the example above, for each row, sum_last_3_orders will sum the amount of the current row and the two preceding rows within the same year partition, ordered by order_date.
2. RANGE: This unit defines the frame based on a range of values relative to the current row's ordering column(s). It considers all rows whose ordering column value falls within the specified range. It's often used for calculations like 'sum of sales within the last 7 days' where the exact number of rows might vary.
SELECT
order_id,
order_date,
amount,
AVG(amount) OVER (
ORDER BY order_date
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
) AS avg_last_7_days_sales
FROM
Sales;
Here, avg_last_7_days_sales calculates the average amount for all sales that occurred within the 7 days prior to (and including) the current row's order_date.
3. GROUPS: This unit defines the frame based on logical groups of rows that are peers with respect to the ORDER BY clause. It's less common than ROWS or RANGE and is typically used when you want to include all rows that share the same values in the ORDER BY columns as the current row.
SELECT
department,
salary,
SUM(salary) OVER (
PARTITION BY department
ORDER BY salary
GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_salary_by_group
FROM
Employees;
Frame Extents: PRECEDING and FOLLOWING
Within the ROWS, RANGE, or GROUPS clause, you specify the start and end of the frame using BETWEEN <start> AND <end>. Common frame extent specifiers include:
- UNBOUNDED PRECEDING: Starts the frame at the very first row of the partition.
- <N> PRECEDING: Starts/ends the frame N rows (for ROWS) or N units (for RANGE) before the current row.
- CURRENT ROW: Refers to the current row itself. Can be used as a start or end point.
- <N> FOLLOWING: Starts/ends the frame N rows (for ROWS) or N units (for RANGE) after the current row.
- UNBOUNDED FOLLOWING: Ends the frame at the very last row of the partition.
Default Window Frame
If you use an ORDER BY clause within your OVER() definition but do not specify a window frame clause, SQL uses a default frame:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: This is the default ifORDER BYis present. It includes all rows from the start of the partition up to and including the current row, and all peers (rows with the sameORDER BYvalue as the current row).ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: This is the default ifORDER BYis *not* present. The window function operates on the entire partition.
Explicitly defining the window frame clause is crucial for precise analytical queries, allowing you to compute rolling aggregates, cumulative sums, moving averages, and more, exactly as needed.