Window function invocation—SQL syntax and semantics

The rules described in this section also govern the invocation of aggregate functions.

The dedicated Aggregate functions section explains that one kind of aggregate function—so-called ordinary aggregate functions, exemplified by avg() and count()—can optionally be invoked using the identical syntax that you use to invoke window functions. That dedicated section has many examples. See also the sections Using the aggregate function avg() to compute a moving average and Using the aggregate function sum() with the OVER clause in the present Window functions main section.

A note on orthography

Notice these three different orthography styles:

  • OVER is a keyword that names a clause. You write such a keyword in a SQL statement.

  • window_definition is the name of a rule within the overall SQL grammar. You never type such a name in a SQL statement. It is written in bold lower case with underscores, as appropriate, between the English words. Because such a rule is always shown as a link, you can jump directly to the rule in the Grammar Diagrams page. This page shows every single one of the SQL rules. It so happens that the window_definition rule starts with the keyword WINDOW and might therefore, according to the context of use, be referred to alternatively as the WINDOW clause.

  • window frame is a pure term of art. It is written in italic lower case with spaces, as appropriate, between the English words. You neither write it in a SQL statement nor use it to look up anything in the Grammar Diagrams page. Because such a term of art is always shown as a link, you can jump directly to its definition within this "Window function invocation—SQL syntax and semantics" page.

Syntax

Reproduced from the SELECT statement section

The following three diagrams, select_start, WINDOW clause, and fn_over_window rule, are reproduced from the section that describes the SELECT statement.

select_start ::= SELECT [ ALL | 
                          DISTINCT [ ON { ( expression [ , ... ] ) } ] ] 
                  [ * | { { expression
                            | fn_over_window
                            | ordinary_aggregate_fn_invocation
                            | within_group_aggregate_fn_invocation } 
                        [ [ AS ] name ] } [ , ... ] ]

window_clause ::= WINDOW { { name AS window_definition } [ , ... ] }

fn_over_window ::= name  ( [ expression [ , ... ] | * ]  
                   [ FILTER ( WHERE boolean_expression ) ] OVER 
                   { window_definition | name }

Definition of the window_definition rule

As promised in the SELECT statement section, this section explains the window_definition rule and its use as the argument of either the OVER keyword or the WINDOW keyword.

A window_definition can be used only at these two syntax spots, within the enclosing syntax of a subquery.

window_definition ::= ( [ name ]  
                      [ PARTITION BY order_expr [ , ... ] ]  
                      [ ORDER BY order_expr [ , ... ] ]  
                      [ frame_clause ] )

window_definition

(namePARTITIONBY,order_exprORDERBY,order_exprframe_clause)

The frame_clause

frame_clause ::= [ { RANGE | ROWS | GROUPS } frame_bounds ] 
                 [ frame_exclusion ]

frame_bounds ::= frame_start | BETWEEN frame_start AND frame_end

frame_start ::= frame_bound

frame_end ::= frame_bound

frame_bound ::= UNBOUNDED PRECEDING
                | offset PRECEDING
                | CURRENT ROW
                | offset FOLLOWING
                | UNBOUNDED FOLLOWING

frame_exclusion ::= EXCLUDE CURRENT ROW
                    | EXCLUDE GROUP
                    | EXCLUDE TIES
                    | EXCLUDE NO OTHERS

frame_clause

RANGEROWSGROUPSframe_boundsframe_exclusion

frame_bounds

frame_startBETWEENframe_startANDframe_end

frame_start

frame_bound

frame_end

frame_bound

frame_bound

UNBOUNDEDPRECEDINGoffsetPRECEDINGCURRENTROWoffsetFOLLOWINGUNBOUNDEDFOLLOWING

frame_exclusion

EXCLUDECURRENTROWEXCLUDEGROUPEXCLUDETIESEXCLUDENOOTHERS

Semantics

The fn_over_window rule

A window function can be invoked only at the syntax spot in a subquery that the diagram for the select_start rule shows. An aggregate function may be invoked in this way as an alternative to its more familiar invocation as a regular SELECT list item in conjunction with the GROUP BY clause. (The invocation of an aggregate function in conjunction with the GROUP BY clause is governed by the ordinary_aggregate_fn_invocation rule or the within_group_aggregate_fn_invocation rule.)

The number, data types, and meanings of a window function's formal parameters are function-specific. The eleven window functions are classified into functional groups, and summarized, in the two tables at the end of the section Signature and purpose of each window function. Each entry links to the formal account of the function which also provides runnable code examples.

Notice that, among the dedicated window functions (as opposed to aggregate functions that may be invoked as window functions), only ntile() takes an argument. Every other dedicated window function is invoked with an empty parentheses pair. Some aggregate functions (like, for example, jsonb_object_agg()) take more than one argument. When an aggregate function is invoke as a window function, the keyword DISTINCT is not allowed within the parenthesized list of arguments. The attempt causes this error:

0A000: DISTINCT is not implemented for window functions

The window_definition rule

The syntax diagram for the window_definition shows that it uses three complementary specifications:

  • The PARTITION BY clause defines the maximal subsets, of what the subquery-level WHERE clause defines, that are operated upon, in turn, by a window function (or by an aggregate function in window mode). Tautologically, this maximal subset is referred to as the window. In the limit, when the PARTITION BY clause is omitted, the maximal subset is identical with what the WHERE clause defines.
  • The window ORDER BY clause defines how the rows are to be ordered within the window.
  • The frame_clause defines a further narrowing of the window, referred to as the window frame. The window frame is anchored to the current row within the window. In the degenerate case, the window frame coincides with the window and is therefore insensitive to the position of the current row.

In summary, the window_definition defines the window as the scope within which a function's meaning (window function or aggregate function in window mode) is defined. The window is then further characterized by the ordering of its rows, the extent of the window frame, and how this moves with the current row.

The FILTER clause

The FILTER clause's WHERE clause has the same syntax and semantics as it does at the regular WHERE clause syntax spot immediately after a subquery's FROM list. Notice that the FILTER clause is legal only for the invocation of an aggregate function. Here is an example:

select
  class,
  k,
  count(*)
    filter(where k%2 = 0)
    over (partition by class)
  as n
from t1
order by class, k;

If you want to run this, then create a data set using the ysqlsh script that table t1 presents.

Using the FILTER clause in the invocation of a window function causes this compilation error:

0A000: FILTER is not implemented for non-aggregate window functions

The PARTITION BY clause

The PARTITION BY clause groups the rows that the subquery defines into windows, which are processed separately by the window function. (This holds, too, when an aggregate function is invoked in this way.) It works similarly to a query-level GROUP BY clause, except that its expressions are always just expressions and cannot be output-column names or numbers. If the PARTITION BY clause is omitted, then all rows are treated as a single window.

The window ORDER BY clause

The window ORDER BY clause determines the order in which the rows of a window are processed by the window function. It works similarly to a query-level ORDER BY clause; but it cannot use output-column names or numbers. If the window ORDER BY clause is omitted, then rows are processed in an unspecified order so that the results of any window function invoked in this way would be unpredictable and therefore meaningless. Aggregation functions invoked in this way might be sensitive to what the window ORDER BY clause says. This will be the case when, for example, the window frame is smaller than the whole window and moves with the current row. The section Using the aggregate function avg() to compute a moving average provides an example.

The frame_clause

The frame_clause has many variants. Only one basic variant is needed in the OVER clause that you use to invoke a window function. The other variants are useful in the OVER clause that you use to invoke an aggregate function. For completeness, those variants are described on this page.

frame_clause semantics for window functions

The frame_clause specifies the set of rows constituting the so-called window frame. In general, this will be a subset of the rows in the current window. Look at the two tables at the end of the section Signature and purpose of each window function.

Use cases where the frame_clause's many other variants are useful arise when an aggregate function is invoked using the OVER clause. One example is given in the section Using the aggregate function avg() to compute a moving average. Another example, that uses count(*), is given in the code that explains the meaning of the percent_rank() function. Otherwise, see the main Aggregate functions section.

frame_clause semantics for aggregate functions

The window frame can be specified in RANGE, ROWS or GROUPS mode; in each case, it runs from the frame_start to the frame_end. If frame_end is omitted, then the end defaults to CURRENT ROW.

A frame_start of UNBOUNDED PRECEDING means that the window frame starts with the first row of the window. Similarly, a frame_end of UNBOUNDED FOLLOWING means that the window frame ends with the last row of the window.

In RANGE or GROUPS mode, a frame_start of CURRENT ROW means that the window frame starts with the first member of the current row's peer group. A peer group is a set of rows that the window ORDER BY clause sorts with the same rank as the current row. And a frame_end of CURRENT ROW means that the window frame ends with the last row in the current row's peer group. In ROWS mode, CURRENT ROW simply means the current row.

For the offset PRECEDING and offset FOLLOWING modes of the frame_start and frame_end clauses, the offset argument must be an expression that doesn't include any variables, aggregate functions, or window functions. The meaning of the offset value depends on the RANGE | ROWS | GROUPS mode:

  • In ROWS mode, the offset value must be a NOT NULL, non-negative integer. This brings the meaning that the window frame starts or ends the specified number of rows before or after the current row.

  • In GROUPS mode, the offset value must again be a NOT NULL, non-negative integer. Here, this brings the meaning that the window frame starts or ends the specified number of peer groups before or after the current row's peer group. Recall that there's always a logical requirement to include a window ORDER BY clause in the window definition that is used to invoke a window function. In GROUPS mode, whatever is your intended use of the window definition, you get this error if it doesn't include a window ORDER BY clause:

    42P20: GROUPS mode requires an ORDER BY clause
    
  • In RANGE mode, these options require that the window ORDER BY clause specify exactly one column. The offset value specifies the maximum difference between the value of that column in the current row and its value in the preceding or following rows of the window frame. The offset expression must yield a value whose data type depends upon that of the ordering column. For numeric ordering columns (like int, double precision, and so on), it is typically of the same data type as the ordering column; but for date-time ordering columns it is an interval. For example, if the ordering column is date or timestamp, you could specify RANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. Here too, the offset value must be NOT NULL and non-negative. The meaning of “non-negative” depends on the data type.

In all cases, the distance to the start and end of the window frame is limited by the distance to the start and end of the window, so that for rows near the window boundaries, the window frame might contain fewer rows than elsewhere.

Notice that in both ROWS and GROUPS mode, 0 PRECEDING and 0 FOLLOWING is equivalent to CURRENT ROW. This normally holds in RANGE mode too, for an appropriate meaning of “zero” specific to the data type.

The frame_exclusion clause allows rows around the current row to be excluded from the window frame, even if they would be included according to what the frame_start and frame_end clauses say.

  • EXCLUDE CURRENT ROW excludes the current row from the window frame.
  • EXCLUDE GROUP excludes all the rows in the current row's peer group.
  • EXCLUDE TIES excludes any peers of the current row, but not the current row itself.
  • EXCLUDE NO OTHERS simply specifies explicitly the default behavior of not excluding the current row or its peers.

Omitting the frame_clause is the same as specifying

RANGE UNBOUNDED PRECEDING

and this means the same as

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

If the window ORDER BY clause is specified, then this default meaning sets the window frame to be all rows from the window start up through the last row in the current row's peer group. And if the window ORDER BY clause is omitted this means that all rows of the window are included in the window frame, because all rows become peers of the current row.

Notes:

  • The frame_start clause cannot be UNBOUNDED FOLLOWING
  • The frame_end clause cannot be UNBOUNDED PRECEDING, and cannot appear before the frame_start clause.

For example RANGE BETWEEN CURRENT ROW AND offset PRECEDING causes this error:

42P20: frame starting from current row cannot have preceding rows

However, ROWS BETWEEN 7 PRECEDING AND 8 PRECEDING is allowed, even though it would never select any rows.

If the FILTER clause is specified, then only the input rows for which it evaluates to true are fed to the window function; other rows are discarded. As noted above, only window aggregate functions invoked using the OVER clause accept a FILTER clause.

Examples

First example

This shows the use of a window function with an OVER clause that directly specifies the window_definition :

select
  ...
  some_window_function(...) over (partition by <column list 1> order by <column list 2>) as a1,
  ...
from ...

Notice that the syntax spot occupied by "some_window_function" may be occupied only by a window function or an aggregate function. See the section Informal overview of function invocation using the OVER clause for runnable examples of this syntax variant.

If any other kind of function, for example "sqrt()", occupies this syntax spot, then it draws this specific compilation error:

42809: OVER specified, but sqrt is not a window function nor an aggregate function

And if any other expression is used at this syntax spot, then it causes this generic compilation error:

42601: syntax error at or near "over"

Second example

This shows the use of two window functions with OVER clauses that each reference the same window_definition that is defined separately in a WINDOW clause.

select
  ...
  window_fn_1(...) over w as a1,
  window_fn_2(...) over w as a2,
  ...
from ...
window w as (
  partition by <column list 1>                              -- PARTITION BY clause
  order by <column list 2>                                  -- ORDER BY clause
  range between unbounded preceding and unbounded following -- frame_clause
  )
...

For a runnable example of this syntax variant, see first_value(),nth_value(), last_value().

Notice that the syntax rules allow both this:

window_fn_1(...) over w as a1

and this:

window_fn_1(...) over (w) as a1

The parentheses around the window's identifier convey no meaning, Yugabyte recommends that you don't use this form because doing so will make anybody who reads your code wonder if it does convey a meaning.

Third example

This shows how a generic window_definition that is defined in a WINDOW clause is specialized in a particular OVER clause that references it.

select
  ...
  (window_fn(...)      over w)                                                             as a1,
  (aggregate_fn_1(...) over (w range between unbounded preceding and unbounded following)) as a2,
  (aggregate_fn_1(...) over (w range between unbounded preceding and current row))         as a3,
  ...
from ...
window w as (partition by <column list 1> order by <column list 2>)
...

Fourth example

This shows how the window_definition specialization technique that the third example showed can be used in successively in the WINDOW clause.

select
  ...
  (window_fn(...)      over w1) as a1,
  (aggregate_fn_1(...) over w2) as a2,
  (aggregate_fn_1(...) over w3) as a3,
  ...
from ...
window
  w1 as (partition by <column list 1> order by <column list 2>),
  w2 as (w1 range between unbounded preceding and unbounded following),
  w3 as (w1 range between unbounded preceding and current row)

For a runnable example of this fourth syntax variant, see Comparing the effect of percent_rank(), cume_dist(), and ntile() on the same input.