Signature and purpose of each window function
The two tables at the end classify the eleven built-in window functions into two groups according to their general common characteristics.
Note: The navigation bar lists these window functions in four functional groups. The members in each group bear a strong family resemblance to each other. The first two groups list functions from the first table below. And the second two groups list functions from the second table.
Aggregate function variants
A few of these also have an aggregate function variant. This can be seen with the
\df meta-command. For example,
df lag shows this:
Result data type | Argument data types | Type ------------------+----------------------------------------+-------- bigint | | window bigint | VARIADIC "any" ORDER BY VARIADIC "any" | agg
This property is marked by "Y" in the column "agg?" in the following tables; a blank in this column means that the entry has only a window function variant.
Functions with both a 'window' and an 'aggregate' variantThe definitive description of the use, as an aggregate function, of a window function that has such a variant, is described within the Aggregate functions major section in the section Within-group hypothetical-set aggregate functions .
The results of all of the window functions depend upon what the window
ORDER BY clause and the optional
PARTITION BY clause say. Though you don't get an error if you omit the window
ORDER BY clause, its omission brings unpredictable and therefore meaningless results.
The results of a few of the window functions are sensitive to what the
frame_clause says. This is marked by "Y" in the column "frame?" in the following tables; a blank in this column means that the entry is not sensitive to what the
Frame_clause-insensitive window functions
All of the window functions listed in the first table, and
lead() from the second table, are insensitive to whatever the
frame_clause might say. You can easily show this by trying a few variants like, for example, this:
-- Counter example. Don't use this for window functions -- that aren't frame_clause-sensitive. range between 1 preceding and 1 following exclude current row
It says "consider only the row immediately before and the row immediately after the current row". You'll see that including this, or any other variant, in the
window_definition brings the identical result to what including only the window
ORDER BY clause brings for each of the window functions that aren't sensitive to the
Frame_clause-sensitive window functions
The names of the other window functions that the second table lists,
last_value(), tell you that the output of each makes obvious sense when the scope within which the specified row is found is the entire window. The results of these three functions certainly are sensitive to the
frame_clause. This is the default for the
-- This specification is used if you omit the frame_clause. -- You probably don't want the results that this specifies. between unbounded preceding and current row
See the section Window function invocation—SQL syntax and semantics. But the default does not specify the entire window. To do this, use this variant:
-- You must specify this explicitly unless you are sure -- that you want a different specification. range between unbounded preceding and unbounded following
Yugabyte recommends, therefore, that you include
frame_clause in the
window_definition that you use when you invoke a window function that is sensitive to the
frame_clause, unless you have one of the very rare use cases where the output that you want is produced by a different
frame_clause's many variants are useful when an aggregate function is invoked using the
OVER clause. The section Using the aggregate function
avg() to compute a moving average shows an example.
Window functions that return an "int" or "double precision" value as a "classifier" of the rank of the row within its window
The only information, in the input row set, that the functions in this group depend upon is the emergent order as determined by the window
ORDER BY clause. The actual order depends in the usual way on what this clause says. With the one exception of
ntile(), these functions have no formal parameters. The
ntile() function has a single, mandatory
int formal parameter. It specifies the number of subsets into which the input row set should be classified. This means that it reflects only the invoker's intention and does not reflect the shape of the input.
||Returns a unique integer for each row in a window, from a dense series that starts with 1, according to the emergent order that the window
||Y||Returns the integer ordinal rank of each row according to the emergent order that the window
||Y||Returns the integer ordinal rank of the distinct value of each row according to what the window
||Y||Returns the percentile rank of each row within the window, with respect to the argument of the
||Y||Returns a value that represents the number of rows with values less than or equal to the current row's value divided by the total number of rows—in other words, the relative position of a value in a set of values. The graph of all values of
||Returns an integer value for each row that maps it to a corresponding percentile. For example, if you wanted to mark the boundaries between the highest-ranking 20% of rows, the next-ranking 20% of rows, and so on, then you would use
Window functions that return columns of another row within the window
The functions in this group depend, in the same way as those in the first group do, upon the emergent order as determined by the window
ORDER BY clause. Each has, at least, a single, mandatory,
anyelement formal parameter that specifies which value to fetch from the designated other row in the window. If you want to fetch values from more than one column, you must combine them into a scalar value. The most obvious way to do this is to list the columns in the constructor for a user-defined "row" type. The
nth_value() function has a mandatory second
int formal parameter that specifies the value of "N" to give "Nth" its meaning. The other functions in this group have just one formal parameter.
||Y||Returns the specified value from the first row, in the specified sort order, in the current window frame. If you specify the
||Y||Returns the specified value from the "Nth" row, in the specified sort order, in the current window frame. The second, mandatory, parameter specifies "N" in "Nth".|
||Y||Returns the specified value from the last row, in the specified sort order, in the current window frame.|
||Returns, for the current row, the designated value from the row in the ordered input set that is "lag_distance " rows before it. The data type of the return value matches that of the input value.
||Returns, for the current row, the designated value from the row in the ordered input set that is "lead_distance" rows after it. The data type of the return value matches that of the input value.