 # Signature and purpose of each aggregate function

The aggregate functions are categorized into four classes:

## General-purpose aggregate functions

The aggregate functions in this class can be invoked in one of two ways:

• Either "ordinarily" on all the rows in a table or in connection with `GROUP BY`, when they return a single value for the set of rows.

In this use, row ordering often doesn't matter. For example, `avg()` has this property. Sometimes, row ordering does matter. For example, the order of grouped values determines the mapping between array index and array value with `array_agg()`.

• Or as a window function with `OVER`.

In this use, where the aggregate function is evaluated for each row in the window, ordering always matters.

Arguably, `avg()` might be better classified as a statistical aggregate function. (It is often used together with `stddev()`.) But, because it is so very commonly used, and used as a canonical example of the "aggregate function" notion, it is classified here as a general-purpose aggregate function.

Function Description
`array_agg()` Returns an array whose elements are the individual values that are aggregated. This is described in full detail the `array_agg()` subsection in the main Array section.
`avg()` Computes the arithmetic mean of a set of summable values by adding them all together and dividing by the number of values. If the set contains nulls, then these are simply ignored—both when computing the sum and when counting the number of values.
`bit_and()` Returns a value that represents the outcome of the applying the two-by-two matrix `AND` rule to each alligned set of bits for the set of `NOT NULL` input values.
`bit_or()` Returns a value that represents the outcome of the applying the two-by-two matrix `OR` rule to each alligned set of bits for the set of `NOT NULL` input values.
`bool_and()` Returns a value that represents the outcome of the applying the two-by-two matrix `AND` rule to the set of `NOT NULL` input boolean values.
`bool_or()` Returns a value that represents the outcome of the applying the two-by-two matrix `OR` rule to the set of `NOT NULL` input boolean values.
`count()` Counts the number of non null values in a set. The data type of the values is of no consequence.
`every()` `every()` is a synonym for `bool_and()`
`jsonb_agg()` This, and `json_agg()` are described in detail the `jsonb_agg()` section in the main JSON section.
`jsonb_object_agg()` This and `json_object_agg()` are described in detail the `jsonb_object_agg()` section in the main JSON section.
`max()` Computes the greatest value among the values in the set using the rule that is used for the particular data type in the ORDER BY clause. nulls are removed before sorting the values.
`min()` Computes the least value among the values in the set using the rule that is used for the particular data type in the ORDER BY clause. nulls are removed before sorting the values.
`string_agg()` Returns a single value produced by concatenating the aggregated values (first argument) separated by a mandatory separator (second argument). The first overload has `text` inputs and returns `text`. The second overload has `bytea` inputs and returns `bytea`.
`sum()` Computes the sum of a set of summable values by adding them all together. If the set contains nulls, then these are simply ignored.
`xmlagg()` This is not supported through Version YB 2.2. See GitHub Issue #1043

## Statistical aggregate functions

The aggregate functions in this class can be invoked in one of two ways:

• Either "ordinarily" on all the rows in a table or in connection with `GROUP BY`, when they return a single value for the set of rows. In this use, row ordering doesn't matter.

• Or as a window function with `OVER`.

In this use, where the aggregate function is evaluated for each row in the window, ordering always matters.

Function Description
`covar_pop()` Returns the so-called covariance, taking the available values to be the entire population.
`covar_samp()` Returns the so-called covariance, taking the available values to be a sample of the population.
`corr()` Returns the so-called correlation coefficient. This measures the extent to which the "y" values are linearly related to the "x" values. A return value of 1.0 indicates perfect correlation.
`regr_avgy()` Returns the average of the first argument for those rows where both arguments are `NOT NULL`.
`regr_avgx()` Returns the average of the second argument for those rows where both arguments are `NOT NULL`.
`regr_count()` Returns the number of rows where both arguments are `NOT NULL`.
`regr_slope()` Returns the slope of the straight line that linear regression analysis has determined best fits the "(y, x)" pairs.
`regr_intercept()` Returns the point at which the straight line that linear regression analysis has determined best fits the "(y, x)" pairs intercepts the "y"-axis.
`regr_r2()` Returns the square of the correlation coefficient, `corr()`.
`regr_syy()` Returns `regr_count(y, x)*var_pop(y)` for `NOT NULL` pairs.
`regr_sxx()` Returns `regr_count(y, x)*var_pop(x)` for `NOT NULL pairs`.
`regr_sxy()` Returns `regr_count(y, x)*covar_pop(y, x)` for `NOT NULL` pairs.
`variance()` The semantics of `variance()` and `var_samp()` are identical.
`var_pop()` Returns the variance of a set of values using the "population" variant of the formula that divides by N, the number of values.
`var_samp()` Returns the variance of a set of values using the "sample" variant of the formula that divides by (N - 1) where N is the number of values.
`stddev()` The semantics of `stddev()` and `stddev_samp()` are identical.
`stddev_pop()` Returns the standard deviation of a set of values using the naïve formula (i.e. the "population" variant) that divides by the number of values, N.
`stddev_samp()` Returns the standard deviation of a set of values using the "sample" variant of the formula that divides by (N - 1) where N is the number of values.

## Within-group ordered-set aggregate functions

These functions are sometimes referred to as “inverse distribution” functions. They can be invoked only with the dedicated `WITHIN GROUP (ORDER BY ...)` syntax. They cannot be invoked as a window function with `OVER`.

Function Description
`mode()` Return the most frequent value of "sort expression". If there's more than one equally-frequent value, then one of these is silently chosen arbitrarily.
`percentile_disc()` Discrete variant. The scalar overload of `percentile_disc()` takes a percentile rank value as input and returns the value, within the specified window, that would produce this. The array overload takes an array of percentile rank values as input and returns the array of values that would produce these.
`percentile_cont()` Continuous variant. The scalar overload of `percentile_cont()` takes a percentile rank value as input and returns the value, within the specified window, that would produce this. The array overload takes an array of percentile rank values as input and returns the array of values that would produce these.

## Within-group hypothetical-set aggregate functions

These functions as invoked, as within-group hypothetical-set aggregate functions, with the dedicated `WITHIN GROUP (ORDER BY ...)` syntax. Further, each of the functions listed in this class is associated with a window function of the same name. (But not every window function can be invoked in this way.) For each, the result is the value that the associated window function would have returned for the “hypothetical” row constructed, as the invocation syntax specifies, as if such a row had been added to the window.

Function Description
`rank()` Returns the integer ordinal rank of each row according to the emergent order that the window `ORDER BY` clause specifies. The series of values starts with 1 but, when the window contains ties, the series is not dense. See the account of rank() in the Window functions section for more information.
`dense_rank()` Returns the integer ordinal rank of the distinct value of each row according to what the window `ORDER` BY clause specifies. The series of values starts with 1 and, even when the window contains ties, the series is dense. See the account of dense_rank() in the Window functions section for more information.
`percent_rank()` Returns the percentile rank of each row within the window, with respect to the argument of the window_definition's window `ORDER BY` clause. See the account of percent_rank() in the Window functions section for more information.
`cume_dist()` Returns a value that represents the number of rows with values less than or equal to the current row’s value divided by the total number of rows—in other words, the relative position of a value in a set of values. See the account of cume_dist() in the Window functions section for more information.