Case study: using aggregate functions to demonstrate the "68–95–99.7" rule

This page documents the preview version (v2.21). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.

This case study shows the use of aggregate functions to demonstrate the so-called "68–95–99.7 rule"—described in this Wikipedia article. This case-study focuses on just one part of the rule:

• 68.27% of the values in a normal distribution lie within one standard deviation each side of the mean.

Populate the test table

The demonstration uses the function `normal_rand()`, brought by the tablefunc extension, to populate the test table:

``````drop table if exists t cascade;
create table t(v double precision primary key);

do \$body\$
declare
no_of_rows constant int              := 1000000;
mean       constant double precision := 0.0;
stddev     constant double precision := 50.0;
begin
insert into t(v)
select normal_rand(no_of_rows, mean, stddev);
end;
\$body\$;
``````

If you choose a value of one million for "no_of_rows", then the two different estimates for the one sigma boundaries reliably produce results that typically differ from each other by less than about 0.1%. If you choose fewer rows, then the variability, and the typical results difference, will be bigger.

Because the demonstration (for convenience) uses a table with a single `double precision` column, "v", this must be the primary key. It's just possible that `normal_rand()` will create some duplicate values. However, this is so very rare that it was never seen while the script was repeated, many times, during the development of this code example. If `insert into t(v)` does fail because of this, just repeat the script by hand.

Create, and execute, the table function "the_6827_rule()"

The table function "the_6827_rule()", below, uses the following approach:

``````drop function if exists the_6827_rule() cascade;
create function the_6827_rule()
returns table (t text)
language plpgsql
as \$body\$
declare
-- FIRST, determine the one sigma boundaries by using avg() and stddev_pop().
measured_avg
constant double precision := (
select avg(v) from t);

measured_sigma
constant double precision := (
select stddev_pop(v) from t);

one_sigma_boundaries_from_avg_and_stddev
constant double precision[] := array[
measured_avg - measured_sigma,
measured_avg + measured_sigma];

-- SECOND, determine the one sigma boundaries by using percentile_cont().
fraction_within_sigma_from_avg
constant double precision :=
0.6827;

expected_fraction_below_avg_minus_sigma
constant double precision :=
(1.0 - fraction_within_sigma_from_avg)/2.0;

expected_fraction_above_avg_plus_sigma
constant double precision :=
expected_fraction_below_avg_minus_sigma +
fraction_within_sigma_from_avg;

fractions constant double precision[] := array[
expected_fraction_below_avg_minus_sigma,
expected_fraction_above_avg_plus_sigma];

one_sigma_boundaries_from_percentile_cont
constant double precision[] := (
select
percentile_cont(fractions)
within group (order by v)
from t);

begin
lpad('from avg() and stddev_pop()', 29, ' ')||

return next;
return next;

for j in 1..2 loop
declare
caption constant text not null :=
case j
when 1 then 'mean - sigma:'
when 2 then 'mean + sigma:'
end;

num constant double precision not null :=
one_sigma_boundaries_from_percentile_cont[j] -
one_sigma_boundaries_from_avg_and_stddev[j];

denom constant double precision not null :=
(one_sigma_boundaries_from_percentile_cont[j] +
one_sigma_boundaries_from_avg_and_stddev[j])/2.0;

ratio constant double precision not null :=
abs((100.0::double precision*num)/denom);
begin
return next;
end;
end loop;
end;
\$body\$;
``````

A table function is used because if `raise info` is used in a procedure or anonymous block, then the output cannot be spooled to a file. Invoke it, for example, like this:

``````\t on
\o report.txt
select t from the_6827_rule();
\o
\t off
``````

Typical result

Here is a typical result (for one million rows):

``````                from avg() and stddev_pop()  from percentile_cont()     ratio
---------------------------  ----------------------  --------
mean - sigma:                    -50.00899               -49.99646    0.025%
mean + sigma:                     49.97396                50.00483    0.062%
``````