# The bucket allocation scheme

It might seem that the built-in function `width_bucket()` (here in the PostgreSQL documentation) is tailor-made for this task. But there's a snag.

The values in the column t4.dp_score" lie in this so-called closed-closed interval `[0, 100]`—expressed like this in SQL:

``````dp_score is between 0 and 100
``````

Note: This Wikipedia article explains the terminology used to characterize intervals. For example:

The open-closed interval (0,1] means greater than 0 and less than or equal to 1. And the closed-open interval [0,1) means greater than or equal to 0 and less than 1.

The values returned both by `percent_rank()` and by `cume_dist()`, when scaled so that they are percentages, also lie in this same range.

Here is the signature (using sensibly chosen names for the formal parameters) of the `double precision` overload of `width_bucket()`:

``````function width_bucket(
val    in double precision,  -- the value to be allocated
lb     in double precision,  -- the lower bound of the overall closed-open interval
ub     in double precision,  -- the upper bound of the overall closed-open interval
n_vals in int)               -- the required number of buckets
returns int
``````

If you invoke `width_bucket()` to allocate a value `\$1` with `lb = 0`, `ub = 100`, and `n_vals = 10`, like this:

``````width_bucket(\$1, 0, 100, 10)
``````

then the buckets are defined as these closed-open intervals:

``````[ 0,  10)  ->  bucket  1
[10,  20)  ->  bucket  2
[20,  30)  ->  bucket  3
...
[80,  90)  ->  bucket  9
[90, 100)  ->  bucket 10
``````

Any value that is less than 0 is allocated to the lower overflow bucket 0. And any value that is greater than or equal to the value 100 is allocated to the upper overflow bucket 11. Here is an example:

``````do \$body\$
declare
lb     constant double precision := 0;
ub     constant double precision := 100;
n_vals constant int              := 10;
begin
assert width_bucket( -0.0000000001, lb, ub, n_vals) =  0, 'unexpected';
assert width_bucket(  0,            lb, ub, n_vals) =  1, 'unexpected';
assert width_bucket( 99.9999999999, lb, ub, n_vals) = 10, 'unexpected';
assert width_bucket(100,            lb, ub, n_vals) = 11, 'unexpected';
end;
\$body\$;
``````

This means that attention has to be paid to the special case when the value to be allocated is 100 in order that it is allocated to bucket 10, and not to the upper overflow bucket 11. However, this is not the main snag. Rather, it's this:

``````( 0,  10]  ->  bucket  1
(10,  20]  ->  bucket  2
(20,  30]  ->  bucket  3
...
(80,  90]  ->  bucket  9
(90, 100]  ->  bucket 10
``````

This means that attention has to be paid to the special case at the other end of the closed-closed interval of possible input values: when the value to be allocated is 0, it must be allocated to bucket 1, and not to the lower overflow bucket 0.

The consequence is that `width_bucket()` cannot be used as is. This leaves two choices. Each relies on implementing a new function `bucket()` with the same signature as `width_bucket()` but with a different implementation.

See do_assert_bucket_ok.sql for the acceptance test that each must pass.