Function age() returns integer

This page documents the preview version (v2.21). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.

The semantics of the two-parameter overload of function age(timestamp[tz], timestamp[tz])

This section defines the semantics of the overload of the function age() with two parameters of data type plain timestamp by implementing the defining rules in PL/pgSQL in the function modeled_age(). The rules by which the returned interval value is calculated are the same for the (timestamptz, timestamptz) overload as for the plain (timestamp, timestamp) overload except that, for the with time zone overload, the actual timezone component (whether this is specified implicitly or taken from the session environment) and the sensitivity to the reigning timezone have their usual effect.

function days_pr_month() returns int

It turns out that to match the behaviour of the built-in age(timestamp, timestamp) function, modeled_age(timestamp, timestamp) needs to map a "borrowed" month into a month-specific number of days. (The subsection The semantics of the built-in function age() on this page's parent page introduces the notion of "borrowing".) The function days_pr_month() implements the mapping. Because of the effect of leap years, the function needs both year and month parameters. Create the helper function thus:

drop function if exists days_pr_month(int, int) cascade;

create function days_pr_month(year in int, month in int)
  returns int
  language plpgsql
as $body$
begin
  -- Self-doc. The value of "month" comes from "extract(month from ...)".
  assert (month between 1 and 12), 'days_pr_month: assert failed';

  declare
    m_next  constant int not null := case month
                                       when 12 then 1
                                       else         month + 1
                                     end;

    y_next  constant int not null := case m_next
                                       when 1 then year + 1
                                       else        year
                                     end;
  begin
    -- February needs special treatment 'cos of leap year possibility.
    -- So may as well use the same method for all months.
    return make_date(y_next, m_next, 1) - make_date(year, month, 1);
  end;
end;
$body$;

You can test it with examples like this:

select
  days_pr_month(2011,  2) as "Feb-2011",
  days_pr_month(2012,  2) as "Feb-2012",
  days_pr_month(2013,  4) as "Apr-2013",
  days_pr_month(2016, 12) as "Dec-2016";

This is the result:

 Feb-2011 | Feb-2012 | Apr-2013 | Dec-2016
----------+----------+----------+----------
       28 |       29 |       30 |       31

function modeled_age(timestamp, timestamp) returns interval

Because the PostgreSQL documentation says nothing of value to describe the semantics that the function age(timestamp, timestamp) implements, the function modeled_age(timestamp, timestamp) was developed to a large extent by trial and error—in other words, thorough testing with a huge set of input values was key. (See the subsections that describe the functions modeled_age_vs_age() and random_test_report_for_modeled_age() below.)

The implementation was made simpler by this intuitive, and trivially observable, realization:

age(t1, t2) == -age (t2, t1)

The following intuition (copied from this page's parent page) is key to the scheme:

The function age() extracts the year, month, day, and seconds since midnight for each of the two input moment values. It then subtracts these values pairwise and uses them to create an interval value. But the statement of the semantics must be made more carefully than this to accommodate the fact that the outcomes of the pairwise differences might be negative.

So the first step, done conveniently in the declare section, is to calculate the pairwise differences months, days, and secs naïvely—accepting that the values might come out negative. However, because the input timestamp values are exchanged, if necessary, so that the local variable t_dob (for date-of-birth) is less than or equal to the local variable t_today., it's guaranteed that the naïvely computed value of the years pairwise difference cannot be negative. There is, though, a different quirk to accommodate in the years calculation: the year zero (as a matter of arbitrary, but universal, convention) simply does not exist. This means that the years difference between 1 AD and 1 BC is just one. This explains the use of the case expression in the expression for years.

The executable block statement (from begin through its matching end;) revisits the computation of each of secs, days, and months, in that order, by implementing "borrowing" from the immediately next coarser-grained difference value when the initially computed present value is negative. This is where the conversion from one "borrowed" month to a number of days must accommodate the number of days in the "borrowed" month—given not only which month it is but also, because of the possibility that this is February in a leap year, which year it is. This pinpoints a key question:

  • Which is the "borrowed" month? Is it (a) the one with [yy_dob, mm_dob]; or (b) the one with [yy_today, mm_today]; or (c) maybe the month that precedes or follows either of these?

Empirical testing shows that the "borrowed" month is given by choice (a).

The candidate interval value to return is then evaluated (using the revised values) as:

make_interval(years=>years, months=>months, days=>days, secs=>secs);

The final step is simply to take account of whether the initial test showed that the input timestamp values should be exchanged. If there was no exchange, then the candidate interval value is returned "as is". But if the input timestamp values were exchanged, then the negated candidate interval value is returned.

Create the function thus:

drop function if exists modeled_age(timestamp, timestamp) cascade;

create function modeled_age(t_today_in in timestamp, t_dob_in in timestamp)
  returns interval
  language plpgsql
as $body$
declare
  -- Exchange the inputs for negative age.
  negative_age   constant boolean          not null := (t_today_in - t_dob_in) < make_interval();

  t_today        constant timestamp        not null := case negative_age
                                                         when true then t_dob_in
                                                         else           t_today_in
                                                       end;

  t_dob           constant timestamp       not null := case negative_age
                                                         when true then t_today_in
                                                         else           t_dob_in
                                                       end;

  secs_pr_day    constant double precision not null := 24*60*60;
  mons_pr_year   constant int              not null := 12;

  yy_today       constant int              not null := extract(year  from t_today);
  mm_today       constant int              not null := extract(month from t_today);
  dd_today       constant int              not null := extract(day   from t_today);
  ss_today       constant double precision not null := extract(epoch from t_today::time);

  yy_dob         constant int              not null := extract(year  from t_dob);
  mm_dob         constant int              not null := extract(month from t_dob);
  dd_dob         constant int              not null := extract(day   from t_dob);
  ss_dob         constant double precision not null := extract(epoch from t_dob::time);

  years                   int              not null :=
    case
      -- Special treatment is needed when yy_today and yy_dob span AC/BC
      -- 'cos there's no year zero.
      when yy_today > 0 and yy_dob < 0 then yy_today - yy_dob - 1
      else                                  yy_today - yy_dob
    end;

  months                  int              not null := mm_today - mm_dob;
  days                    int              not null := dd_today - dd_dob;
  secs                    double precision not null := ss_today - ss_dob;
begin
  if secs < 0 then
    secs := secs + secs_pr_day;
    days := days - 1;
  end if;

  if days < 0 then
    days := days + days_pr_month(yy_dob, mm_dob);
    months := months - 1;
  end if;

  if months < 0 then
    months := months + 12;
    years := years - 1;
  end if;

  declare
    age constant interval not null := make_interval(years=>years, months=>months, days=>days, secs=>secs);
  begin
    return case negative_age
      when true then -age
      else            age
    end;
  end;
end;
$body$;

Function modeled_age_vs_age(timestamp, timestamp) returns text

The design of this is straightforward. It evaluates both age() and modeled_age() using the actual input timestamp values. Then it compares them for equality. Notice that it's critical to use the user-defined "strict equals" operator, ==, for a pair of interval values rather than the native equals for this argument pair.

  • The section User-defined interval utility functions presents the code that creates the "strict equals" operator. The recommendation, given at the start of the section, is to download the '.zip' file to create the reusable code that supports the pedagogy of the overall date-time major section and then to execute the kit's "one-click" install script.

  • The section Comparing two interval values explains why you must use the "strict equals" operator.

The function returns a text value that starts with the text typecasts of the actual input timestamp values followed by the text typecast of the result given by the built-in age() function. Only if the function modeled_age_vs_age() gives a result that is not strictly equal to the built-in age() function's result, is the modeled result appended to the returned text value. The idea here is to reduce the visual noise in the output to make it easy to spot when (at least while modeled_age() was under development) the results from the built-in and the modeled functions disagree.

Create the comparison function thus:

drop function if exists modeled_age_vs_age(timestamp, timestamp) cascade;

create function modeled_age_vs_age(t_today in timestamp, t_dob in timestamp)
  returns text
  language plpgsql
as $body$
declare
  input constant text not null := lpad(t_today::text, 31)||' '||lpad(t_dob::text, 31)||' ';
  m constant interval not null := modeled_age(t_today, t_dob);
  a constant interval not null :=         age(t_today, t_dob);
begin
  return
    case (m == a)
      when true then input||lpad(a::text, 42)
      else           input||lpad(a::text, 42)||' ! '||lpad(m::text, 42)
    end;
end;
$body$;

Now exercise the comparison function with a few manually composed input values. The results are easiest to read if you use a table function encapsulation:

drop function if exists manual_test_report_for_modeled_age() cascade;

create function manual_test_report_for_modeled_age()
  returns table(z text)
  language plpgsql
as $body$
begin
  z := lpad('t_today', 31     )||' '||lpad('t_dob', 31     )||' '||lpad('age()', 42     );    return next;
  z := lpad('-',       31, '-')||' '||lpad('-',     31, '-')||' '||lpad('-',     42, '-');    return next;

  -- Sanity test: zero age.
  z := modeled_age_vs_age('2019-12-21',                    '2019-12-21'                   );  return next;

  -- Positive ages.
  z := modeled_age_vs_age('2001-04-10',                    '1957-06-13'                   );  return next;
  z := modeled_age_vs_age('2001-04-10 11:19:17',           '1957-06-13 15:31:42'          );  return next;
  z := modeled_age_vs_age('0007-06-13 15:31:42.123456 BC', '2001-04-10 11:19:17.654321 BC');  return next;

  -- Negative age.
  z := modeled_age_vs_age('1957-06-13 15:31:42',        '2001-04-10 11:19:17'             );  return next;

  -- t_today and t_dob span the BC/AD transition.
  z := modeled_age_vs_age('0001-01-01 11:19:17',        '0001-01-01 15:31:42 BC'          );   return next;
  z := modeled_age_vs_age('0001-01-01 15:31:42 BC',     '0001-01-01 11:19:17'             );   return next;
end;
$body$;

select z from manual_test_report_for_modeled_age();

This is the result:

                         t_today                           t_dob                                      age()
 ------------------------------- ------------------------------- ------------------------------------------
             2019-12-21 00:00:00             2019-12-21 00:00:00                                   00:00:00
             2001-04-10 00:00:00             1957-06-13 00:00:00                    43 years 9 mons 27 days
             2001-04-10 11:19:17             1957-06-13 15:31:42           43 years 9 mons 26 days 19:47:35
   0007-06-13 15:31:42.123456 BC   2001-04-10 11:19:17.654321 BC   1994 years 2 mons 3 days 04:12:24.469135
             1957-06-13 15:31:42             2001-04-10 11:19:17       -43 years -9 mons -26 days -19:47:35
             0001-01-01 11:19:17          0001-01-01 15:31:42 BC                   11 mons 30 days 19:47:35
          0001-01-01 15:31:42 BC             0001-01-01 11:19:17                -11 mons -30 days -19:47:35

There is no fourth results column—in other words, the comparison of the result from the user-defined modeled_age_vs_age(timestamp, timestamp) and the result from the built-in age() passes the strict equals test for each tested pair of inputs.

Notice that the first two "Positive ages" tests use the same values as do the examples on this page's parent page in the section The semantics of the built-in function age()

Function random_timestamp() returns timestamp

Because the implementation of modeled_age(timestamp, timestamp) function was designed using intuition and iterative refinement in response to trial-and-error, it's critically important to test the comparison of the result from this and the result from the built-in age() function with a huge number of distinct input pairs. The only way to do this is to generate these pairs randomly. There is no available suitable random-number generator. But the function gen_random_bytes() comes to the rescue. This is not, strictly speaking, a built-in. Rather, it comes when you install the pgcrypto extension.

Always make the 'pgcrypto' extension centrally available in every database.

Not only does installing the pgcrypto extension bring the function gen_random_bytes(); also, it brings gen_random_uuid(). This is commonly used to populate a surrogate primary key column. Yugabyte therefore recommends that you adopt the practice routinely to install pgcrypto (this must be done by a superuser) in a central "utilities" schema in every database that you create. By granting appropriate privileges and by including this schema in, for example, the second position in every regular user's search path, you can make gen_random_bytes(), gen_random_uuid(), and all sorts of other useful utilities immediately available to all users with no further fuss.

You might also like to install the tablefunc extension as part of your standard set of central utilities. This does bring a random-number generator function, normal_rand(). However, this generates a normally distributed set of double precision values. This functionality isn't appropriate for testing modeled_age(); but it is appropriate for many other testing purposes.

First, you need a helper function to convert the bytea value that gen_random_bytes() returns to a number value. Create and test the helper thus:

drop function if exists bytea_to_num(bytea) cascade;

create function bytea_to_num(b bytea)
  returns numeric
  language plpgsql
as $body$
declare
  n numeric := 0;
begin
  for j in 0..(length(b) - 1) loop
    n := n*256+get_byte(b, j);
  end loop;
  return n;
end;
$body$;

Now create the function random_timestamp(). This invokes make_timestamp() with values from six successive invocations of gen_random_bytes(), using bytea_to_num() to convert the returned values first to numeric values and then to suitably constrained int values for each of the year, month, mday, hour, and min actual arguments and to a suitably constrained double precision value for the sec actual argument. Create it thus:

drop function if exists random_timestamp() cascade;

create function random_timestamp()
  returns timestamp
  language plpgsql
as $body$
declare
  year   constant int       not null := greatest(1::int, mod(bytea_to_num(gen_random_bytes(2))::int, 4700));
  month  constant int       not null := greatest(1::int, mod(bytea_to_num(gen_random_bytes(2))::int,   12));
  mday   constant int       not null := greatest(1::int, mod(bytea_to_num(gen_random_bytes(2))::int,   28));
  hour   constant int       not null :=                  mod(bytea_to_num(gen_random_bytes(2))::int,   23);
  min    constant int       not null :=                  mod(bytea_to_num(gen_random_bytes(2))::int,   59);
  sec    constant numeric   not null :=                  mod(bytea_to_num(gen_random_bytes(3)),        58.987654::numeric);

  ts     constant timestamp not null := make_timestamp(year, month, mday, hour, min, sec::double precision);
begin
  return case
           when (mod(bytea_to_num(gen_random_bytes(3))::int, 2) = 1) then ts
           else                                                           (ts::text||' BC')::timestamp
         end;
end;
$body$;

Test it like this:

select
  random_timestamp() as "ts-1",
  random_timestamp() as "ts-2";

Repeat this select time and again. Each time, you'll see a pair of different values. Sometimes, they both have AD dates; sometimes they both have BC dates; sometimes one has an AD date and one has a BC date; sometimes "ts-1" is earlier than "ts-2"; and sometimes "ts-2" is earlier than "ts-1". The value 4700 was chosen to constrain the year because 4713 BC is the earliest legal timestamp value. (See the table in the Synopsis subsection on the Date and time data types major sections' main page.) It's sufficient for the present testing purpose that the generated timestamp values are between 4700 BC and 4700 AD.

Function random_test_report_for_modeled_age()

This function invokes random_timestamp() to generate two new distinct values and then uses these to invoke modeled_age() and the built-in age(). It compares the values that they return using the == strict equals operator. Only if they differ, does it invoke modeled_age_vs_age() to show the differing values. And if this happens, it notes that at least one difference has been seen. The expectation is that there will be no differences to report so that the final report will show simply "No failures" and therefore be maximally easily understood. As a bonus, the report shows the minimum and the maximum generated values returned by random_timestamp().

Notice how the special manifest constants '-infinity' and 'infinity' are used.

Notice how the special manifest constants '-infinity' and 'infinity' are used to set the starting values for, respectively, max_ts and min_ts. Without this text-book pattern, the loop would need to be coded more elaborately by treating the first iteration as a special case that establishes max_ts and min_ts as the values returned by the invocation of random_timestamp() this time; only then could the second and subsequent iterations be coded using "max_ts := greatest(max_ts, greatest(ts1, ts2));" and "min_ts := least(min_ts, least(ts1, ts2));".

Create the function thus:

drop function if exists random_test_report_for_modeled_age(int) cascade;

create function random_test_report_for_modeled_age(no_of_attempts in int)
  returns table(z text)
  language plpgsql
as $body$
declare
  no_failures boolean     not null := true;
  max_ts      timestamp   not null :=  '-infinity';
  min_ts      timestamp   not null :=   'infinity';
begin
  for j in 1..no_of_attempts loop
    declare
      ts1  constant timestamp not null := random_timestamp();
      ts2  constant timestamp not null := random_timestamp();

      m    constant interval  not null := modeled_age(ts1, ts2);
      a    constant interval  not null :=         age(ts1, ts2);
    begin
      max_ts := greatest(max_ts, greatest(ts1, ts2));
      min_ts :=    least(min_ts,    least(ts1, ts2));

      if m == a then
        null;
      else
        no_failures := false;
        z := modeled_age_vs_age(ts1, ts2);                          return next;
      end if;
    end;
  end loop;
  ----------------------------------------------------------------------------------------

  z := '';                                                          return next;
  z := rpad('-', 120, '-');                                         return next;
  if no_failures then
    z := 'No failures.';                                            return next;
  end if;
  z := 'max_ts: '||max_ts::text||' | min_ts: '||min_ts::text;       return next;
end;
$body$;

Test it first with a modest number of attempts:

select z from random_test_report_for_modeled_age(1000);

Then increase the number of attempts to, say, one million. This takes about a minute. (No thought was given to make the test run faster. Its speed is uninteresting.) You'll see that "No failures"_ is reported and that the range of randomly generated timestamp values spans close to the maximum that random_timestamp() can produce (4700 BC through 4700 AD).

This should give you a very high confidence indeed that the function modeled_age() lives up to its name.

The semantics of the one-parameter moment overload of function age()

The effect of age(t) is identical to the effect of age(<midnight today>, t). Here's a demonstration of the semantics. The expression date_trunc('day', clock_timestamp()) is copied from the definition of today() in the subsection Consider user-defined functions rather than 'today', 'tomorrow', and 'yesterday'.

Do this to test this assertion for the timestamptz overloads:

drop procedure if exists assert_one_parameter_overload_of_age_semantics(timestamptz) cascade;
create procedure assert_one_parameter_overload_of_age_semantics(t in timestamptz)
  language plpgsql
as $body$
declare
  age_1 constant interval not null := age(t);
  age_2 constant interval not null := age(date_trunc('day', clock_timestamp()), t);
begin
 assert age_1 = age_2, 'Assert failed';
end;
$body$;

set timezone = 'UTC';
call assert_one_parameter_overload_of_age_semantics('2007-06-24');
call assert_one_parameter_overload_of_age_semantics('2051-07-19 Europe/Helsinki');
call assert_one_parameter_overload_of_age_semantics('2007-02-01 13:42:19.12345');
call assert_one_parameter_overload_of_age_semantics(clock_timestamp());

set timezone = 'America/Los_Angeles';
call assert_one_parameter_overload_of_age_semantics('2007-06-24');
call assert_one_parameter_overload_of_age_semantics('2051-07-19 Europe/Helsinki');
call assert_one_parameter_overload_of_age_semantics('2007-02-01 13:42:19.12345');
call assert_one_parameter_overload_of_age_semantics(clock_timestamp());

Each call statement finishes without error, showing that the assertion holds for every test

Do this to test this assertion for the plain timestamp overloads:

drop procedure if exists assert_one_parameter_overload_of_age_semantics(timestamp) cascade;
create procedure assert_one_parameter_overload_of_age_semantics(t in timestamp)
  language plpgsql
as $body$
declare
  age_1 constant interval not null := age(t);
  age_2 constant interval not null := age(date_trunc('day', localtimestamp), t);
begin
 assert age_1 = age_2, 'Assert failed';
end;
$body$;

set timezone = 'UTC';
call assert_one_parameter_overload_of_age_semantics('2007-02-01 13:42:19.12345');
call assert_one_parameter_overload_of_age_semantics('2007-06-24');
call assert_one_parameter_overload_of_age_semantics('2051-07-19');
call assert_one_parameter_overload_of_age_semantics(clock_timestamp());

set timezone = 'America/Los_Angeles';
call assert_one_parameter_overload_of_age_semantics('2007-02-01 13:42:19.12345');
call assert_one_parameter_overload_of_age_semantics('2007-06-24');
call assert_one_parameter_overload_of_age_semantics('2051-07-19');
call assert_one_parameter_overload_of_age_semantics(clock_timestamp());

Each call statement finishes without error, showing that the assertion holds for every test.