The interval data type [YSQL]

The interval data type

Why does the interval data type exist?

Briefly, and trivially, the interval data type exists because the SQL Standard prescribes it. Of course, it does this for a reason: because the semantics of interval arithmetic is rather special and reflect real-world requirements that arise from the difference between clock-time-semantics and calendar-time-semantics.

The SQL Standard prescribes support for the interval data type

Try this:

drop function if exists f() cascade;

create function f()
  returns table(t text)
  language plpgsql
as $body$
  d1 constant date           := '2021-01-13';
  d2 constant date           := '2021-02-17';

  t1 constant time           := '13:23:17.000000';
  t2 constant time           := '15:37:43.123456';

  ts1 constant timestamp     := '2021-01-13 13:23:17.000000';
  ts2 constant timestamp     := '2021-02-17 15:37:43.123456';

  tstz1 constant timestamptz := '2021-01-13 13:23:17.000000 +04:00';
  tstz2 constant timestamptz := '2021-02-17 15:37:43.123456 -01:00';
  t := 'date:        '||(pg_typeof(d2    - d1   ))::text; return next;
  t := '';                                                return next;
  t := 'time:        '||(pg_typeof(t2    - t1   ))::text; return next;
  t := 'timestamp:   '||(pg_typeof(ts2   - ts1  ))::text; return next;
  t := 'timestamptz: '||(pg_typeof(tstz2 - tstz1))::text; return next;

select t from f();

This is the result:

 date:        integer

 time:        interval
 timestamp:   interval
 timestamptz: interval

Subtraction isn't supported for timetz values—yet another reason not to use that data type.

Subtracting date values produces an int value: the number of days between them. In contrast, subtracting time, timestamp , and timestamptz values produces an interval value. The SQL Standard prescribes this outcome for these newer data types but not for the earlier date data type.

Interval arithmetic semantics

Try this to see the actual interval value that subtracting timestamptz values produces:

    '2020-03-10 13:47:19.7':: timestamp -
    '2020-03-10 12:31:13.5':: timestamp)  ::text as "interval 1",
    '2020-03-10 00:00:00':: timestamp -
    '2020-02-10 00:00:00':: timestamp)    ::text as "interval 2";

This is the result:

 interval 1 | interval 2
 01:16:06.2 | 29 days

The section How does YSQL represent an interval value? explains that this text display is not the visualization of just a scalar number of seconds; rather, an interval value is represented as a three-field [mm, dd, ss] tuple. (The first two fields are integers and the last represents a real number of seconds with microsecond precision.) And it explains the reasoning behind this design. The story is complemented by the examples, and the explanations of what they show, in the section Interval arithmetic.

  • The seconds component is externalized as an integral number of hours, an integral number of minutes, and a real number of seconds.
  • The days component is externalized as an integral number of days.
  • And the months component is externalized as an integral number of years and an integral number of months.

Briefly, the rules for adding or subtracting an interval value to a timestamptz value are different when the value defines a non-zero value for only the seconds component, only the days component, or only the months component. The rule differences are rooted in convention. (The rules are therefore complex when an interval value has more than one non-zero component—so complex that it's very difficult to state requirements that imply such hybrid interval values, and to implement application code that meets such requirements reliably.)

Here is a sufficient example to illustrate the conceptual difficulty. First, try this:

select ('1 day'::interval = '24 hours'::interval)::text;

The result is true. (The implementation of the interval-interval overload of the = operator is explained and discussed in the section Comparing two interval values.)

Now try this:

drop function if exists dd_versus_ss() cascade;

create function dd_versus_ss()
  returns table(x text)
  language plpgsql
as $body$
  set timezone = 'America/Los_Angeles';
    i_1_day     constant interval := '1 day';
    i_24_hours  constant interval := '24 hours';

    -- Just before DST Starts at 02:00 on Sunday 14-Mar-2021.
    t0                constant timestamptz := '2021-03-13 20:00:00';

    t0_plus_1_day     constant timestamptz := t0 + i_1_day;
    t0_plus_24_hours  constant timestamptz := t0 + i_24_hours;
    x := 't0 + ''1 day'':    '||t0_plus_1_day    ::text; return next;
    x := 't0 + ''24 hours'': '||t0_plus_24_hours ::text; return next;

select x from dd_versus_ss();

This is the result:

 t0 + '1 day':    2021-03-14 20:00:00-07
 t0 + '24 hours': 2021-03-14 21:00:00-07

How can it be that, while '1 day' is equal to '24 hours', t0 + '1 day' is not equal to t0 + '24 hours'? The short answer, of course, is that '1 day' is not equal to '24 hours' when interval equality is defined strictly. The native interval-interval overload of the = operator implements only a loose notion of interval equality. You also need a strict interval equality notion. The section The "strict equals" operator shows you how to do this.

In the present contrived but crucial example, t0 is just before the "spring forward" moment in the America/Los_Angeles timezone. And the loosely, but not strictly, equal durations of '1 day' and '24 hours' are both long enough to take you from Pacific Standard Time to Pacific Daylight Savings Time. Bearing in mind the [mm, dd, ss] internal representation, you can immediately see this:

  • The semantics of interval arithmetic is different for the dd field of the internal representation than for the ss field.

This does reflect convention. Are you postponing an appointment by one day? Here you expect the re-scheduled appointment to be at the same time on the next day, whether or not a start or end of Daylight Savings Time intervenes. Or are you making a journey (like a flight) that you know takes twenty-four hours? Here, whether or not a start or end of Daylight Savings Time occurs during the flight crucially affects the arrival time.

A similar contrived test that uses interval values of '1 month' and '30 days' with a starting moment just before the last day of February in a leap year shows this:

  • The semantics of interval arithmetic is different for the mm field than for the dd field.

Try this:

select ('1 month'::interval = '30 days'::interval)::text;

The result is true. Now try this:

drop function if exists mm_versus_dd() cascade;

create function mm_versus_dd()
  returns table(x text)
  language plpgsql
as $body$
  set timezone = 'UTC';
    i_1_month  constant interval := '1 month';
    i_30_days  constant interval := '30 days';

    -- Just before 29-Feb in a leap year.
    t0               constant timestamptz := '2020-02-26 12:00:00';

    t0_plus_30_days  constant timestamptz := t0 + i_30_days;
    t0_plus_1_month  constant timestamptz := t0 + i_1_month;
    x := 't0 + 1 month: '||t0_plus_1_month ::text; return next;
    x := 't0 + 30 days: '||t0_plus_30_days ::text; return next;

select x from mm_versus_dd();

This is the result:

 t0 + 1 month: 2020-03-26 12:00:00+00
 t0 + 30 days: 2020-03-27 12:00:00+00

This outcome, too, does reflect convention. Are you setting up reminders to tell you to water your hardy succulents every month? Here, you simply want a reminder on, say, the 10th of each calendar month. Or are you taking a package tour (starting, say, mid-week) that is advertised to last thirty days? Here, whether or not 29-February in a leap year occurs during the vacation affects when you get back home.

Everything that explains these differing semantics, and the philosophy of why they should differ, is explained in the section Interval arithmetic and its child pages.

Organization of the rest of the interval section

The notions that the account of the interval data type explains are interdependent. The ordering of the following subsections aims to introduce the notions with the minimum dependency on notions yet to be introduced. The account is split into the following main subsections:

See the generic section Typecasting between date-time values and text values for the account of the ways to construct and to read values for all of the date-time data types—including, therefore, the interval data type.