The internal representation of an interval value [YSQL]

How does YSQL represent an interval value?

Download and install the date-time utilities code.

The code on this page and on its child, Modeling the internal representation and comparing the model with the actual implementation, depends on the code presented in the section User-defined interval utility functions. This is included in the larger code kit that includes all of the reusable code that the overall date-time section describes and uses.

The PostgreSQL documentation, under the table Interval Input, says this:

Internally, interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a Daylight Savings Time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases, but can cause unexpected results.

Inspection of the C code of the implementation shows that the mm and dd fields of the [mm, dd, ss] internal implementation tuple are four-byte integers. The ss field is an eight-byte integer that records the value in microseconds.

The reference to Daylight Savings Time is a nod to the critical distinction between clock-time-semantics and calendar-time-semantics. Notice the use of "unexpected". It is better to say that your ability confidently to predict the outcome of interval arithmetic rests on a relatively elaborate mental model. This model has two complementary parts:

  • How the values of the three fields of the [mm, dd, ss] representation of an interval value are computed when an interval value is created. The present "How does YSQL represent an interval value?" section addresses this.

  • The different semantics of these three fields when an interval value is added or subtracted to/from a timestamptz value, a timestamp value, or a time value or when an interval value is created by subtracting one moment (typically a plain timestamp value or a timestamptz_value) from another. This is addressed in the section Interval arithmetic.

As long as you have a robust mental model, then your results will not be unexpected. This section explains the mental model for interval value creation. It enables you to predict what values for months, days, and seconds will be represented internally when you specify an interval value using values for years, months, days, hours, minutes, and seconds. And it enables you to predict what values for years, months, days, hours, minutes, and seconds you will read back from an interval value whose months, days, and seconds values you have managed to predict.

The value recorded by each of the three fields of the representation can be arbitrarily large with respect to the conventions that say, for example, that 25 hours is 1 day and 1 hour. For example, this tuple is allowed: 99 months 700 days 926351.522816 seconds. (Of course, the physical internal representation does impose some limits. See the section interval value limits.)

Note: The internal sixteen-byte format of the internal [mm, dd, ss] representation of an interval value determines the theoretical upper limits on the values of each of the three fields. Other factors determine the actual limits. This is explained in the section interval value limits.

Ad hoc examples

There are no built-in functions or operators that let you display the months, days, and seconds "as is" from the internal representation. Rather, you can display only canonically derived values for years, months, days, hours, minutes, and seconds. The rule for extracting these values from the internal representation is simple and intuitive. It is presented as executable PL/pgSQL in the implementation of the function parameterization (interval_mm_dd_ss_t) in the section User-defined interval utility functions. Briefly, the internal integral months value is displayed as integral years and integral months by taking one year to be 12 months; the internal integral days value is displayed "as is"; and the real number internal seconds is displayed as integral hours, integral minutes, and real number seconds by taking one hour to be 60 minutes and one minute to be 60 seconds.

The section Ad hoc examples of defining interval values provides six examples that give a flavor of the complexity of the rules.

Modeling the internal representation and comparing the model with the actual implementation

The best way to express a statement of the rules that are consistent with the outcomes of the six Ad hoc examples of defining interval values, and any number of other examples that you might try, is to implement an executable simulation and to compare its outputs with the outputs that the actual PostgreSQL, and therefore YSQL, implementations produce.

Note: If you follow the recommendations made below, you can simply skip attempting to understand these tricky rules without sacrificing any useful functionality.

The section Modeling the internal representation and comparing the model with the actual implementation presents this. Here is the algorithm, copied from the body of function interval_mm_dd_ss (interval_parameterization_t):

-- The input values are "p.yy", "p.mm", "p.dd", "p.hh", "p.mi", and "p.ss" — i.e. the
-- conventional parameterization of an "interval" value used by the "::interval" typecast
-- and the "make_interval()" approaches.

-- The output values are "mm_out", "dd_out", and "ss_out" — i.e. the fields of the internal
-- representation tuple.

-- "mm_per_yy", "dd_per_mm", "ss_per_dd", "ss_per_hh", and "ss_per_mi" are constants
-- with the meanings that the mnemonics suggest: the number of months in a year,
-- and so on.
mm_trunc                constant int              not null := trunc(p.mm);
mm_remainder            constant double precision not null := p.mm - mm_trunc::double precision;

-- This is a quirk.
mm_out                  constant int              not null := trunc(p.yy*mm_per_yy) + mm_trunc;

dd_real_from_mm         constant double precision not null := mm_remainder*dd_per_mm;

dd_int_from_mm          constant int              not null := trunc(dd_real_from_mm);
dd_remainder_from_mm    constant double precision not null := dd_real_from_mm - dd_int_from_mm::double precision;

dd_int_from_user        constant int              not null := trunc(p.dd);
dd_remainder_from_user  constant double precision not null := p.dd - dd_int_from_user::double precision;

dd_out                  constant int              not null := dd_int_from_mm + dd_int_from_user;

d_remainder             constant double precision not null := dd_remainder_from_mm + dd_remainder_from_user;

ss_out                  constant double precision not null := d_remainder*ss_per_dd +
                                                              p.hh*ss_per_hh +
                                                              p.mi*ss_per_mi +
                                                              p.ss;

The algorithm is too hard to remember and produces unhelpful outcomes.

Yugabyte staff members have carefully considered the rules that this algorithm expresses. They have the property that when non-integral values are used in the ::interval typecast approach, even a literal that specifies, for example, only months can result in an internal [mm, dd, ss] tuple where each of the fields is non-zero. Try this:

select interval_mm_dd_ss('11.674523 months '::interval)::text;

(The function interval_mm_dd_ss (interval) is defined in the section User-defined interval utility functions. This is the result:

 (11,20,20363.616)

The section Interval arithmetic explains that the semantics is critically different for each of the internal representation's fields. It recommends that you use only interval values where just one of the three fields is non-zero. The section Custom domain types for specializing the native interval functionality shows how to impose this discipline programmatically.

Possible upcoming implementation change

Heads up.

There has been some discussion on the pgsql-general and pgsql-hackers mail lists about the algorithm whose implementation that the function interval() documents. As a result, a patch has been developed for a future version of the PostgreSQL system that makes some subtle changes to the "spill-down" behavior in response to real number input values for years, months, days, hours, and minutes when you use the ::interval typecast approach to construct an interval value. When YugabyteDB adopts this patch, the implementation of the function interval_mm_dd_ss (interval_parameterization_t) will be changed accordingly.

If you follow Yugabyte's recommendation to construct interval values using only integral values for years, months, days, hours, and minutes (or, equivalently, always to use the make_interval() SQL built-in function rather than the ::interval typecast approach), then your application code will not see a behavior change when you move to a version of YugabyteDB that implements this patch. As mentioned above, the section Custom domain types for specializing the native interval functionality shows how to impose this discipline programmatically.