The "array foreach loop"

This page documents the preview version (v2.23). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.

One-dimensional arrays are the most commonly used—and the use of even two-dimensional arrays is relatively rare. The use of arrays whose dimensionality is greater than two seems to be rarer still.

"Array foreach loop" with a one-dimensional array

It's easy to define the semantics of the array foreach loop for a one-dimensional array because the only semantically meaningful spelling of the slice clause for such an array is slice 0. (Notice that the argument of the slice clause must be a non-negative int literal; a negative int literal causes the 42601 syntax error.) Moreover the meaning when the optional slice clause is omitted is defined to be the same as slice 0. Try this:

\c :db :u
drop schema if exists s cascade;
create schema s;

create function s.f(slice_literal in int)
  returns table(z text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  arr constant text[] not null := array['red', 'green', 'blue'];
  element      text[] not null := '{}';
  n            int    not null := 0;
begin
  case slice_literal
    when 0 then
      <<"slice-0">>foreach z slice 0 in array arr loop
        return next;
      end loop "slice-0";
    when 1 then
      n := 0;
      <<"slice-1">>foreach element slice 1 in array arr loop
        n := n + 1;
      end loop "slice-1";
      assert (n = 1) and (element = arr);
      z := 'Performed just one iteration. The loop variable is set to the entire array.'; return next;
    when 2 then
      declare
        msg text not null := '';
      begin
        <<"slice-2">>foreach element slice 2 in array arr loop
          null;
        end loop "slice-2";
    exception when array_subscript_error then
      get stacked diagnostics msg := message_text;
      z := 'Caught: "'||msg||'".'; return next;
    end;
  end case;
end;
$body$;

Notice that he loop variable for an array foreach loop must be explicitly declared.

First invoke it to ask for slice 0:

select s.f(0);

This is the result:

 red
 green
 blue

This result is self-evidently useful.

Next invoke it to ask for slice 1:

select s.f(1);

This is the result:

 Performed just one iteration. The loop variable is set to the entire array.

This result shows that slice 1 with a one-dimensional array is pointless.

Finally invoke it to ask for slice 2:

select s.f(2);

This is the result:

 Caught: "slice dimension (2) is out of the valid range 0..1"

These three results, taken together, show that the only meaningful use of the array foreach loop with a one-dimensional array is most compactly written by omitting the slice clause, thus:

create function s.f_slice_omitted()
  returns table(z text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  arr constant text[] not null := array['red', 'green', 'blue'];
begin
  <<"slice-0">>foreach z slice 0 in array arr loop
    return next;
  end loop "slice-0";
end;
$body$;

select s.f_slice_omitted();

The result is identical to that shown for select s.f(0) shown above. It's easy to define the semantics:

  • The array foreach loop with a one-dimensional array is normally written without the slice clause. The loop variable is set, successively, to each element in the array in storage order.

"Array foreach loop" with a multidimensional array

A useful demonstration of the semantics of the slice clause requires a multidimensional array. Three dimensions are sufficient. Full generality is demonstrated by specifying values greater than one for each of the three lower bounds. Create the function s.three_dim_array() to return such an array:

\c :db :u
drop schema if exists s cascade;
create schema s;

create function s.three_dim_array(
    z_lb in int, z_ub in int,
    y_lb in int, y_ub in int,
    x_lb in int, x_ub in int)
  returns text[]
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  -- "z" is the major dimension
  z_len  constant int not null := z_ub - z_lb + 1;
  y_len  constant int not null := y_ub - y_lb + 1;
  x_len  constant int not null := x_ub - x_lb + 1;

  arr text[] not null :=
    array_fill(null::int, array[z_len, y_len, x_len], array[z_lb, y_lb, x_lb]);
begin
  assert array_lower(arr, 1) = z_lb;
  assert array_lower(arr, 2) = y_lb;
  assert array_lower(arr, 3) = x_lb;
  assert array_upper(arr, 1) = z_ub;
  assert array_upper(arr, 2) = y_ub;
  assert array_upper(arr, 3) = x_ub;

  -- Populate the array
  <<over_z>>for z in z_lb..z_ub loop
    <<over_y>>for y in y_lb..y_ub loop
      <<over_x>>for x in x_lb..x_ub loop
        arr[z][y][x] := 'z'||z::text||'-y'||y::text||'-x'||x::text;
      end loop over_x;
    end loop over_y;
  end loop over_z;

  return arr;
end;
$body$;

Notice how the function uses the array_fill() SQL function to set up an empty array with the required bounds and whose elements all have the same, required, data type. And notice how it uses the array_lower and array_upper SQL functions to read the array's bounds.

  • The first index, z, in the expression arr[z][y][x] has bounds given by the expressions array_lower(arr, 1) and array_upper(arr, 1).

  • The second index, y, in the expression arr[z][y][x] has bounds given by the expressions array_lower(arr, 2) and array_upper(arr, 2).

  • The third index, x, in the expression arr[z][y][x] has bounds given by the expressions array_lower(arr, 3) and array_upper(arr, 3).

Before demonstrating the semantics of the array foreach loop's slice clause, it helps to look at the text literal (i.e. the text typecast) for the value that s.three_dim_array() returns for a particular parameterization. Do this:

select s.three_dim_array(
  z_lb=>11, z_ub=>12,
  y_lb=>23, y_ub=>25,
  x_lb=>36, x_ub=>39)::text;

Array literals.

See: the section Creating an array value using a literal; and the subsection Array Input and Output Syntax in the section Arrays in the PostgreSQL documentation.

The result has no line breaks and so, in a screen window of usual width where line-wrap is turned on, it's very hard to read. Readability is enhanced by adding whitespace manually: line breaks; and a conventional indentation style to line up matching curly braces vertically.

The literal starts by representing the loop bounds, using square bracket pairs. The order is from the first dimension through the highest order dimension. Then an = sign separates this from the list of element values.

The element values themselves are, by definition, listed in storage order. By definition, too, a run of values for the index that changes fastest, x, is surrounded by a curly-brace pair. Then, a run of such runs is surrounded by a second curly-brace pair when the index that changes next-fastest, y, changes. Finally, the entire list of element values is surrounded by a third curly-brace pair.

Use the massaged value, now, as the text[] literal input actual for the unnest() SQL function. This serves both to ensure that the massage didn't introduce any typos and to present the original array value in yet another way, thus:

select unnest(
  '

    [11:12][23:25][36:39]=
      {
        {
          {z11-y23-x36,z11-y23-x37,z11-y23-x38,z11-y23-x39},
          {z11-y24-x36,z11-y24-x37,z11-y24-x38,z11-y24-x39},
          {z11-y25-x36,z11-y25-x37,z11-y25-x38,z11-y25-x39}
        },
        {
          {z12-y23-x36,z12-y23-x37,z12-y23-x38,z12-y23-x39},
          {z12-y24-x36,z12-y24-x37,z12-y24-x38,z12-y24-x39},
          {z12-y25-x36,z12-y25-x37,z12-y25-x38,z12-y25-x39}
        }
      }

  '::text[]
  );

This is the result:

 z11-y23-x36
 z11-y23-x37
 z11-y23-x38
 z11-y23-x39
 z11-y24-x36
 z11-y24-x37
 z11-y24-x38
 z11-y24-x39
 z11-y25-x36
 z11-y25-x37
 z11-y25-x38
 z11-y25-x39
 z12-y23-x36
 z12-y23-x37
 z12-y23-x38
 z12-y23-x39
 z12-y24-x36
 z12-y24-x37
 z12-y24-x38
 z12-y24-x39
 z12-y25-x36
 z12-y25-x37
 z12-y25-x38
 z12-y25-x39

You can, of course, produce the same result directly like this:

select unnest(s.three_dim_array(
                                 z_lb=>11, z_ub=>12,
                                 y_lb=>23, y_ub=>25,
                                 x_lb=>36, x_ub=>39)
                               );

The unnest() function, by definition, creates a single-column table whose rows are the input array's element values in storage order.

Notice how the array's elements are addressed in the s.three_dim_array() function:

<<over_z>>for z in z_lb..z_ub loop
  <<over_y>>for y in y_lb..y_ub loop
    <<over_x>>for x in x_lb..x_ub loop
      arr[z][y][x] := 'z'||z::text||'-y'||y::text||'-x'||x::text;
    end loop over_x;
  end loop over_y;
end loop over_z;

You can picture each element, arr[z][y][x], as a cube in a cuboid with the x-axis running from left to right, the y-axis running from bottom to top, and the z-axis running inwards from the x-y origin—like the thumb, first finger, and index finger of your left hand, thus:

imdb-erd

You can use the exact same loop design to display the elements using a table function thus:

<<over_z>>for z in z_lb..z_ub loop
  <<over_y>>for y in y_lb..y_ub loop
    t := '';
    <<over_x>>for x in x_lb..x_ub loop
      row_major_order_traversal := row_major_order_traversal||' '||arr[z][y][x];
      t := t||arr[z][y][x]||'  ';
    end loop over_x;
    return next;
  end loop over_y;
  t := ''; return next;
end loop over_z;

The complete table function, s.without_slice_operator(), is shown below. The innermost loop, over_x, concatenates the space-separated element values along the x-axis. The surrounding loop, over_y, displays each such concatenation of x values on its own line. And the outermost loop, over_z, introduces a bank line between each set of x values, one for each y value, for a particular z value, and the corresponding set for the next z value. The nested loops produce this output:

 z11-y23-x36  z11-y23-x37  z11-y23-x38  z11-y23-x39
 z11-y24-x36  z11-y24-x37  z11-y24-x38  z11-y24-x39
 z11-y25-x36  z11-y25-x37  z11-y25-x38  z11-y25-x39

 z12-y23-x36  z12-y23-x37  z12-y23-x38  z12-y23-x39
 z12-y24-x36  z12-y24-x37  z12-y24-x38  z12-y24-x39
 z12-y25-x36  z12-y25-x37  z12-y25-x38  z12-y25-x39

This is the same order, the physical storage order, yet again.

This traversal order is called row-major order. Notice that, for the text literal, the elements are shown in exactly this order. By definition, it simply traverses the elements in order of their physical storage and accesses the associated metadata that specifies the array bounds:

  • The elements along the x-axis are stored contiguously in order of increasing values of x.
  • Each such x-axis run is stored contiguously with the next such run in order of increasing values of y.
  • Each run of x values over all the y values (i.e. each x-y plane) is stored contiguously with the next such plane in order of increasing values of z.

Here is the complete definition of the s.without_slice_operator() function. Not only does it display the element values from the nested loop, (over_x within over_y within over_z); it also adds these sanity tests:

  • It concatenates the space-separated element values from the nested loops traversal into the single text value, row_major_order_traversal.
  • It strips off the array bounds representation from the start of the text literal, arr_to_text_typecast; it removes all the curly braces; it removes all the commas; it reduces each run of spaces to just a single space; and it strips off any leading spaces.
  • It concatenates the space-separated element values from the unnest() result as the single text value, unnest_traversal; and it strips off any leading spaces.
  • It does a foreach traversal of the array and concatenates the space-separated element values that this produces into a single text value foreach_traversal; and it strips off any leading spaces.
  • Finally, it asserts that all of the text values, row_major_order_traversal, arr_to_text_typecast, unnest_traversal, and foreach_traversal, are identical to each other.

Create and execute the table function, s.without_slice_operator(), thus:

drop function if exists s.without_slice_operator() cascade;
create function s.without_slice_operator()
  returns table(t text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  --                                                   z       y       x
                                                    ------  ------  ------
  arr constant text[] not null := s.three_dim_array(11, 12, 23, 25, 36, 39);

  z_lb constant int not null := array_lower(arr, 1);
  z_ub constant int not null := array_upper(arr, 1);
  y_lb constant int not null := array_lower(arr, 2);
  y_ub constant int not null := array_upper(arr, 2);
  x_lb constant int not null := array_lower(arr, 3);
  x_ub constant int not null := array_upper(arr, 3);

  arr_to_text_typecast       text not null := arr::text;
  row_major_order_traversal  text not null := '';
  unnest_traversal           text not null := '';
  foreach_traversal          text not null := '';
  element                    text not null := '';
begin
  <<over_z>>for z in z_lb..z_ub loop
    <<over_y>>for y in y_lb..y_ub loop
      t := '';
      <<over_x>>for x in x_lb..x_ub loop
        row_major_order_traversal := row_major_order_traversal||' '||arr[z][y][x];
        t := t||arr[z][y][x]||'  ';
      end loop over_x;
      return next;
    end loop over_y;
    t := ''; return next;
  end loop over_z;

  declare
    old_len int not null := -1;

    -- The array litteral starts with the bounds like this: [11:12][23:25][36:39]=
    -- so need a regular expression to represent the general form of this.
    digits      constant text not null := '[0-9]+';
    subpattern  constant text not null := '\['||digits||':'||digits||'\]';
    pattern     constant text not null := subpattern||subpattern||subpattern||'=';
  begin
    arr_to_text_typecast := regexp_replace(arr_to_text_typecast, pattern, '');
    while (length(arr_to_text_typecast) <> old_len) loop
      arr_to_text_typecast := replace(arr_to_text_typecast, '{',  '');
      arr_to_text_typecast := replace(arr_to_text_typecast, '}',  '');
      arr_to_text_typecast := replace(arr_to_text_typecast, ',',  ' ');
      arr_to_text_typecast := replace(arr_to_text_typecast, '  ', ' ');
      old_len := length(arr_to_text_typecast);
    end loop;
  end;

  for element in (select unnest(arr)) loop
    unnest_traversal := unnest_traversal||' '||element;
  end loop;

  foreach element in array arr loop
    foreach_traversal := foreach_traversal||' '||element;
  end loop;

  row_major_order_traversal := ltrim(row_major_order_traversal);
  arr_to_text_typecast      := ltrim(arr_to_text_typecast);
  unnest_traversal          := ltrim(unnest_traversal);
  foreach_traversal         := ltrim(foreach_traversal);

  assert arr_to_text_typecast = row_major_order_traversal;
  assert unnest_traversal     = row_major_order_traversal;
  assert foreach_traversal    = row_major_order_traversal;
end;
$body$;

select s.without_slice_operator();

The result has already been shown above.

Summary

  • The array foreach loop, when the argument is an array, arr, of any dimensionality and when the slice clause is omitted (or when the slice 0 clause is used):

    foreach element in array arr loop
      ...
    end loop;
    

    simply produces each successive array element, with each successive iteration, in physical storage order—a.k.a. row-major order.

  • This query for loop has exactly the same effect:

    for element in (select unnest(arr)) loop
      ...
    end loop;
    
  • The text typecast of an array, arr::text, also lists the element values in physical storage order.

  • Using a three dimensional array as an example, an explicit traversal that uses an over_x loop inside an over_y loop inside an over_z loop. where the elements are addressed using arr[z][y][x], also lists the element values in physical storage order.