SQL subprograms (a.k.a. "language sql" subprograms)

Regard the PostgreSQL documentation as the canonical definitional reference on this topic.

Make sure that you read the section Query Language (SQL) Functions in the PostgreSQL documentation.

You can define both functions and procedures using language sql. Each of these lets you encapsulate two or more SQL statements. Notice that a function, or a procedure with an inout formal argument, returns the result of only the last SQL statement. See the section The result is delivered by the last "select" (or "values") statement below. The encapsulation lets you do no more than list independent SQL statements in the order that they are to be executed. This means that there's no way within the language sql encapsulation to inform the next statement that's executed about the outcome of the previous statements. However, using two or more SQL statements in concert typically implies the need for a control structure—and this is what language plpgsql subprograms provide.

Yugabyte recommends that you respect the convention that the introductory section User-defined subprograms outlines:

  • use a function to produce a single value, a record value, or a set of such values
  • use a procedure to do something.

The upshot is that the body of a language sql function will normally be a single select (or values) statement and that the body of a language sql procedure will normally be a single insert, update_, or delete_ statement.

The execution model for 'language sql' subprograms

Because the body of a language sql subprogram is nothing more than a list of SQL statements, the analysis that's done at create time is different from what's done at create time for a language plpgsql subprogram. (See the section PL/pgSQL's execution model.)

  • For a language sql subprogram, at create time, each SQL statement undergoes the same kind of analysis that the prepare statement carries out. In particular, every schema-object that each encapsulated SQL statement references must exist, and have the correct sub-object structure, at create time for the subprogram. Then at run-time, each statement is executed in a way that corresponds to how a prepared statement is executed: actual values brought by the subprogram's arguments are bound to corresponding placeholders and then it is executed. (Notice that there's no evidence of this in what the pg_prepared_statements catalog view lists.)
  • For a language plpgsql subprogram, the body's procedural source code is syntax checked. Further, some (but by no means not all possible) semantic analysis is done. But embedded SQL statements receive no more than a syntax check. Then at run-time, and only when the subprogram is first referenced, its so-called AST (abstract syntax tree) is derived and stored in the session's memory. Therefore, full semantic analysis of embedded SQL statements (followed by the equivalent of prepare, binding of actual values, and execution) happens first at run-time. (Notice that, here too, there's no evidence of this in what the pg_prepared_statements catalog view lists.)

The difference between the two models means that a language plpgsql subprogram can use objects that it creates just before use—while a language sql subprogram can use only objects that already exist at the moment that it's created. Try the following comparison. First, using language plpgsql:

create schema s;

create procedure s.p(v1 in int, v2 in int, r inout int[])
  set search_path = pg_catalog, s, pg_temp
  language plpgsql
as $body$
begin
  create table if not exists pg_temp.t(v int not null) on commit delete rows;
  with c as (
    insert into pg_temp.t(v) values (v1), (v2) returning v)
  select array_agg(v) into r from c;
end;
$body$;

call s.p(17, 19, null::int[]);

The create procedure succeeds and the call produces this result:

    r    
---------
 {17,19}

Now attempt to rewrite it using language sql:

drop table if exists pg_temp.t;
drop procedure if exists s.p(int, int, int[]);

create procedure s.p(v1 in int, v2 in int, r inout int[])
  set search_path = pg_catalog, s, pg_temp
  language sql
as $body$
  create table if not exists pg_temp.t(v int not null) on commit delete rows;
  with c as (
    insert into pg_temp.t(v) values (v1), (v2) returning v)
  select array_agg(v) from c;
$body$;

The create procedure fails with the 42P01 error:

relation "pg_temp.t" does not exist

You can make the error go away by moving the create table statement to top-level and by executing it before you execute the create procedure:

drop table if exists pg_temp.t;
create table pg_temp.t(v int not null) on commit delete rows;

drop procedure if exists s.p(int, int, int[]);
create procedure s.p(v1 in int, v2 in int, r inout int[])
  set search_path = pg_catalog, s, pg_temp
  language sql
as $body$
  with c as (
    insert into pg_temp.t(v) values (v1), (v2) returning v)
  select array_agg(v) from c;
$body$;

The create procedure now runs without error and the call produces the same result as it did for the language plpgsql procedure.

The difference between how language plpgsql subprograms and language sql subprograms are created and executed brings some functional disadvantage to the latter kind of subprogram with respect to the former kind. With the former kind, a common pattern for using a temporary table is to include the create table if not exists statement within the encapsulation of the procedure itself so that it's guaranteed that the table exists when it's first accessed. (A more carefully designed pattern is needed if the temporary table's purpose is to represent session-duration state that several different subprograms share. Here, it will be created using on commit preserve rows. And specific initialization code must ensure that the table is created at session start—or at least before it's first needed.) If a language sql subprogram is to be used, then it inevitably must rely on specific initialization code outside of the subprogram's encapsulation to create it before the subprogram uses it.

A 'language sql' subprogram might need to typecast values where its 'language plpgsql' counterpart does not.

Try this:

create schema s;

create function s.f()
  returns int
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  select count(*) from pg_class;
$body$;

The create fails with the 42P13 error thus:

return type mismatch in function declared to return integer

and the "detail" says Actual return type is bigint. The \df count meta-command shows that its return data type is indeed bigint. So you must either create the function with returns bigint or use a typecast thus:

drop function if exists s.f() cascade;

create function s.f()
  returns int
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  select count(*)::int from pg_class;
$body$;

Now the create succeeds without error (and select s.f() produces the expected result).

The language plpgsql counterpart is more forgiving. Try this:

drop function if exists s.f() cascade;

create function s.f()
  returns int
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  return (select count(*) from pg_class);
end;
$body$;

Here, the create succeeds without error (and select s.f() again produces the expected result).

Using a "language sql" function to produce different kinds of return value

You can use a language sql function to return:

  • a single value—which can be either a single scalar value or a single value of a composite type of arbitrary complexity
  • a value of a single (anonymous) record value—each named field of which can be a single scalar value or a single value of a composite type
  • a set of single values or a set of record values.

Returning a single scalar value

Connect as an ordinary user that can create objects in a test database and do this:

create schema s;
create table s.t(k serial primary key, c1 int not null, c2 int not null);
insert into s.t(c1, c2) values (1, 17), (2, 42), (3, 57), (1, 19);

create function s.f(c1_in in int)
  returns int
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  select c2 from t where c1 = c1_in order by c2 desc;
$body$;

The example is contrived to make a point: the function is specified to return a single scalar value. Yet, by looking at the table's content, you can see that the select that the function encapsulates will produce two rows when c1_in = 2. However, the function does not cause an error when invoked with this actual argument—and the outcome is well-defined. The function's result is simply the first result that the select statement defined. (The order by clause brings a well-defined result in the usual way.) Now execute it:

select s.f(1);

This is the result"

 f  
----
 19

Compare this with how the corresponding language plpgsql function behaves:

create function s.f_plpgsql(c1_in in int)
  returns int
  set search_path = pg_catalog, s, pg_temp
  security definer
  language plpgsql
as $body$
begin
  return (select c2 from t where c1 = c1_in order by c2 desc);
end;
$body$;

Now execute it:

select s.f_plpgsql(1);

It causes the 21000 error:

more than one row returned by a subquery used as an expression

The difference in outcome for the language sql function compared to the language plpgsql function that each encapsulates the same select statement is explained thus:

  • The language sql encapsulation, when the header's returns clause specifies a single row, is simply defined to return the first row.
  • But the corresponding language plpgsql encapsulation is designed to return the value that follows the return keyword. And, in the present example, that is a subquery that may, or may not, return a single row.

This implies that you can provoke the same error in ordinary top-level SQL, thus:

select (select c2 from s.t where c1 = 1 order by c2 desc) as result;

The key question that informs the choice between a language sql and a language plpgsql subprogram when the returns argument is to be a single value is your confidence that the code is guaranteed to produce a single value. If the code does no more than evaluate an arithmetic expression, then choosing language sql can be completely safe. But if the result is produced by a SQL statement, then the safety must rely on the existence of a suitable unique index (which is not the case in the present example).

Returning a single record value

The syntax here depends on using the keyword record as the operand of the returns keyword in the function's header together with listing the names and data types of the record's fields as out formal arguments. Try this:

create schema s;
create table s.t(k int primary key, c1 int not null, c2 int not null);
insert into s.t(k, c1, c2) values (1, 3, 17), (2, 5, 42), (3, 7, 57), (4, 9, 19);

create function s.f(k_in in int, c1 out int, c2 out int)
  returns record
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  select c1, c2 from t where k = k_in;
$body$;

Test it thus:

select c1, c2 from s.f(3);

This is the result:

 c1 | c2 
----+----
  7 | 57

Compare this approach with defining a function that returns a single composite type value:

create type s.ct as (c1 int, c2 int);

create function s.f2(k_in in int)
  returns s.ct
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  select (c1, c2)::s.ct from t where k = k_in;
$body$;

The syntax to test this is identical to what you used to test s.f():

select c1, c2 from s.f2(3);

And it produces the identical result. The choice between the two approaches will be dictated by how the result is to be consumed.

Returning a set of scalar values

The syntax here is a trivial extension of the syntax for returning a single scalar value. Try this:

create schema s0;

create function s0.my_subprogram_names(kinds_in text[] = null)
  returns setof name
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  with s(name, kind) as (
    select p.proname, p.prokind::text
    from
      pg_proc as p
      inner join pg_roles as r
      on p.proowner = r.oid
    where r.rolname = current_role
    and   p.prokind = any (array['f'::"char", 'p'::"char"]))
  select name
  from s
  where kind = any(
                    case kinds_in
                      when array['f'] then array['f']
                      when array['p'] then array['p']
                      else                       array['f', 'p']
                    end
                  )
  order by name;
$body$;

Now create another function and a procedure simply to give the function s.my_subprograms() more records to return:

create schema s1;
create function s1.f()
  returns int
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  select 1;
$body$;

create schema s2;
create table s2.t(k serial primary key, v int not null);
create procedure s2.p(v_in in int)
  set search_path = pg_catalog, pg_temp
  security definer
  language sql
as $body$
  insert into s2.t(v) values (v_in);
$body$;

Now test s.my_subprogram_names(). First, list the current role's functions:

select t.v as func_name
from s0.my_subprogram_names(array['f']) as t(v);

This is the result:

      func_name      
---------------------
 f
 my_subprogram_names

Next, list the current role's procedures:

select t.v as proc_name
from s0.my_subprogram_names(array['p']) as t(v);

This is the result:

 proc_name 
-----------
 p

Finally, list the current role's functions and procedures:

select t.v as subprogram_name
from s0.my_subprogram_names() as t(v);

This is the result:

   subprogram_name   
---------------------
 f
 my_subprogram_names
 p

Returning a set of record values

The syntax here is a trivial extension of the syntax for returning a single record value. Try this:

create function s0.my_subprograms(
  kinds_in text[] = null,
  schema out name, name out name, kind out text)
  returns setof record
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  with s(schema, name, kind) as (
    select n.nspname, p.proname, p.prokind::text
    from
      pg_proc as p
      inner join
      pg_namespace as n
      on p.pronamespace = n.oid
      inner join pg_roles as r
      on p.proowner = r.oid
    where r.rolname = current_role
    and   p.prokind = any (array['f'::"char", 'p'::"char"]))
  select schema, name, case kind
                          when 'f' then 'function'
                          when 'p' then 'procedure'
                        end
  from s
  where kind = any(
                    case kinds_in
                      when array['f'] then array['f']
                      when array['p'] then array['p']
                      else                       array['f', 'p']
                    end
                  )
  order by schema, name;    
$body$;

Now test s.my_subprograms(). First, list the current role's functions:

select schema, name as func_name
from s0.my_subprograms(array['f']);

This is the result:

 schema |      func_name      
--------+---------------------
 s0     | my_subprogram_names
 s0     | my_subprograms
 s1     | f

Next, list the current role's procedures:

select schema, name as proc_name
from s0.my_subprograms(array['p']);

This is the result:

 schema | proc_name 
--------+-----------
 s2     | p

Finally, list the current role's functions and procedures:

select schema, name as subprogram_name, kind
from s0.my_subprograms();

This is the result:

 schema |   subprogram_name   |   kind    
--------+---------------------+-----------
 s0     | my_subprogram_names | function
 s0     | my_subprograms      | function
 s1     | f                   | function
 s2     | p                   | procedure

Alternative syntax: using "returns table(...)" rather than "returns setof record"

Try this:

create function s0.my_subprograms_2(kinds_in text[] = null)
  returns table(schema name, name name, kind text)
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  with s(schema, name, kind) as (
    select n.nspname, p.proname, p.prokind::text
    from
      pg_proc as p
      inner join
      pg_namespace as n
      on p.pronamespace = n.oid
      inner join pg_roles as r
      on p.proowner = r.oid
    where r.rolname = current_role
    and   p.prokind = any (array['f'::"char", 'p'::"char"]))
  select schema, name, case kind
                          when 'f' then 'function'
                          when 'p' then 'procedure'
                        end
  from s
  where kind = any(
                    case kinds_in
                      when array['f'] then array['f']
                      when array['p'] then array['p']
                      else                       array['f', 'p']
                    end
                  )
  order by schema, name;    
$body$;

Notice that the defining select statement enquoted by $body$ ... $body$ is identical to the one that implemented the function s0.my_subprograms(). The critical differences are in the function's header:

  • The out formal arguments are simply omitted.
  • The setof record operand of the returns keyword is replaced by table(...).
  • The information about the names and data types of the fields of the returned records is conveyed by the list within the parentheses that follow the keyword table rather than by the list of out formal arguments.

This syntax feels more intuitive than the syntax that returns setof record uses. (It is newer, in the history of PostgreSQL, than the returns setof record syntax.) The function s0.my_subprograms_2() is invoked using the identical syntax that is used to invoke s0.my_subprograms():

select schema, name as func_name
from s0.my_subprograms_2(array['f']);

select schema, name as proc_name
from s0.my_subprograms_2(array['p']);

select schema, name as subprogram_name, kind
from s0.my_subprograms_2();

And the results are identical too, with the obvious trivial difference that my_subprograms_2() itself now shows up in the output. A further benefit of the returns table() syntax is that you use it to return either a set of scalar values or a set of record values.

The result is delivered by the last "select" (or "values") statement

A language sql function is allowed to encapsulate several SQL statements—but there are few cases where this might be useful. Try this:

drop function if exists s0.f();
create function s0.f()
  returns table(k int, v text)
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  set local search_path = pg_catalog, s0, pg_temp;
  values (1, 'dog'), (2, 'cat'), (3, 'frog');
$body$;

select k, v from s0.f();

It runs without error and produces this result:

 k |  v   
---+------
 1 | dog
 2 | cat
 3 | frog

However, the effect of set local search_path in the function's statement list is the same as making the same setting in the function's header. It's therefore poor practice to write the function as presented here.

Now try this counter-example:

drop function if exists s0.f();
create function s0.f()
  returns table(k int, v text)
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  values ('apple'), ('orange'), ('pear');
  values (1, 'dog'), (2, 'cat'), (3, 'frog');
$body$;

select k, v from s0.f();

It, too, runs without error and produces the same result as did the first version of s0.f(). The first values statement, though not illegal, has no effect. Notice that its result set has the wrong shape. But this doesn't cause an error because a language sql function's result is delivered by its final statement. This final statement must, therefore, produce results with the shape that the function's header specifies. Deliberately break this rule to see what error this causes:

drop function if exists s0.f();
create function s0.f()
  returns table(k int, v text)
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  values (1, 'dog'), (2, 'cat'), (3, 'frog');
  values ('apple'), ('orange'), ('pear');
$body$;

It fails at create time thus:

ERROR:  42P13: return type mismatch in function declared to return record
DETAIL: Final statement returns text instead of integer at column 1.

You'd see the same error if you made set local search_path = pg_catalog, s0, pg_temp the final statement.

Annotating the rows produced by a table function using "with ordinality"

The with ordinality clause can be used by any function that returns a set: a built-in function; a language sql user-defined function; or a language plpgsql user-defined function. Try this first:

select s.k, s.v
from generate_series(13, 40, 7) with ordinality as s(v, k);

This is the result:

 k | v  
---+----
 1 | 13
 2 | 20
 3 | 27
 4 | 34

Now try this:

drop function if exists s0.f();
create function s0.f()
  returns table(k int, v text)
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  with c(k, v) as (
    values (13, 'dog'), (20, 'cat'), (27, 'frog'), (34, 'mouse'))
  select k, v from c order by k;
$body$;

select a.r, a.k, a.v
from s0.f() with ordinality as a(v, k, r);

This is the result:

 r |   k   | v  
---+-------+----
 1 | dog   | 13
 2 | cat   | 20
 3 | frog  | 27
 4 | mouse | 34

The with ordinality clause can have a well-defined meaning only because a set-returning function (a.k.a. table function) is guaranteed to deliver its results in the exact order in which it computes them. This fact is most useful when a language plpgsql table function is used to format a report. Here's a stylized example:

drop function if exists s0.f(numeric, numeric, numeric);
create function s0.f(len in numeric, wid in numeric, hght in numeric)
  returns table(line text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  line := 'Computed Dimensions';                 return next;
  line := '———————————————————';                 return next;
  line := '';                                    return next;
  line := '  length: '||to_char(len,  '99D9');   return next;
  line := '  width:  '||to_char(wid,  '99D9');   return next;
  line := '  height: '||to_char(hght, '99D9');   return next;
end;
$body$;

\t on
select s0.f(5.7, 3.3, 1.9);
\t off

The select statement has no order by clause—and nor is this needed. (Notice that it's legal to elide the select list here.) This is the result:

 Computed Dimensions
 ———————————————————
 
   length:   5.7
   width:    3.3
   height:   1.9

This example gives you a hint about how you might choose between language sql and language plpgsql. It's simple to rewrite it as language sql by using this implementation:

$body$
  values
    ('COMPUTED DIMENSIONS'),
    ('———————————————————'),
    (''),
    ('  length: '||to_char(len,  '99D9')),
    ('  width:  '||to_char(wid,  '99D9')),
    ('  height: '||to_char(hght, '99D9'));
$body$;

In this case, then, its terseness makes language sql the winner. But, in general, a report interleaves headings and other annotations with the results from several set-returning select statements. Depending on the ambition level, the implementation of such a report becomes at best unwieldy or, in the limit, impossible with language sql but straightforward with language plpgsql.

Using "language sql" procedures

The following examples demonstrate that there are very few use cases that can be best implemented using a language sql procedure rather than a language plpgsql procedure.

Procedures cannot have 'out' formal arguments.

Try this:

create schema s;

create table s.t(k serial primary key, v int not null);

create procedure s.p(n inout bigint)
  set search_path = pg_catalog, s, pg_class
  language sql
as $body$
  insert into t(v) select g.v from generate_series(-50, 50, 17) as g(v);
  select count(*) from t;
$body$;

call s.p(null::bigint);

This runs without error and produces the expected result:

 n 
---
 6

Now rewrite it thus:

drop procedure s.p(int) cascade;

create procedure s.p(n out int)
  set search_path = pg_catalog, s, pg_class
  language sql
as $body$
  insert into t(v) select g.v from generate_series(-50, 50, 17) as g(v);
  select count(*)::int from t;
$body$;

The create fails with the 0A000 error:

procedures cannot have OUT arguments

and the hint says INOUT arguments are permitted.

This restriction is inherited from PostgreSQL Version 11. Current PostgreSQL does not suffer from this limitation. Therefore, it will be lifted in a later version of YugabyteDB that uses a PostgreSQL version where the limitation has been lifted. GitHub Issue #12348 tracks this.

Deleting from a table and reporting the outcome

You might find this example convincing. It certainly runs without error and produces the expected outcome:

create schema s;

create table s.t(k serial primary key, v int not null);
insert into s.t(v) select g.v from generate_series(17, 50, 11) as g(v);

create procedure s.p(lb in int, ub in int, outcome inout int[])
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  with c(v) as (delete from s.t where v between lb and ub returning v)
  select array_agg(v) from c;
$body$;

call s.p(20, 45, null);

This is the result:

 outcome 
---------
 {28,39}

Inserting, updating, or deleting in two or several tables in a single transaction

This example, too, runs without error and produces the expected outcome:

create table s.t1(k serial primary key, v int not null);
create table s.t2(k serial primary key, v int not null);

create procedure s.p1(t1_v in int, t2_v in int)
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  insert into t1(v) values(t1_v);
  insert into t2(v) values(t2_v);
$body$;

call s.p1(17, 42);

select 't1' as "table", k, v from s.t1
union all
select 't2' as "table", k, v from s.t2;

Here is the result of the final status query:

 table | k | v  
-------+---+----
 t1    | 1 | 17
 t2    | 1 | 42

However, it's very rare that making two or several mutually independent inserts into different tables in a single transaction meets a useful business purpose. Realistic examples of inserting into two tables, or reading from one table and inserting into another table, occur, for example, when one table has a foreign key constraint to the other, when each table has an autogenerated surrogate primary key, and when the masters table has a unique business key.

  • You want to insert a new master row and a few details rows for it. You need first to insert the master row and discover what primary key value it was given. Then you insert the details rows setting their foreign key column values to the value that you discovered for the new master row's primary key.
  • You want to insert, update, or delete details rows for an existing master row identified by its unique business key. You need, therefore, first to query the masters table to discover the primary key value and then use that to do the intended inserts, updates, or deletes to the details table.

This is straightforward if you use language plpgsql procedures. Try this:

create table s.masters(
  mk serial primary key,
  mv text not null unique);

create table s.details(
  dk serial primary key,
  mk int not null references s.masters(mk),
  dv text not null);

create procedure s.insert_new_master_and_details(new_mv in text, dvs in text[])
  set search_path = pg_catalog, s, pg_temp
  language plpgsql
as $body$
declare
  new_mk int not null := 0;
begin
  insert into s.masters(mv) values(new_mv) returning mk into new_mk;

  insert into s.details(mk, dv)
  select new_mk, u.v
  from unnest(dvs) as u(v);
end;
$body$;

call s.insert_new_master_and_details('mary', array['skiing', 'running', 'cycling']);

create procedure s.add_details_to_existing_master(old_mv in text, dvs in text[])
  set search_path = pg_catalog, s, pg_temp
  language plpgsql
as $body$
declare
  old_mk int not null := 0;
begin
  select mk into old_mk
  from masters
  where mv = old_mv;

  insert into details(mk, dv)
  select old_mk, u.v
  from unnest(dvs) as u(v);
end;
$body$;

call s.add_details_to_existing_master('mary', array['tennis', 'swimming']);

select mv, dv
from s.masters inner join s.details using(mk)
order by mk, dk;

This is the result of the final query, just as is expected:

  mv  |    dv    
------+----------
 mary | skiing
 mary | running
 mary | cycling
 mary | tennis
 mary | swimming

Each procedure depends, critically, on holding a value that the first SQL statement returns in a local variable so that the second SQL statement can use it. But a language sql subprogram as no notion that corresponds to a local variable to hold state for the duration of the subprogram call. A temporary table (created using on commit delete rows seems at first to be promising—except, of course, that it would distribute the functionality across too many moving parts. (See, too, the section The execution model for 'language sql' subprograms at the start of this page.) However, it turns out anyway to be impractical for another reason. This construct produces the required value:

with c(v) as (
  insert into s.masters(mv) values('dick') returning mk)
select v from c;

And this analogous construct to consume it works fine:

create table pg_temp.t(v int);
insert into pg_temp.t(v)
with c(v) as (
  values(17))
select v from c;

But combining the two ideas fails:

insert into pg_temp.t(v)
with c(v) as (
  insert into s.masters(mv) values('dick') returning mk)
select v from c;

It causes the 0A000 error:

WITH clause containing a data-modifying statement must be at the top level

This example demonstrates vividly that a language sql procedure has severely limited capability. You might like, therefore, to adopt a simple practice rule: always implement user-defined procedures using language plpgsql.

Executing a 'language sql' subprogram versus executing a 'prepared' SQL statement

Each of these approaches lets you set up a parameterized SQL statement for reuse so that each successive invocation can use different actual arguments. Connect as an ordinary user that has all privileges on a sandbox database and do this set-up:

create schema s;
create table s.t(k serial primary key, v int not null);

prepare stmt_1(int, int) as
insert into s.t(v) select g.v*2 from generate_series($1, $2) as g(v);

create procedure s.p(lb in int, ub in int)
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  insert into t(v) select g.v*3 from generate_series(lb, ub) as g(v);
$body$;

prepare stmt_2(int[]) as
select k, v from s.t where k = any($1) order by k;

create function s.f(in_list in int[])
  returns table(k int, v int)
  set search_path = pg_catalog, s, pg_temp
  security definer
  language sql
as $body$
  select k, v from t where k = any(in_list) order by k;
$body$;

Now populate the table:

execute stmt_1(1, 5);
call s.p(lb=>6, ub=>10);

Query it first using the prepared statement:

execute stmt_2(array[2, 4, 6]);

This is the result:

 k | v  
---+----
 2 |  4
 4 |  8
 6 | 18

And now query it using the function:

select k, v from s.f(in_list=>array[3, 5, 7]);

This is the result:

 k | v  
---+----
 3 |  6
 5 | 10
 7 | 21

The two approaches are very similar in what they achieve:

  • Each allows the use of "placeholders" rather than explicit values at definition time.
  • Each allows explicit values to be bound to the placeholders at execution time.
  • Each allows the work of the syntactic and semantic analysis to be done just once at definition time and then re-used many times at execution time.
  • Each allows the reuse of the same execution plan (and the corresponding performance benefit by avoiding recomputing the plan) when sufficient re-executions are detected as using the same plan.

The prepare-execute approach uniquely supports a useful explain thus:

explain execute stmt_2(array[2, 4, 6]);

It produces this result:

 Sort  (cost=4.12..4.13 rows=1 width=8)
   Sort Key: k
   ->  Index Scan using t_pkey on t  (cost=0.00..4.11 rows=1 width=8)
         Index Cond: (k = ANY ('{2,4,6}'::integer[]))

But the explain for the create-function-select-from-function approach provides no useful information:

explain select k, v from s.f(in_list=>array[3, 5, 7]);

It produces this result:

 Function Scan on f  (cost=0.25..10.25 rows=1000 width=8)

In all other ways, the create-subprogram-select-from-function approach is better than the prepare-execute approach.

  • The placeholders in a prepared statement can take only the non-mnemonic form $1, $2,... but the placeholders in a language sql function can use identifiers to denote usefully mnemonic names.
  • You don't need to arrange for session-pool initialization code to do the prepare step whenever a new connection is added to the pool. Rather, the moral equivalent of that step happens implicitly in any newly-started session when call the_procedure() or select... from the_function() is first executed—so no dedicated initialization code is needed.
  • Because a function (unless you choose to create it as a temporary object) is a persistent schema-object, it's available for use by any role to which usage on the schema and execute on the function are granted. In contrast, only the role that executes the prepare statement can execute the statement that it prepared—and only for the lifetime of the session.
  • You can use the security definer mode to hide the schema-objects that the procedure(s) and functions(s) manipulate from the code that connects as the client role.
  • You can invoke a function in any expression where the value is needed.
  • You can add "set search_path = pg_catalog, pg_temp" as a subprogram attribute—in other words, you can enforce the use of qualified identifiers and ensure that the objects that a procedure or function relies upon cannot be captured by a malicious client that manages to create temporary objects and manipulate the value of the search_path at the session level and thereby capture the application's schema-objects. (See the section Demonstrating how a temporary table, created on-the-fly, can capture an application's intended functionality.)

A table function always immediately produces its entire result set

This fact is of no consequence when the table function is used "bare" like in the example that writes the "Computed Dimensions" report in the section Annotating the rows produced by a table function using "with ordinality". But it can have a noticeable negative performance effect if you use a table function in a surrounding SQL statement that imposes a restriction (or even a projection) or especially that uses limit N. In other words, there is no scheme analogous to the "pipelined" feature for table functions in Oracle Database; nor is there a mechanism for pushing down predicates into the function.

Selecting from a 'language sql' subprogram versus selecting from a view

Consider a security definer language sql subprogram with no formal parameters whose returns argument is setof record. It's tempting to see it as the functional equivalent of a view. (A view non-negotiably behaves as if it were defined with security definer.) Each is a schema-object and, provided that appropriate privileges are granted, can be used by any role.

There are, however, critical mutually exclusive differences. A view wins in these ways:

  • When a view is used in a select statement that itself expresses a restriction, the combined effect of the using statement and the view's defining select statement can be flattened into a single statement before the execution plan is determined. This cannot happen with a function.
  • When a view's definition meets certain criteria, the view can be the target of an update or delete statement. This cannot happen with a function.

And a function wins in this way:

  • A function can, of course, be parameterized. But this cannot be done with a view.