PL/pgSQL's "create" time and execution model

What does creating or replacing a subprogram do?

The SQL statements create [or replace] function and create [or replace] procedure do a full syntax check, but just a partial semantics check. Try the following examples. Connect as an ordinarily privileged user (say, d0$u1) to a database (say, d0) where it can create objects. Prepare for the tests by creating a function to show the PL/pgSQL source code that defines a subprogram owner by the current_role, in schema s, and with the specified name:

\c d0 d0$u0
create schema s;

create function s.prosrc(f in text)
  returns text
  language sql
  set search_path = pg_catalog, pg_temp
as $body$
  select prosrc
  from
    pg_proc p
    inner join
    pg_namespace n
    on p.pronamespace = n.oid
    inner join
    pg_roles r
    on p.proowner = r.oid
  where p.proname = f::name
  and   n.nspname = 's'::name
  and   r.rolname = current_role;
$body$;

Now create a function whose source text has an obvious semantic error:

create function s.f1()
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  a text not null := (select t.v from s.t where t.k = 1);
begin
  return a;
end;
$body$;

The table s.t doesn't exist. But nevertheless the create function succeeds. Confirm that its PL/pgSQL source code is recorded in pg_proc thus:

select s.prosrc('f1');

This is the result:

                                                          +
 declare                                                  +
   a text not null := (select t.v from s.t where t.k = 1);+
 begin                                                    +
   return a;                                              +
 end;                                                     +

Now attempt to execute s.f1(). It's only now that the semantic error shows up, as error 42P01, thus:

relation "s.t" does not exist

In a test where the code allows many different code paths according to, say, the values of the actual arguments, an error like this might remain undetected over several different invocations of the subprogram only to show up when a path is taken that reaches the statement with the error.

Create and populate the missing table and try again:

create table s.t(k serial primary key, v text);
insert into s.t(v) values ('cat');
select s.f1();

This is the result:

 f1  
-----
 cat

Drop the table s.t and see if the function s.f1() remains intact:

drop table s.t cascade;
select s.prosrc('f1');

It does. The keyword cascade doesn't have the effect that you might hope for because no metadata is recorded to say that the function s.f1() depends upon the table s.t.

Next, create a second function whose source text, again, has a semantic error:

create function s.f2()
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  a s.t.v%type not null := 'dog';
begin
  return a;
end;
$body$;

Here, the 42601 error is reported:

invalid type name "s.t.v%type"

(Notice that the error code, in fact, denotes a syntax error.) But, in contrast with the behavior with the function s.f1(), the error occurs already on the attempt to create the function rather than not until the attempt to execute it. This query shows no record of s.f2() in pg_proc:

select s.prosrc('f2');

If create [or replace] fails, then the statement has no effect. In particular, the content of pg_proc remains unchanged. Only an error-free create [or replace] will store the subprogram's definition in the catalog as its source text and other attributes spread among suitable dedicated columns in pg_proc.

Don't try to understand what errors are detected already at 'create' time and what errors are detected only at runtime.

You must accept that an error-free create [or replace] doesn't guarantee the correctness of the function or procedure. This is hardly a new realization. As ever, you must design your tests so that they exercise every possible code path.

What is recorded in the catalog following an error-free "create [or replace]"?

Do this:

create schema s;
create table s.t(k int primary key, c1 boolean, c2 numeric, dummy int);
insert into s.t(k, c1, c2) values (1, true, 17.1), (2, false, 42.3), (3, true, 59.7);

create function S.F(K_IN S.T.K%TYPE, dummy in s.t.dummy%type = 0)
  returns table(z text)
  security definer
  set timezone = 'UTC'
  language plpgsql
as $body$
declare
  v_c1 s.t.c1%type;
  c2_out s.t.c2%type;
begin
  for v_c1, c2_out in (select c1, c2 from s.t where k > k_in order by k) loop
    z := rpad(v_c1::text, 7)||c2_out::text;
    return next;
  end loop;
end;
$body$
  stable
  set search_path = pg_catalog, pg_temp;

The arbitrary (and unconventional) use of upper and lower case, and the whimsical placing of the language and security attributes and one of the two set run_time_parameter attributes before the PL/pgSQL source text and the volatility attribute and another set run_time_parameter attribute after the source text is for pedagogic effect. Now use the "\sf s.f" meta-command to see what was recorded in the catalog. This is the output:

CREATE OR REPLACE FUNCTION s.f(k_in integer, dummy integer DEFAULT 0)
 RETURNS TABLE(z text)
 LANGUAGE plpgsql
 STABLE SECURITY DEFINER
 SET "TimeZone" TO 'UTC'
 SET search_path TO 'pg_catalog', 'pg_temp'
AS $function$
declare
  v_c1 s.t.c1%type;
  c2_out s.t.c2%type;
begin
  for v_c1, c2_out in (select c1, c2 from s.t where k > k_in order by k) loop
    z := rpad(v_c1::text, 7)||c2_out::text;
    return next;
  end loop;
end;
$function$

This is most certainly not the verbatim form of the create function statement that was submitted. While the PL/pgSQL source text within the dollar quotes (stored in pg_proc.prosrc) has been preserved verbatim, there are many other differences:

The bare create as entered has been rendered by \sf as create or replace.
The spelling of the dollar quoting has been changed from $body$ to $function$.
The case of everything else has been canonicalized so that SQL keywords are rendered in upper case and all other text is rendered in lower case. (The names of schema-objects, formal arguments, and the like are actually rendered with proper respect to their actual case. But in this example, only non-quoted identifiers were used—meaning that the denoted names are all lower case. See the section Names and identifiers.)
The syntax for specifying the default expression for the dummy formal argument uses the keyword default but the text that followed the user's create used specified the default expression by using the alternative = syntax.
Double and single quotes have been added in the set run_time_parameter attributes.
All the attributes have been placed before the PL/pgSQL source text, and the mutual ordering of these attributes has been changed.
The %type data type designations for the formal arguments have been translated to what they happen to mean at the moment that the DDL was issued.

This shows that everything but the PL/pgSQL source text undergoes full analysis at create time. The parsed out information is individually stored in dedicated pg_proc columns so that the original upper/lower case usage and ordering has been lost. For example, the name, the mode, and the data type of each formal argument are represented in the three parallel arrays, proargnames text[], proargmodes "char"[], and proallargtypes oid[]. Therefore, because the user-entered text that follows create [or replace] is not recorded verbatim in the catalog, it's impossible to re-create that text. Rather, \sf generates a canonically represented create or replace statement that has the same effect as what the user entered. Of course, then, the function works as expected. Test it thus:

select z from s.f(1);

This is the result, as expected:

      z      
-------------
 false  42.3
 true   59.7

The last bullet point, about the create-time translation of the %type designations has significant consequences. Try this.

alter table s.t add column c2_new text;
alter table s.t add column dummy_new text;
update s.t set c2_new = c2::text;
alter table s.t rename column c2 to c2_old;
alter table s.t rename column dummy to dummy_old;
alter table s.t rename column c2_new to c2;
alter table s.t rename column dummy_new to dummy;
alter table s.t drop column c2_old;
alter table s.t drop column dummy_old;

YSQL doesn't support 'alter table... alter column... set data type'.

The effect of the eight alter table statements together with the one update statement can be achieved with fewer statements in PostgreSQL by using alter table... set data type.

These DDL statements, in PostgreSQL, have almost the same effect:

alter table s.t alter column c2 set data type text;
alter table s.t alter column dummy set data type text;

Notice that because alter table... alter column... set data type leaves the data in place in the column whose data type is altered, this implies implicit data conversion using the typecast from the pre-alter data type to the post-alter data type. The DDL has no option to allow you to specify your own conversion method. In contrast, the method that you must use in YSQL necessitates an explicit DML to copy the data from the pre-alter data type in the starting column to the post-alter data type in the replacement column. And this would allow you, if it suited your purpose, to use your own method to convert the data values. (For example, you might want to use to_char and specify a fixed number of decimal digits.) In other words, YSQL forces you to use a more general approach.

Now repeat "\sf s.f". The output starts with this:

CREATE OR REPLACE FUNCTION s.f(k_in integer, dummy integer DEFAULT 0)

You can see that the user's intention, that the data type of the dummy formal argument should be whatever the column s.t.dummy has, is no longer honored. It's clear how this happened: it flows from the PostgreSQL design concept, inherited by YSQL, that parses out the meaning of all the user text that follows create [or replace], apart from the PL/pgSQL source text, at the time that the DDL is issued and stores this as atomic facts in pg_proc. The net effect is that using %type to define the data type of a subprogram's formal argument has only documentation value.

Notice that %type works differently in PL/pgSQL source text. See the section How PL/pgSQL's "create" time and execution model informs the approach to patching a database application's artifacts.

You must re-create a subroutine with a formal argument that uses %type when the data type of the column to which this refers is changed.

You may decide to avoid using %type for formal arguments and to specify the data type explicitly. However, whichever choice you make, you face the same maintenance challenge if the data type of the column that you want the formal argument to match is changed. You must maintain your own dependency documentation that will inform you when subprogram(s) need changes to accommodate data type changes in table columns. And, when called for, you must explicitly create or replace the affected subprograms. If you use %type, then you won't need to change the spelling of the create or replace statement; and if you don't use %type, then you will need to change the spelling of the create or replace statement. And this is the extent of the benefit that using %type for a subprogram's formal argument brings.

See the section Specifying a column's data type by using a domain. The approach it describes greatly reduces the likelihood that you will fail to notice where you need to re-create subprograms in response to changes in the structure of tables.

Execution model

This wording is taken (but re-written slightly) from the section PL/pgSQL under the Hood in the PostgreSQL documentation:

The first time that a subprogram is called within each session, the PL/pgSQL interpreter fetches the subprogram's definition from pg_proc, parses the source text, and produces an abstract syntax tree (a.k.a. AST). The AST fully translates the PL/pgSQL statement structure and control flow, but individual expressions and complete SQL statements used in the subprogram are not translated immediately. (Every PL/pgSQL expression is evaluated, at runtime, as a SQL expression.)

As each expression and SQL statement is first executed in the subprogram, the PL/pgSQL interpreter parses and analyzes it to create a prepared statement.

Subsequent visits to that expression or SQL statement reuse the prepared statement. Thus, a subprogram with conditional code paths that are seldom visited will never incur the overhead of analyzing what is never executed within the current session. On the other hand, errors in a specific expression or SQL statement cannot be detected until that part of the subprogram is reached in execution. This brings the outcome that errors that, with a different model, would be detected at create time can (without proper testing) remain undetected until the subprogram has been in production use for some time.

Successive subsequent executions of a particular subprogram in a particular session will, in general, prepare more and more of its SQL statements and expressions as each new execution takes a different control-flow path.

This model brings huge understandability and usability benefits to the application developer:

The identical set of data types, with identical semantics, is available in both top-level SQL and in PL/pgSQL.
Expression syntax and semantics are identical in both top-level SQL and in PL/pgSQL.

But the model also brings the application developer some drawbacks:

Errors in a specific expression or SQL statement cannot be detected until runtime, and then not until (or unless) it is reached. Such an encounter depends on a current execution's control flow. And this is determined by run-time values like the actual arguments with which a subprogram is invoked or the results of executing SQL statements. Some particular programmer error might therefore remain undetected for a long time, even after a subprogram is deployed into the production system.

Simply mis-spelling the name of a variable, at just one location in a subprogram's source code, can cause this kind of delayed error.
You must track functional dependencies like subprogram-upon-table, subprogram-upon-subprogram, and so on manually in your own external documentation. And you must drive your plan for making the cascade of accommodating changes to the closure of functionally dependent subprograms, when an object is changed, entirely from this documentation.

How PL/pgSQL's "create" time and execution model informs the approach to patching a database application's artifacts.

Because neither PostgreSQL nor, correspondingly, YSQL tracks the dependencies that a PL/pgSQL subprogram has on other database objects that it uses, the representation of a subprogram that's already been executed in a particular session will, in general, remain in place and unchanged even when any of its dependency parents are changed.

Notice that various ad hoc mechanisms are in place. For example, empirical tests show that when a variable is declared by defining its data type using some_relation%rowtype, a subprogram's in-memory representation will track changes to the referenced table or view—both in PostgreSQL and in YSQL.

The test shown here uses a variable whose data type is defined using some_relation.some_column%type. Try it first using PostgreSQL. It behaves the same in any of Version 11 through the current version. Connect as an ordinary role to a suitable sandbox database where the role has the create privilege on the database and do this:

create schema s;
create table s.t(k serial primary key, v integer);
insert into s.t(k, v) values (1, 42);

create function s.f(k_in in s.t.v%type)
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  v_type text      not null := '';
  v_out s.t.v%type;
begin
  select pg_typeof(t.v)::text, t.v
  into strict v_type, v_out
  from s.t
  where t.k = k_in;
  return 'pg_typeof(t.v): '  ||v_type                 ||' / '||
         'pg_typeof(v_out): '||pg_typeof(v_out)::text ||' / '||
         'value: '           ||v_out;
end;
$body$;

select s.f(1);

This is the result:

 pg_typeof(t.v): integer / pg_typeof(v_out): integer / value: 42

Imagine that the requirements change so that the table s.t must now store text values rather than just integer values. This can be accommodated, because integer values can be typecast to _text values, thus:

alter table s.t add column v_new varchar(10);
update s.t set v_new = v::text;
alter table s.t rename column v to v_old;
alter table s.t rename column v_new to v;
alter table s.t drop column v_old;

-- Exercise the new functionality
insert into s.t(k, v) values (2, 'dog');

In general, you'd expect that any subprogram that accesses the table s.t would need code changes to reflect the change to the table. But in special cases, like this contrived example shows, no code changes are needed. You might think that it's sufficient just to quiesce all client-side activity while the table change is ongoing and then allow activity to start again without restarting each of the client sessions. But this is in general unsafe. You can demonstrate this by using PostgreSQL (but not, as it happens, by using YSQL) thus:

select s.f(1);

This is the result:

pg_typeof(t.v): character varying / pg_typeof(v_out): integer / value: 42

The reported data type of the table column, t.v, does reflect the table change that was made because it's reported by an ordinary SQL statement—and the representation of a prepared SQL statement does maintain dependency information so that it will be invalidated and recalculated in the present scenario. But the reported data type of the PL/pgSQL variable v_out is unchanged from the pre-patch result. This tells you that the data type for the local variable was determined when the function s.f() was executed for the first time in the session; and that the result of that determination has survived the change to the table's replacement column, still called v, but now with data type text and not the original integer. The unchanged in-memory representation of the function s.f() means that this test is bound to cause a runtime error when the function selects a text value:

select s.f(2);

It causes the 22P02 error, thus:

invalid input syntax for integer: "dog"

You can repeat "select s.f(2)" as many times as you like. It always causes the same error. In other words, the failure doesn't cause the in-memory representation of the function s.f() to be marked so that the next execution will cause it to be recomputed. Rather, you must force this by exiting the session and starting a new one. Only then, do you see the intended result for the row. This is what executing s.f(), for the two rows of interest, now shows:

 pg_typeof(t.v): character varying / pg_typeof(v_out): character varying / value: 42
 pg_typeof(t.v): character varying / pg_typeof(v_out): character varying / value: dog

Now try the same test using YSQL. Its internal implementation, for this particular scenario, does notice the change to the table s.t and marks the in-memory representation of the function s.f() so that it will be re-calculated on the immediate next execution attempt. In other words you don't see wrong results or runtime errors.

Neither PostgreSQL nor YSQL promises a reliable response to changes in a subprogram's dependency parents.

Sometimes, you do see a correct automatic response to a change in a dependency parent. And sometimes you do not see this. Neither RDBMS documents when such responses happen and when they do not. And nor will this stance change. The developers of RDBMSs always retain the right to leave certain observable aspects of behavior undocumented so that they can, when there are good reasons for this, make changes to the internal implementation that bring changes in such undocumented aspects of behavior as unintended side-effects.

This realization leads to a critical practice recommendation.

Terminate all database clients before making any changes among the database artifacts that implement an application's backend.

In the general case, no patching exercise among a database application's artifacts is done by changing just a single one of these. As mentioned, it's the development shop's responsibility to determine the set of artifacts that must be changed—without the help of a dependency graph that the RDBMS maintains. (This thinking extends to the determination of the set of client-side code changes that are needed.) Of course, from the moment that the first change is made through the moment when all needed changes have been made, the application as a whole cannot be in use because its components are mutually inconsistent.

Terminating all database clients before starting the changes, and re-starting them only when all have been made, guarantees that the in-memory representations of PL/pgSQL subprograms will be current when the application is opened up again for normal use.

This strict approach guarantees correctness. But no attempt that aims to work out, by human analysis, when normal database client sessions might be left connected during patching can do this. It's very unlikely that following the approach that this tip recommends will measurably affect the overall duration of the application's unavailability window that the patching exercise brings.

Specifying a column's data type by using a domain

A current YSQL limitation defeats the approach that this section describes.

These DDL statements in the section What does the catalog record following an error-free "create [or replace]" execute without error in YSQL:

alter table s.t add column c2_new text;
alter table s.t add column dummy_new text;

But this DDL statement, critical for the approach that this section describes, currently fails:

alter table s.t add column v_new s.num_new;

It causes this error:

ERROR:  0A000: Rewriting of YB table is not yet implemented

The only difference is that the data type of the to-be-added column is given by a user-defined domain rather than by a base type.

The hint refers you to GitHub issue #13278 and tells you to react with thumbs up to raise its priority.

Notice that this error occurs only if the domain definition for the to-be-added column specifies a constraint. Notice, too, that when %type is used in a declaration, only the data type is inherited and constraints that might have been defined on the column to which %type refers are ignored. It's therefore still better to use a domain where you might have thought of using %type.

You can use %type and %rowtype for specifying the data type of a subprogram's formal argument or return value for both language plpgsql subprograms and language sql subprograms. And you can use the same syntax for specifying the data type of a local variable in PL/pgSQL source text. The aim of the syntax is, of course, to give you a single point of definition fr the data type that can be referred to from many sites where it's needed. However, there are many use cases where these referring sites occur for the columns in two or several tables. For example:

The data type(s) of the column(s) on which a foreign key constraint is defined must match the data type(s) of the column(s) on which the primary key constraint is defined in the referenced table.
A numeric quantity like the route distance between two points on the Earth's surface will ideally be defined in the same way (for example, must be expressed in kilometers with three decimal digits of precision, must not be negative, must not be greater than, say, the Earth's circumference) and might appear in several different tables for different kinds of routes.

You can't use %type or %rowtype for specifying the data type of a table's column. But a domain meets exactly this use case. And it does so better than %type and %rowtype can do when the referring site is in a subprogram because a domain can represent constraints as well as designating the data type.

Try this example in PostgreSQL using Version 11 or any later version:

create schema s;
-- Deliberate poor definition of domain "s.num" (upper bound is too small).
create domain s.num as numeric constraint num_ok check(value > 0.0 and value <= 10.0);

create table s.t(k integer primary key, v s.num);
insert into s.t(k, v) values (1, 5);

Now create the file "cr-function.sql" thus:

create function s.f(v_in in s.num)
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare 
  r text not null := '';
begin
  select k::text into strict r from s.t where v = v_in;
  return r;
end;
$body$;

and continue thus:

\ir cr-function.sql
select s.f(5.0);

-- Improved definition of domain "s.num".
create domain s.num_new as numeric constraint num_ok check(value > 0.0 and value <= 20.0);

-- Causes error 0A000 in YSQL. OK in PostgreSQL.
alter table s.t add column v_new s.num_new;

update s.t set v_new = v::s.num_new;
alter table s.t rename column v to v_old;
alter table s.t rename column v_new to v;
alter table s.t drop column v_old;

-- Without "cascade", this causes error 2BP01:
-- cannot drop type s.num because other objects depend on it.
drop domain s.num cascade; --> drop cascades to function s.f(s.num)

alter domain s.num_new rename to num;
insert into s.t(k, v) values (2, 14.5);

\ir cr-function.sql
select s.f(5.0);

If you issue set client_min_messages = notice before doing this, then you'll be notified that drop domain... cascade also drops the function s.f(). This means that you're forced to re-create it. But you don't have to change the text of the create function statement. This guarantees that all references to the replaced domain s.num, will now use the new definition so that the single point of definition is properly honored.