Expression-based index referencing immutable function

Try this simple example to shows how the dishonesty might catch out out. First, the basic setup is straightforward:

deallocate all;
drop table if exists t cascade;
drop function if exists f2(int) cascade;
drop function if exists f1(int) cascade;

create function f1(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*2;
end;
$body$;

create table t(k int primary key, v int);
insert into t(k, v)
select g.v, g.v*10
from generate_series(1, 1000) as g(v);

create index i on t(f1(v));

set pg_hint_plan.enable_hint = on;
prepare qry as
select /*+ IndexScan(t) */
  k, v
from t where f1(v) = 3000;

execute qry;
explain execute qry;

The query produces this result:

  k  |  v
-----+------
 150 | 1500

And the explain produces output:

 Index Scan using i on t  (cost=0.00..7.72 rows=10 width=8)
   Index Cond: (f1(v) = 3000)

Suppose that you realize that you designed the function body wrong and that you now want to implement it differently. You might naïvely (but wrongly as you'll soon see) do this:

create or replace function f1(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*3;
end;
$body$;

execute qry;
explain execute qry;

The query result is unchanged—in other words, you have a wrong result. (And still, as you wanted, the index is used to produce the result.) The problem is that you used create or replace, and this is designed to avoid cascade-dropping dependent objects. The index i is just such a dependent object—and so it remained in place. But is was built using the old definition of the function's behavior. The index is, of course, an explicitly requested cache of the function's results—and so the only choice that you have is to drop it and recreate it when the newly-defined function is in place. You should, therefore, use drop and then a fresh bare create in this scenario. Try this first, to dramatize the outcome:

drop function f1(int);

It cause this error:

2BP01: cannot drop function f1(integer) because other objects depend on it

This, therefore, is the proper approach:

drop function f1(int) cascade;
create function f1(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*3;
end;
$body$;

-- Now re-create the index.
create index i on t(f1(v));

Now do this again:

execute qry;
explain execute qry;

Now you get the correct result:

  k  |  v
-----+------
 100 | 1000

and the plan remains unchanged:

 Index Scan using i on t  (cost=0.00..7.72 rows=10 width=8)
   Index Cond: (f1(v) = 3000)

Always use 'drop' followed by a fresh bare 'create' to change an existing 'immutable' function.

The only safe way to change the definition of a function that's referenced in the definition of an expression-based index is to drop it and then to use a fresh bare create to make it what you want. The same reasoning holds when an immutable function defines a constraint. Yugabyte recommends that, in other cases where you want to change the definition of an immutable function, rather than try to reason about when create or replace might be safe, you instead simply always follow this practice.

Constraint referencing immutable function

Here's a simple example:

drop table if exists t cascade;
drop function if exists t_v_ok(text) cascade;

create table t(k serial primary key, v text not null);

create function t_v_ok(v in text)
  returns boolean
  immutable
  language plpgsql
as $body$
begin
  if not (v ~ '^[a-z]{5}$') then
    raise exception using
      errcode = '23514'::text,
      message = 'Bad "'||v||'" — Must be exactly five lower-case [a-z] characters';
  end if;
  return true;
end;
$body$;

alter table t add constraint t_v_ok
  check(t_v_ok(v));

In this case, the constraint could have been defined with far less code, this:

create table t(
  k serial primary key,
  v text not null constraint t_v_ok check(v ~ '^[a-z]{5}$'));

But sometimes it's useful, for example, to encapsulate several constraints whose expressions are fairly elaborate in the body of a single function. The function encapsulation also lets you raise an exception, on violation, using specific wording to express the problem. Suppose that you attempt this:

insert into t(k, v) values(2, 'rabbit');

when the constraint's expression is written explicitly in the table's definition, you get this error:

new row for relation "t" violates check constraint "t_v_ok"

But with the function encapsulation, as shown here, you get this error:

Bad "rabbit" — Must be exactly five lower-case [a-z] characters

Surprisingly, and in contrast to the example of the expression-based index that references a user-defined function, you are not required to mark the function that defines a constraint as immutable. But clearly a function that's used this way must be consistent with the rules that allow it to be honestly so marked.

Marking a function that's used to define a constraint immutable reminds you to follow the same practice, should you need to change its definition, as for the expression-based index case: Always use 'drop' followed by a fresh bare 'create'. Create or replace allows you to change the meaning of the constraint-defining function without dropping and re-creating the constraint. And this would leave you with table data in place that violated the new definition of the constraint.

Immutable function that is dishonestly so marked

This example shows that wrong-results brought by the session-duration caching of a function that is dishonestly marked immutable. First, do the set-up. Notice that the result that dishonestly_marked_immutable() returns is affected by the value of the user-defined session parameter "x.a" and so, clearly, successive invocations with the same actual argument might not return the same value. The appropriate marking for this function (as was demonstrated above) is stable.

deallocate all;
drop function if exists dishonestly_marked_immutable(int) cascade;

set x.a = '13';

create function dishonestly_marked_immutable(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*(current_setting('x.a')::int);
end;
$body$;

prepare q as
select
  dishonestly_marked_immutable(2) as "With actual '2'",
  dishonestly_marked_immutable(3) as "With actual '3'";

execute q;

This is the result:

 With actual '2' | With actual '3'
-----------------+-----------------
              26 |              39

Now contrive a wrong result by changing the value of "x.a":

set x.a = '19';
execute q;

The result is unchanged—and so it's now wrong with respect the function's intended behavior. The reason is that the results for the actual arguments "2" and "3" have been cached in the execution plan that hangs off the prepared statement "q". You can demonstrate this by using discard plans, thus:

discard plans;
execute q;

This is the new result:

 With actual '2' | With actual '3'
-----------------+-----------------
              38 |              57

Don't consider this to be an acceptable workaround. The problem is that the function has been marked with the wrong volatility—and so the only way to fix this is to drop is and to re-create it with the correct volatility, stable.

Using DISCARD PLANS, to clear the 'immutable' cache, is generally unsafe.

The present example showed a deliberate error for the sake of pedagogy. And the pedagogy was served simply by clearing the stale cached results in the present session—and only here. But there will be ordinary situations where you don't want cached results for an immutable function to persist after a change that makes these no longer correct.

discard plans affects only the session that issues it. But many concurrent sessions each could have cached such to-be-purged stale results. To do this safely in a way that handles all situations, cluster-wide, you must drop and re-create the immutable function in question. See the tip Always use 'drop' followed by a fresh bare 'create' above.

Immutable function that depends, functionally, on another immutable function

Begin with this setup:

deallocate all;
drop function if exists f1(int) cascade;
drop function if exists f2(int) cascade;

create function f1(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*2;
end;
$body$;

create function f2(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return f1(i);
end;
$body$;

prepare q as
select f2(13);

execute q;

This is the result:

 f2
----
 26

This might seem to be a proper use of the immutable marking—not just for f1(); but also for f2(), because f2() does nothing but call the immutable f1(). However, you must be very careful if you implement this pattern—as the following shows.

Now drop and re-create f1() and then re-execute "q":

drop function f1(int) cascade;

create function f1(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*3;
end;
$body$;

execute q;

You might have expected to see "39"—but the result is unchanged! The reason is that PostgreSQL (and therefore YSQL) do not track what the human sees as the dependency of f2() upon f1(). This outcome is a consequence of how PL/pgSQL source text is interpreted and executed. This is explained in the section PL/pgSQL's execution model.

You must therefore track functional dependencies like f2() upon f1() manually in external documentation. And you must understand that you must intervene manually after changing the definition of f1() by dropping and re-creating f2()—even though you use the same source text and other attributes for the new f2() as defined the old f2().

The emphasizes the importance of the advice that the tip Always use 'drop' followed by a fresh bare 'create', above, gives.

A functional dependency like f2() upon f1() might plausibly arise in a scheme that you might adopt to implement a single point of definition for universal constants that are needed by several subprograms. You could create a composite type whose attributes represent the constants together with an immutable function that sets the values of these attributes and returns the composite type instance. For example, such a constant might be a regular expression that represents an undesirable pattern. You might sometimes want just to detect occurrences of the pattern with a boolean function; and you might sometimes want to remove them with a function that returns a stripped version of the input value. These two functions will depend functionally upon the function that returns the constants. If you follow the advice "label your functions with the strictest volatility category that is valid for them" (see the tip above that quotes from the PostgreSQL documentation on function volatility categories), then you will mark both the detection function and the stripping function immutable.