Alterable function-only attributes
The volatility attribute has these allowed values:
The default is volatile.
This attribute allows the function's author to state a promise about the timespan over which a given (set of) actual arguments uniquely determines the function's return value. According to what is promised, PostgreSQL (and therefore YSQL) is allowed to cache some number of "return-value-for-actual-arguments" pairs—in pursuit of improving performance. If caching is done, the scope is a single session and the duration is limited to the session's lifetime.
Notice that "is allowed to cache" does not mean "will cache". The mere possibility is critical to the definition of the semantics of volatility. The conditions that make caching more, or less, likely are a separable concern.
Tautologically, PostgreSQL (and therefore YSQL) is unable to detect if the promise is good—so an author who gives a false promise is asking for wrong results.
See the PostgreSQL documentation for more detail.
The section 38.7. Function Volatility Categories explains some of the more subtle aspects of function volatility. In particular, it makes this recommendation:
For best optimization results, you should label your functions with the strictest volatility category that is valid for them.
The section 43.11.2. Plan Caching explains the caching mechanism and how it is tied to the notion of prepare and the possibility that a prepared statement might (or might not) cache its execution plan.
This denotes no promise at all. Here's a compelling demonstration of a function where volatile is the only possible honest choice:
drop function if exists volatile_result() cascade; create function volatile_result() returns text volatile language sql as $body$ select gen_random_uuid()::text; $body$; select volatile_result() as v1, volatile_result() as v2;
This is a typical result:
v1 | v2 --------------------------------------+-------------------------------------- bf4b4d1d-081a-4186-adc6-173016e0f485 | f9b210dc-d42b-4a88-bdd9-cca1949e0319
The function volatile_result() returns different results upon successive evaluations even during the execution of a single SQL statement.
This denotes a promise that holds good just for the duration of a SQL statement's execution. The human analyst can readily see that this following function can honestly be marked as stable:
drop function if exists stable_result(text) cascade; create function stable_result(which in text) returns text stable language sql as $body$ select case when which = 'a' then current_setting('x.a') when which = 'b' then current_setting('x.b') end; $body$;
The scope of a user-defined session parameter like "x.a" is just the single session that sets it. And, in the example, the human can readily see that the code of stable_result() doesn't set "a.x" or "x.b" — and that no other route exists to setting it from elsewhere while stable_result() is executing. Test it like this:
set x.a = 'dog'; set x.b = 'cat'; select stable_result('a'), stable_result('b');
Clearly, the return values from stable_result() can be different when it's invoked in a new SQL statement thus:
set x.a = 'frog'; set x.b = 'bird'; select stable_result('a'), stable_result('b');
When you mark a function as immutable, you give permission for PostgreSQL (and therefore YSQL) to build a session-duration cache where the key to a cache-entry is the vector of actual arguments with which the function is invoked and the key's payload is the function's return value.
(The caching mechanism is the prepared statement and the possibility to cache an execution plan with it. But you needn't understand the mechanism in order to understand the semantic proposition.)
Further, if you want to create an expression-based index that references a user-defined function, then it must be marked immutable. Without this volatility setting, you get this error:
42P17: functions in index expression must be marked IMMUTABLE
Marking a function as immutable expresses a promise that must hold good for the lifetime of the function's existence (in other words, from the moment it's created to the moment that it's dropped) thus:
- The function has no side effects.
- The function is mathematically deterministic—that is, the vector of actual arguments uniquely determines the function's return value.
Nothing prevents you from lying. But doing so will, sooner or later, bring wrong results.
See the section Immutable function examples.
The on_null_input attribute has these allowed values:
- called on null input
- returns null on null input
The default is called on null input. Notice that strict is simply a synonym for returns null on null input.
called on null input
This allows the function to be executed just as its source code specifies when at least one of its actual arguments is null. Function authors must then take the responsibility for handling the case that any actual is null appropriately.
This instructs YSQL simply to skip executing the function's source code when at least one of its actual arguments is null and simply to return null immediately.
\pset null '<NULL>' deallocate all; drop function if exists f(text, int, boolean) cascade; create function f(t in text, i in int, b in boolean) returns text called on null input language plpgsql as $body$ declare status text not null := '???'; begin if t is null then status := 'Bad: t is null'; elsif i is null then status := 'Bad: i is null'; elsif b is null then status := 'Bad: b is null'; else status := 'OK.'; end if; return status; end; $body$; prepare q as select (select f('dog', 42, true)) as test_0, (select f(null, 42, true)) as test_1, (select f('dog', null, true)) as test_2, (select f('dog', 42, null)) as test_3; execute q;
This is the result:
test_0 | test_1 | test_2 | test_3 --------+----------------+----------------+---------------- OK. | Bad: t is null | Bad: i is null | Bad: b is null
Now try this:
alter function f(text, int, boolean) strict; execute q;
This is the new result:
test_0 | test_1 | test_2 | test_3 --------+--------+--------+-------- OK. | <NULL> | <NULL> | <NULL>
Always explain your reasoning carefully in the design documentation when you decide to mark a function as 'strict'.
It's quite hard to imagine a plausible use case where you want silently to bypass a function's execution—especially given that a function is not supposed to have side effects.
Yugabyte recommends that when you come across such a use case and decide to mark a function as strict, you explain your reasoning very carefully in the design documentation.
The parallel_mode attribute has these allowed values:
The default is unsafe. You risk wrong results in a parallel query:
- If you mark a function as parallel safe when it should be marked restricted or unsafe,.
- If you mark a function as parallel restricted when it should be marked unsafe.
In this way, the parallel mode attribute is like the volatility and leakproof attributes. You must make the marking honestly; and YSQL will not detect if you lie.
This tells YSQL that the function can't be executed in parallel mode. The presence of such a function in a SQL statement therefore forces a serial execution plan.
You must mark a function as parallel unsafe:
- If it modifies any database state.
- If it makes any changes to the transaction such as using sub-transactions.
- If it accesses sequences or attempts to make persistent changes to settings.
Notice that, because a function ought not to have side-effects, you should consider using a procedure instead. If it needs to return a value, then use an inout argument.
You must mark a function as parallel restricted:
- if it accesses temporary tables.
- If it accesses client connection state ((for examples by using the current_value() built-in function).
- If it accesses a cursor or any miscellaneous backend-local state which the system cannot be synchronized in parallel mode.
For example, the setseed() built-in function sets the seed for subsequent invocations of the random() built-in function; but setseed() cannot be executed other than by the parallelization group leader because a change made by another process would not be reflected in the leader.
This tells YSQL that the function is safe to run in parallel mode without restriction.
The default for this attribute is not leakproof. Only a superuser may mark a function as leakproof.
Functions and operators marked as leakproof are assumed to be trustworthy, and may be executed before conditions from security policies and security barrier views. This is a component of the
Rules and Privileges functionality. See the account of
create view in the PostgreSQL documentation for the syntax for the security_barrier attribute.
The leakproof attribute indicates whether or not the function has any side effects. A function is considered to be leakproof only if:
- It makes no changes to the state of the database.
- It doesn't change the value of a session parameter.
- It reveals no information about its arguments other than by its return value.
For example, a function is not leakproof if:
It might raise an error for any particular value for at least one of the actual arguments with which it is invoked.
It might report the value of least one of the actual arguments with which it is invoked in an error message.
Just as is the case with the volatility attribute, the decision to mark a function as leakproof or not leakproof requires, and depends entirely upon, human judgment. YSQL cannot police the programmer's honesty.
The following demonstration assumes that you can connect to some database as a regular role and as a superuser. This code uses the database demo and connects as the regular role u1 and the superuser role postgres. Change the names to suit your environment.
\c demo u1 set client_min_messages = warning; drop schema if exists s1 cascade; create schema s1; create function s1.f(i in int) returns int language plpgsql not leakproof as $body$ begin return i*2; end; $body$; create view s1.f_leakproof_status(leakproof) as select proleakproof::text from pg_proc where pronamespace::regnamespace::text = 's1' and proname::text = 'f' and prokind = 'f'; select leakproof from s1.f_leakproof_status;
This is the result:
leakproof ----------- false
Now create a security invoker procedure to mark s1.f(int) as leakproof and execute it:
create procedure s1.mark_f_leakproof(result in out text) security invoker language plpgsql as $body$ begin alter function s1.f(int) leakproof; result := '"s1.f(int)" is now marked as leakproof'; exception when insufficient_privilege then result := 'Only superuser can define a leakproof function.'; end; $body$; call s1.mark_f_leakproof('');
This is the result, as expected:
result ------------------------------------------------- Only superuser can define a leakproof function.
Notice how procedure s1.mark_f_leakproof(text) is designed:
- It is set to be security invoker so that it will act with the privileges of the invoking role—in this demonstration either the regular role u1 or the superuser role postgres. This means that its power depends upon knowing the password for the postgres role.
- It is created as a procedure, and not as a function, even though it needs to return a success message, because procedures do something—but functions simply name a computed value that will be used in an expression and ought not to have (regular) side effects.
Now connect as postgres and execute mark_f_leakproof() again:
\c demo postgres set client_min_messages = warning; call s1.mark_f_leakproof('');
This is the new result, again as expected:
result ---------------------------------------- "s1.f(int)" is now marked as leakproof
Re-connect as u1 and check the leakproof status:
\c demo u1 set client_min_messages = warning; select leakproof from s1.f_leakproof_status;
This is the new result:
leakproof ----------- true
Cost and rows
Each of these attributes takes a positive integer argument. They provide information for the planner to use.
The cost attribute provides an estimate for execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, then 1 unit is assumed for C-language and internal functions, and 100 units is assumed for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary. (This suggests that the function should be marked with stable or immutable volatility.)
The rows attribute provides an estimate of the number of rows that the planner should expect the function to return. This is allowed only when the function is declared to return a set. The default assumption is 1000 rows.