Resolution of unqualified names in SQL statements that user-defined subprograms issue

When a subprogram issues a SQL statement that uses an unqualified name, an attempt is made to resolve the name to an actual object in some schema according to the current search_path—as given by current_setting('search_path'). The rule is the same irrespective of whether the subprogram has security invoker or security definer:

  • The search_path may be simply what the function's owner happens to have at the moment that the function is executed.

  • Or it may be a pre-determined attribute of the subprogram that you set explicitly as the alterable_fn_and_proc_attribute rule specifies.

The first possibility (when the subprogram's search_path attribute is left unset) is a notorious security anti-pattern because there's a risk that a bad actor can contrive to change the current search_path at runtime and subvert the intended meaning of the subprogram. The second choice is therefore preferred.

Notice that the risk is more troublesome for a security definer subprogram than it is for a security invoker subprogram because the former will be designed to do only carefully restricted operations while acting with privileges that the invoking role will typically not have. In contrast, a security invoker subprogram is just a convenience encapsulation for what the invoking role could do directly with top-level SQL. Nevertheless the invoker of any subprogram wants to be able to rely on its correctness. And name capture, even if it happens because of carelessness rather than because of a bad actor's deliberate plan, will bring incorrect behavior.

See the section 'Writing security definer functions safely' in the PostgreSQL documentation.

Here is the section that this tip's title mentions. It explains the risk that the anti-pattern brings and recommends protecting against it by setting the intended search_path as an attribute of the subprogram.

How are unqualified names in the SQL that a subprogram issues resolved?

The following example relies on tables—all in a single database whose name is insignificant, and all with the same owner whose name is also insignificant. The understanding of name resolution is orthogonal to that of object ownership and privileges—as long as you have the appropriate privileges to access all the objects of interest.

  • It creates three schemas, s1, s2, and s3.
  • It creates three tables, all with the same name, a—one in each of s1, s2, and s3.
  • It creates one table with a different name, b, in just the one schema, s1.

To run the example, you need to be able to connect to a sandbox database as an ordinary role. (The code below uses u1.)

drop schema if exists s1 cascade;
drop schema if exists s2 cascade;
drop schema if exists s3 cascade;

create schema s1;
create schema s2;
create schema s3;

create table s1.a(k int primary key);
create table s2.a(k int primary key);
create table s3.a(k int primary key);
create table s1.b(k int primary key);

List the oid, the name, and the name of the schema where it lives for each of the tables.

with c(name, schema, oid) as (
  select
    c.relname ::text,
    n.nspname ::text,
    c.oid     ::text
  from
    pg_class c
    inner join
    pg_namespace n
    on c.relnamespace = n.oid
  where
    c.relowner::regrole::text = 'u1' and
    c.relname in ('a', 'b') and
    c.relkind = 'r')
select name, schema, oid
from c
order by name, schema;

The oid values will change every time that you run this example. Here's a typical result:

 name | schema |  oid
------+--------+-------
 a    | s1     | 16572
 a    | s2     | 16577
 a    | s3     | 16582
 b    | s1     | 16587

Now create three functions, one in each of the three schemas:

create function s1.f()
  returns text
  language plpgsql
  set search_path = s1, s2, s3
as $body$
begin
  return
    'a: '||(select 'a'::regclass::oid::text)||' / '||
    'b: '||(select 'b'::regclass::oid::text);
end;
$body$;

create function s2.f()
  returns text
  language plpgsql
  set search_path = s2, s3, s1
as $body$
begin
  return
    'a: '||(select 'a'::regclass::oid::text)||' / '||
    'b: '||(select 'b'::regclass::oid::text);
end;
$body$;

create function s3.f()
  returns text
  language plpgsql
  set search_path = s3, s1, s2
as $body$
begin
  return
    'a: '||(select 'a'::regclass::oid::text)||' / '||
    'b: '||(select 'b'::regclass::oid::text);
end;
$body$;

The SQL texts of these three create statements are close to identical.

The expressions 'a'::regclass::oid::text, and correspondingly for b, discover the oid values of the actual objects to which the unqualified table names a and b resolve.

The functions differ only in these ways:

  • The fully qualified names are, respectively, s1.f(), s2.f(), and s3.f().

  • The search_paths are function-specific, thus:

    s1.f(): s1, s2, s3
    s2.f(): s2, s3, s1
    s3.f(): s3, s1, s2
    

Execute the functions like this:

select
  (select s1.f()) as "s1.f()",
  (select s2.f()) as "s2.f()",
  (select s3.f()) as "s3.f()";

If you do this immediately after creating all the objects, then you'll see the same oid values again, thus:

       s1.f()        |       s2.f()        |       s3.f()
---------------------+---------------------+---------------------
 a: 16572 / b: 16587 | a: 16577 / b: 16587 | a: 16582 / b: 16587

You can see from the oid values that each of the three functions found a different resolution for the unqualified table name a; and each found the same resolution for the unqualified table name b. But the rule is the same in all cases. A subprogram resolves an unqualified name in the schema that is first on the reigning search_path where an object with the to-be-found name exists.

Consider using fully qualified names

There are doubtless plausible use cases where it isn't known until run-time in which schema the object that's the intended name resolution target will be found, where it might be found in more than one schema, and where the requirement is to resolve to the one that is first on the search_path. But such scenarios are rare.

The conventional use-case for security definer subprograms is to provide exactly and only the planned access to table data that the invoking role is unable to operate on directly. This implies a designed regime of several roles where each owns objects in different schemas and where all of the names are given explicitly in the application's design specification. In this case, the dynamic resolution, that uses a search_path, is no advantage and also brings a risk.

Relying on the search_path for name resolution (for example, as a consequence of careless programming where the author forgot to use a fully qualified object name but where the intended object is still found) brings another risk:

  • Even when current_setting('search_path') doesn't show the pg_catalog and the pg_temp schemas, these are inevitably used by the implementation of name resolution. By default, the pg_temp schema is the very first one to be searched; and the pg_catalog schema is the very next.

This is explained in the PostgreSQL documentation in the subsection search_path in the section Client Connection Defaults. Notice that it explains that pg_temp is an alias for whatever pg_temp_nnn the session happens currently to use.

The search_path account goes on to explain that you can change where pg_temp and pg_catalog come in the search order by mentioning these explicitly when you set the search_path. Try this:

-- Rely, at first, on the default serach order.
select count(*) from pg_class;

A typical result in a freshly created database is a few hundred. Now capture the name of this catalog table with a temporary table and repeat the count(*) query:

create temporary table pg_class(k int);
select count(*) from pg_class;

The count(*) result is now zero because the new temporary table has captured the resolution from the same-named table in pg_catalog. Now reverse the search order of pg_temp and pg_catalog by setting this explicitly, and repeat the count(*) query again:

set search_path = pg_catalog, pg_temp;
select count(*) from pg_class;

Now the count(*) query will notice one more object in the catalog than it did at the start because of the new presence of the temporary table. The opinion among experts on the pgsql-general email list is that the default behavior, to search pg_temp before pg_catalog, is unfortunate; and they recommend reversing this—at least in a subprogram's specific setting.

This gives you a convenient way to enforce the practice that a subprogram uses only fully qualified names. Here's an example:

drop function s1.f();
create function s1.f()
  returns text
  language plpgsql
  set search_path = pg_catalog, pg_temp
as $body$
begin
  return
    'a: '||(select 's1.a'::regclass::oid::text)||' / '||
    'b: '||(select 's1.b'::regclass::oid::text);
end;
$body$;

drop function s2.f();
create function s2.f()
  returns text
  language plpgsql
  set search_path = pg_catalog, pg_temp
as $body$
begin
  return
    'a: '||(select 's2.a'::regclass::oid::text)||' / '||
    'b: '||(select 's1.b'::regclass::oid::text);
end;
$body$;

drop function s3.f();
create function s3.f()
  returns text
  language plpgsql
  set search_path = pg_catalog, pg_temp
as $body$
begin
  return
    'a: '||(select 's3.a'::regclass::oid::text)||' / '||
    'b: '||(select 's1.b'::regclass::oid::text);
end;
$body$;

select
  (select s1.f()) as "s1.f()",
  (select s2.f()) as "s2.f()",
  (select s3.f()) as "s3.f()";

The oid values that you see are identical to those that you saw when the subprograms had explicit search_path attributes and used unqualified table names:

       s1.f()        |       s2.f()        |       s3.f()
---------------------+---------------------+---------------------
 a: 16572 / b: 16587 | a: 16577 / b: 16587 | a: 16582 / b: 16587

Use fully qualified names in the SQL that subprograms issue unless you can explain why this is unsuitable.

Yugabyte recommends that, for the typical case, you use fully qualified names in the SQL statements that subprograms issue and that, to reinforce your plan, you set the search_path to just pg_catalog, pg_temp for every subprogram.

There's no risk that you might create objects in the pg_catalog schema. Connect as the postgres role and try this:

set search_path = pg_catalog, pg_temp;
create table pg_catalog.t(n int);

It causes this error:

42501: permission denied to create "pg_catalog.t"

Notice that it is possible, by setting a special configuration parameter, to allow the postgres role to create objects in the pg_catalog schema. But even then, only the postgres role can create objects there. If a bad actor can manage to connect as the postgres role, then all bets are anyway already off.