Name resolution within top-level SQL statements
Regard the PostgreSQL documentation as the canonical definitional reference for SQL syntax.In particular, see the section SQL Syntax within the enclosing section The SQL Language. The discussion of name resolution in top-level SQL statements rests on the notions that the SQL Syntax section explains.
This section deals mainly with name resolution for the schema-objects that a SQL statement references and how this depends critically on the search_path. Name resolution for secondary objects, like columns in relations, is sketched briefly, at the end.
A top-level SQL DML statement uses identifiers to denote any number of schema-objects with kinds like table, view, function, and so on. (The call statement is a special case: it must use an identifier to specify exactly one schema-object with the kind procedure.) Further, a select, insert, update, or delete statement typically specifies other secondary phenomena like columns within relations, aliases for relations or columns, the identifiers for common table expressions, or subprogram arguments. Further, a column identifier is sometimes decorated with the typecast operator—and in this case the target type (itself a schema-object) is denoted by an identifier too. And so it goes on. The discussion is easily extended to other kinds of SQL statement like create, alter, or drop statements.
Each of the identifiers that a SQL statement uses may be unqualified or qualified with the identifier for a schema. And when an unqualified identifier is used, it must be resolved to what a qualified identifier would denote. The resolution scope for schema-objects is the entire current database. And the resolution scope for the secondary phenomena that the statement identifies is established by properties of the overall set of schema-objects that the statement refers to, and by aliases that the statement defines.
Name resolution for schema-objects
Unqualified identifiers for schema-objects are resolved according to what the search_path specifies.
The "search_path" run-time parameter
Treat the PostgreSQL documentation as the canonical reference for 'search_path'.
The PostgreSQL documentation describes the semantics of the search_path run-time parameter in the section Client Connection Defaults. Notice this:
...the temporary schema is only searched for relation (table, view, sequence, etc) and data type names. It is never searched for function or operator names.
The value of the search_path run-time parameter determines how the names that unqualified identifiers that denote schema-objects are resolved. Briefly, an unqualified identifier denotes only a name and provides no information to specify the schema in which the object with the denoted name is to be found.
Names and identifiersThe terms of art name and identifier have different meanings. The difference is explained in the section Names and identifiers.
The value of search_path is, in general, a comma-separated list of schema identifiers. (The list may specify just one schema; or it may be empty.) You can observe the current value of search_path like this:
(The term of art setting is used as a convenient shorthand for run-time parameter.)
The search_path setting is always defined—that is it is never null. If you create a new session and query it, then—as long as you haven't set its default value as a property of the current database or the current role, you'll see this:
This is a reminder that a freshly-created database (unless you have customized the template that create database used) has a schema called public with all privileges granted to the public role.
Always drop the 'public' schema in a database that will be used in a production application.The existence of the public schema, where any role can create objects, is convenient in a sandbox database that will be used exclusively for ad hoc experiments with basic syntax and the like. But its presence brings a huge risk in any serious use case. The set search_path statement cannot be restricted. So an unscrupulous user can put the public schema at the start of the search path and create objects to capture the application's intended objects—unless care has been taken always to use schema-qualified identifiers both in SQL statements that client-side code issues and in user-defined subprograms in the database. (User-defined subprograms support an alternative, and recommended, mechanism to control name resolution: you can set the search_path as a subprogram attribute—described here.)
Show that name resolution always searches 'pg_catalog'
Start a session by authorizing as an ordinary role to a sandbox database on which this role has all privileges and show that pg_catalog is inevitably on the search path:
create schema "my schema"; set search_path = "my schema"; create table "my schema".xyz_table(k int primary key); create table "my schema".pg_class(relname name primary key, relkind "char"); insert into "my schema".pg_class(relname, relkind) values('xyz in "my schema".pg_class', 'r'); prepare qry as select ( select current_setting('search_path') as "search_path" ), ( select relname as "row from pg_class" from pg_class where relname::text ~ '^xyz' and relkind::text = 'r' ); execute qry;
This is the result:
search_path | row from pg_class -------------+------------------- "my schema" | xyz_table
Notice that the user-created pg_class table has a row that says where it lives. You can therefore see that the unqualified identifier in the from clause, from pg_class, is resolved not to the user-created table in my schema but, rather, to the table with that name in the pg_catalog schema.
It's bad practice to create a user-object with a name that collides with that of an object in 'pg_catalog'.See the section Practice recommendation below. That recommendation is ignored in this example in order convincingly to demonstrate how pg_catalog is inevitably included in the effective search_path whether or not it is explicitly mentioned in its definition.
The name resolution algorithm is specified to execute thus:
- The search for the object with the name that the unqualified identifier denotes starts in the leftmost schema in the effective path—and if a match is found the algorithm exits.
- If no match is found there, then the search is done anew in the next schema in the effective path, going from left to right—and so on.
- If no match has been found by the time the rightmost schema in the effective path has been searched, then an error is reported.
The test shows that the pg_catalog schema is understood, implicitly, to be to the left of everything that search_path specifies when the search_path doesn't mention this explicitly—in other words, the effective path always includes pg_catalog.
The algorithm needs to be more precisely stated to be properly general.For example, the context of the use of the unqualified identifier might establish the fact that the match must be a function. In this case, the search is satisfied only by finding a function with the matching name and with the correct subprogram_signature.
You might wonder if you could tell name resolution never to try to resolve an unqualified identifier but, rather, simply to fail by not finding the object whose name is denoted. But there's no syntax for this. You might be tempted to try this:
set search_path = ''; execute qry;
This is the new result:
search_path | row from pg_class -------------+------------------- "" | xyz_table
The query renders the displayed value of search_path as two adjacent double quotes. This is PostgreSQL's idiomatic way of showing that search_path has been set to the empty string. But executing qry shows that this does not mean that search_path is undefined because name resolution still chooses this table:
While there's no syntax to express "never look in pg_catalog", you can define search_path by including pg_catalog in the value and by doing so determine the search order. Try this:
set search_path = "my schema", pg_catalog; execute qry;
This is the new result:
search_path | row from pg_class -------------------------+----------------------------- "my schema", pg_catalog | xyz in "my schema".pg_class
Because pg_catalog is now searched after my schema, name resolution now chooses this target:
Show that name resolution always searches 'pg_temp' as well as 'pg_catalog'
Create a temporary table, populate it with a row to say where it lives, and re-execute the prepared query:
create temporary table pg_class(relname name, relkind "char"); insert into pg_class(relname, relkind) values('xyz in pg_temp.pg_class', 'r'); execute qry;
This is the new result:
search_path | row from pg_class -------------------------+------------------------- "my schema", pg_catalog | xyz in pg_temp.pg_class
This outcome pinpoints a danger. Any role with the privilege to create temporary objects in the current database can create such objects that capture the application's intended objects and thereby subvert its intended behavior. It's good practice to ensure, at least, that pg_temp is always the rightmost schema in the search_path:
set search_path = "my schema", pg_catalog, pg_temp; execute qry;
Ensure that the role as which client-side sessions connect doesn't have the _temporary_ privilege.Better still, ensure that the role as which client-side sessions connect doesn't have the temporary privilege on the database that houses the application. (Notice that if the functionality of temporary schema-objects is needed, you can bequeath the ability to create and use them by implementing security definer subprograms for the purpose and by granting execute on these to the client role.)
This restores the earlier (and presumably intended) result:
search_path | row from pg_class ----------------------------------+----------------------------- "my schema", pg_catalog, pg_temp | xyz in "my schema".pg_class
You can create temporary objects of all kinds but unqualified identifiers work only for creating temporary tables, views, and sequences.The explicit create temporary ... syntax works only for tables, views, and sequences. But you can create temporary objects of all kinds simply by using a schema qualified identifier whose schema component is pg_temp. See the section Creating temporary schema-objects of all kinds).
Show that user-defined code can subvert the behavior of the native equality operator
The rule that name resolution for a function or an operator never resolves to such a schema object in the pg_temp schema removes the risk of accidental, or deliberate, capture in two cases. But the risk remains, for example when an application developer makes a mistake, that an application schema-object of any kind can capture another one.
The example models the case that the naïve developer understands the importance of including both pg_catalog and pg_temp explicitly in the session's search_path definition but unwisely places user-created schema(s) ahead of pg_catalog. Start a session as an ordinary role that has connect and create on the chosen database and set up thus:
create schema s; create function s.equals( a1 in int, a2 in int) returns boolean -- This function attribute IS wisely specified. set search_path = pg_catalog, pg_temp language sql as $body$ select case when (a1 is null) or (a2 is null) then null when (a1 = 17) and (a2 = 42) then true else false end; $body$; create operator s.= ( leftarg = int, rightarg = int, procedure = s.equals); prepare qry as select (1 = 1)::text as "1 = 1", (17 = 42)::text as "17 = 42", ( 17::int operator(pg_catalog.=) 42::int )::text as "17 = 42 baroque syntax";
Now test it using an unwise session-level search_path:
set search_path = s, pg_catalog, pg_temp; execute qry;
This is the result:
1 = 1 | 17 = 42 | 17 = 42 baroque syntax -------+---------+------------------------ false | true | false
Now test it using a sensible session-level search_path:
set search_path = pg_catalog, s, pg_temp; execute qry;
This is the result:
1 = 1 | 17 = 42 | 17 = 42 baroque syntax -------+---------+------------------------ true | false | false
You might argue that this demonstration shows, unremarkably, that you get what you ask for exactly as the rules specify. But it aims to make a point about style and prudence rather than just to illustrate how the rules work. The functions and operators in the pg_catalog schema define, in PostgreSQL and therefore YSQL, what in other RDBMSs is typically hard-coded in the implementation: what most developers would consider to be intrinsic features of SQL like the definitions of the equality and comparison operators, the concatenation operator, and so on, and fundamental functions like length() and sqrt(). And the demonstration shows how you can provide your own implementations of these and manipulate the search_path so that your implementations capture the native, documented ones. It shows, also, how you can use baroque syntax to force the use of the native functionality.
It is argued here that it is simply terrible, imprudent style to subvert the native, documented functionality in this way—and to force the use of baroque syntax to regain that functionality.
If you want to provide alternative implementations for functions and operators that 'pg_catalog' defines, invent your own, unique, names for these.This is the only sane way to avoid confusion and the risk of buggy maintenance of your application code.
No matter what value you set for 'search_path', 'pg_temp' and 'pg_catalog' are always searched
The tests have shown that you cannot exclude either pg_temp or pg_catalog from what name resolution searches. You can, however, control the order.
If the definition of search_path mentions neither pg_temp nor pg_catalog, then the effective search order that name resolution uses is:
pg_temp, [everything that the definition did mention, in that order], pg_catalog
If the definition of search_path does not mention pg_catalog but does mention pg_temp, then the effective search order that name resolution uses is:
[everything that the definition did mention, in that order], pg_catalog
If the definition of search_path does not mention pg_temp but does mention pg_catalog, then the effective search order that name resolution uses is:
pg_temp, [everything that the definition did mention, in that order]
If the definition of search_path mentions both pg_catalog and pg_temp, then the effective search order that name resolution uses is, of course:
[everything that the definition did mention, in that order]
There's no naming convention for 'pg_catalog' objects like functions.
There's no general naming convention to help you choose a name for an application object so that it doesn't collide with that of a pg_catalog object. (Obviously, you shouldn't use names that start with pg_ (or yb_) for any kind of schema-object.) For example, there are three overloads of a function with the very ordinary name area() in pg_catalog. Try this:
create schema s; set search_path = pg_catalog, s, pg_temp; create function s.area(lngth in numeric, hght in numeric) returns numeric language sql as $body$ select lngth*hght; $body$; select area(2.0, 3.0);
This is the result:
area ------ 6.00
Here, even if you don't place s leftmost in the search_path, name resolution will choose your s.area() because (through the current version) no occurrence of pg_catalog.area() has the right signature. However, you cannot rely on the assumption that a future version will never define a pg_catalog overload that collides with your implementation—and that therefore brings confusion.
Your only recourse is to confirm, when you invent a name for an application object, that it doesn't collide with the name of any object in the pg_catalog schema.
The forgoing examples have demonstrated the name resolution rules and pointed out the risks that these bring. You clearly must prioritize correct, unsubvertible, application behavior over coding convenience for the application code author and especially over the speed at which you can type ad hoc tests. Remember that nothing can stop any role from setting search_path.
- You must ensure that pg_temp is searched last. This is because, without this practice, a role that can create temporary objects can capture an application object with a temporary object with the same name and subvert the application's behavior. If you want to use a temporary object, then you should use a schema-qualified identifier for it that starts with pg_temp.
- The decision about where to put pg_catalog in the search order is more subtle. Generally speaking, it's bad practice to create a user-object with a name that collides with that of an object in pg_catalog. The objects in pg_catalog define PostgreSQL's implementation, or list facts about such objects—and so a colliding user-defined object would change PostgreSQL's behavior. It's hard to see how doing this would not be confusing and dangerous. This implies putting pg_catalog first in search_path. You then need to avoid the possibility of a pg_catalog object capturing an application object by never giving an application object a name that collides with that of a pg_catalog object.
There might be plausible use cases where it isn't known until run-time in which schema the object that's the intended name resolution target will be found, where it might be found in more than one schema, and where the requirement is to resolve to the one that is first on the search_path. But such scenarios are rare. Yugabyte therefore recommends that you simply do not rely on name resolution for anything but objects in the pg_catalog schema and adopt this practice;
- Use fully qualified names in the SQL that application code issues unless you can explain, in the design document, why this is unsuitable.
- Set the search_path to just pg_catalog, pg_temp.
There's no risk that you might create objects in the pg_catalog schema. Connect as the postgres role and try this:
set search_path = pg_catalog, pg_temp; create table t(n int);
It causes this error:
42501: permission denied to create "pg_catalog.t"
Notice that it is possible, by setting a special configuration parameter, to allow the postgres role to create objects in the pg_catalog schema. But even then, only the postgres role can create objects there. If a bad actor can manage to connect as the postgres role, then all bets are anyway already off.
Name resolution for secondary objects
Secondary objects, like columns in tables, are resolved within the namespace that the top-level statement defines. PostgreSQL enforces some syntax rules that immediately disambiguate the status of an identifier as a putative schema-object or a putative secondary object where using, say, Oracle Database, the programmer needs to understand tortuous rules. For example, a function invocation in PostgreSQL must always specify the list of actual arguments. This holds even when there are no actual arguments. Here, you must append empty trailing parentheses to the identifier for the function. (Oracle Database allows you, optionally, to elide empty parentheses.) This means that a select-list item is known to be either a function or a column in a relation before attempting name-resolution. Further, PostgreSQL doesn't support packages and so there cannot be any confusion (even without considering the possible parentheses) between, say, schema_name.table_name and package_name.function_name.
select v, f() from t;
can mean only that:
- t is a schema-object of type table or view and that it must be resolved using the reigning value of search_path.
- v is a column in the relation t in whatever schema name resolution finds it.
- f is a schema-object of type function and that it must be resolved using the reigning value of search_path.
select s.t.v from t;
can only mean that:
- s is a schema.
- t is a relation in the schema s.
- v is a column in the relation s.t.
select (s.t).v from s;
can only mean that:
- s is a relation and that it must be resolved using the reigning value of search_path.
- t is a column in the relation s in whatever schema name resolution finds it. Moreover, the data type of t is a composite type. (The identity of the composite type can't be discerned by simple inspection of this statement. Rather, this must be determined from the metadata for the relation s in whatever schema name resolution finds it.)
- v is an attribute of the composite type that defines the data type of the column s.t.
Notice that the select arguments in these two examples, s.t.v and (s.t).v differ in spelling only in that the second surrounds the first two identifiers with parentheses. But these aren't just decoration to improve readability. Rather, their use conveys essential semantic information about the status of the identified phenomena: schema, table, and column; or table, column, and composite type attribute.
You might think that the first of this pair of examples is very confusingly written and that this would be preferable:
select a.v from s.t as a;
But the example, as presented, is legal; and in the presence of a matching schema s and a relation t with a column v, it executes without error and produces the expected result. The use of the alias a is interesting in its own right. A "keyhole" inspection of a.v tells you that it might mean the column v in the schema-level table a (but not the function v() in the schema a). However, analysis of the from list tells the parser that a in the select list can only be the alias a that is defined, privately, for the present statement.
Of course, the engineers who implement Postgres's SQL processing code need to understand all the rules that govern name-resolution for secondary objects in complete and exact detail. But anecdotal evidence tells us that ordinary application programmers who write practical SQL (especially when everything has a sensible name) are able easily to express their meaning without being able to rehearse these rules precisely.
The essential pedagogy of this current section is the explanation of the critical role that the search_path plays in the name resolution of schema-objects.