YSQL currently supports only fetching rows from a cursor consecutively in the forward direction.See the section Beware Issue #6514 below.
This section explains:
- what a cursor is
- how you can manipulate a cursor explicitly, when your use case calls for this, using either a SQL API or a PL/pgSQL API
The functionally equivalent PL/pgSQL API is exposed by the executable statements open, move, fetch, and close. Each of these specifies its cursor using an identifier for the name of a value of the dedicated data type refcursor. You can declare a refcursor, just as you declare other variables, in the PL/pgSQL source code's plpgsql_declaration_section. Notice that one flavor of the declaration syntax, the so-called "bound refcursor variable", lets you specify the defining subquery (see below) for the underlying cursor that it denotes. Alternatively, you can specify the data type of a PL/pgSQL subprogram's formal argument as refcursor. The value of a variable or argument whose data type is refcursor is text and is simply the name of the underlying cursor that it denotes.
The section Cursor manipulation in PL/pgSQL—the "open", "fetch", and "close" statements explains, and provides code examples for, this topic.
What is a cursor?
A cursor is an artifact that you create with the declare SQL statement—or with the equivalent PL/pgSQL open statement. A cursor's duration is limited to the lifetime of the session that creates it, it is private to that session, and it is identified by a bare name that must conform to the usual rules for SQL names like those of tables, schemas, and so on. In the sense that it's session-private, that its name isn't schema-qualified, that it has at most session duration, and that it has no explicit owner, it resembles a prepared statement. A session's currently extant cursors are listed in the pg_cursors catalog view.
A cursor is defined by a subquery (typically a select statement, but you can use a values statement) and it lets you fetch the rows from the result set that the subquery defines one-at-a-time without (in favorable cases) needing ever to materialize the entire result set in your application's client backend server process—i.e. the process that this query lists:
select pid, application_name, backend_type from pg_stat_activity where pid = pg_backend_pid();
The internal implementation of a cursor.
You don't need to understand this. But it might help you to see how cursors fit into the bigger picture of how the processing of SQL statements like select, values, insert, update, and delete is implemented. (These statements can all be used as the argument of the prepare statement—and for this reason, they will be referred to here as preparable statements.)
At the lowest level, the implementation of all of SQL processing is implemented in PostgreSQL using C. And YSQL uses the same C code. The execution of a preparable statement uses C structure(s) that hold information, in the backend server process, like the SQL statement text, its parsed representation, its execution plan, and so on. Further, when the statement is currently being executed, other information is held, like the values of actual arguments that have been bound to placeholders in the SQL text and the position of the current row in the result set (when the statement produces one). You can manipulate these internal structures, from client side code, using the libpq - C Library API or, at a level of abstraction above that, the Embedded SQL in C API (a.k.a. the ECPG). Moreover, engineers who implement PostgreSQL itself can use the Server Programming Interface. These APIs have schemes that let the programmer ask for the entire result set from a subquery in a single round trip. And they also have schemes that let you ask for the result set row-by-row, or in batches of a specified size. See, for example, the libpq subsection Retrieving Query Results Row-By-Row.
In a more abstract, RDBMS-independent, discussion of the SQL processing of the statements that correspond to PostgreSQL's preparable statements, the term "cursor" is used to denote these internal structures. (You'll notice this, for example, with Oracle Database.)
But in PostgreSQL, and therefore in YSQL, a cursor is a direct exposure into SQL and PL/pgSQL (as language features) of just a subset of the internal mechanisms for the processing of preparable statements: the values statement and the select statement when it has no data-modifying side-effects.
Here's a counter-example where the select statement does modify data. You can begin a select statement using a with clause that can include a data-modifying statement like insert as long as it has a returning clause. (This ability lets you implement, for example, multi-table insert.) Set the psql variables db and u to, respectively, a convenient sandbox database and a convenient test role that has connect and create on that database. Then create two tables and demonstrate the data-modifying select statement ordinarily:
\c :db :u drop schema if exists s cascade; create schema s; set search_path = pg_catalog, pg_temp; create table s.t(k int primary key, v text not null); insert into s.t(k, v) values (1, 'cat'), (2, 'dog'), (3, 'mouse'); create table s.count(k serial primary key, n int not null); with c(n) as ( insert into s.count(n) select count(*) from s.t returning n) select v from s.t order by v;
This is the result set from the final select, as expected:
v ------- cat dog mouse
Now check that s.count was populated as expected:
select n from s.count;
It does indeed show 3.
Now try to declare a cursor using the same subquery:
declare cur cursor for with c(n) as ( insert into pg_temp.count(n) select count(*) from pg_class where relkind = 'v' returning n) select relname from pg_class where relkind = 'v';
It fails with the 0A000 error: DECLARE CURSOR must not contain data-modifying statements in WITH.
Tautologically, then, a cursor is an artifact that (in general, in PostgreSQL) is characterized thus:
- It is created with the declare SQL statement (or the equivalent open PL/pgSQL statement).
- Its maximum duration is either the session in which it's created or the transaction in which it's created, according to a choice that you make when you create it.
- Its lifetime can be terminated deliberately with the close SQL statement or the same-spelled PL/pgSQL statement.
- It is defined by its name, its subquery and some other boolean attributes.
- Its name and its other attributes are listed in the pg_cursors catalog view.
- It lets you fetch consecutive rows, either one at a time, or in batches whose size you choose, from the result set that its subquery defines.
- It supports the move SQL statement, and the same-spelled PL/pgSQL statement, to let you specify any row, by its position, within the result set as the current row.*
- It supports the fetch SQL statement, and the same-spelled PL/pgSQL statement, that, in one variant, lets you fetch the row at the current position or a row at any other position relative to the current position (either ahead of it or behind* it).
- It supports another variant of the fetch statement (but here only in SQL) that lets you fetch a specified number of rows either forward from and including the row immediately after the current position or backward* from and including the row immediately before the current position.
[*] Notice that YSQL doesn't yet support all of all of these operations. See the section Beware Issue #6514 below..
The "current position" notion is defined by imagining that the cursor's defining subquery always includes this select list item:
row_number() over() as current_position
Here, over() spans the entire result set. If the overall query has no order by clause, then the hypothetical over() has no such clause either. But if the overall query does have an order by clause, then the hypothetical over() has the same order by clause within its parentheses.
When you execute the move* statement, the current position is left at the result to which you moved. And when you execute the fetch statement (fetching one or several rows in either the forward or the backward* direction) the current position is left at the last-fetched result.
Create a trivial helper that will delay the delivery of each row from a subquery's result set:
\c :db :u drop schema if exists s cascade; create schema s; create function s.sleep_one_second() returns boolean set search_path = pg_catalog, pg_text language plpgsql as $body$ begin perform pg_sleep(1.0); return true; end; $body$;
Use it to define a view whose result set is ten rows:
create view s.ten_rows(v) as select a.v from generate_series(1, 10) as a(v) where s.sleep_one_second();
Now execute a query in the obvious way at the ysqlsh prompt:
select v from s.ten_rows;
It takes about ten seconds before you see any results—and then you see all ten rows effectively instantaneously. (Use the \timing on meta-command to time it.) In other words, all ten rows were first materialized in the backend server process's memory before being passed, in a single round trip, to the application.
Create another helper function to fetch one row from a cursor and to return its value together with the time taken to fetch it:
create function s.next_row(cur in refcursor) returns text set search_path = pg_catalog, pg_text language plpgsql as $body$ declare t0 float8 not null := 0; t1 float8 not null := 0; t int not null := 0; v int; begin t0 := extract(epoch from clock_timestamp()); fetch next from cur into v; t1 := extract(epoch from clock_timestamp()); t := (round(t1 - t0)*1000.0)::int; return rpad(coalesce(v::text, '<no row>'), 9)||'-- '||to_char(t, '9999')||' ms'; end; $body$;
Now use a cursor to fetch the ten rows one by one:
\t on start transaction; declare "My Cursor" no scroll cursor without hold for select v from s.ten_rows; select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); select s.next_row('My Cursor'); rollback; \t off
Now you see the rows delivered one by one, every second. This is the result. (Blank lines were removed manually.)
1 -- 1000 ms 2 -- 1000 ms 3 -- 1000 ms 4 -- 1000 ms 5 -- 1000 ms 6 -- 1000 ms 7 -- 1000 ms 8 -- 1000 ms 9 -- 1000 ms 10 -- 1000 ms <no row> -- 0 ms
When you execute "select v from ten_rows" ordinarily using ysqlsh, you have to wait until the entire result set has been materialized in the memory of its backend server process before it's delivered to the client application as a unit. This incurs a memory usage cost as well as a time-delay irritation. But when you declare a cursor for that select statement, you materialize the results one row at a time and deliver each to the client as soon as its available. When you use this approach, no more than a single row needs ever to be concurrently materialized in the backend server process's memory.
In real applications, you'll use the piecewise result set delivery that a cursor supports only when the result set is vast; and you'll fetch it in batches of a suitable size: small enough that the backend server process's memory isn't over-consumed; but large enough that the round-trip time doesn't dominate the overall cost of fetching a batch.
Transactional behavior — holdable and non-holdable cursors
A cursor can be declared either as so-called holdable—or not. See the account of the with hold or without hold choice in the section for the declare statement. Try this:
\c :db :u select count(*) from pg_cursors; start transaction; declare "Not Holdable" cursor without hold for select 17; declare "Is Holdable" cursor with hold for select 42; select name, is_holdable::text from pg_cursors order by name; commit; select name, is_holdable::text from pg_cursors order by name; \c :db :u select count(*) from pg_cursors;
An invocation of "select count(*) from pg_cursors", immediately after starting a session, will inevitably report that no cursors exist. The first pg_cursors query (within the ongoing transaction) produces this result:
name | is_holdable --------------+------------- Is Holdable | true Not Holdable | false
And the pg_cursors query immediately after committing the transaction produces this result:
name | is_holdable --------------+------------- Is Holdable | true
In other words, a non-holdable cursor will vanish when the transaction within which it was declared ends—even if the transaction is committed. Because a non-holdable cursor cannot exist outside of an ongoing transaction, this stand-alone attempt:
declare "Not Holdable" cursor without hold for select 17;
causes the 25P01 error: DECLARE CURSOR can only be used in transaction blocks. The wording is slightly confusing because this causes no such error:
declare "Is Holdable" cursor with hold for select 42;
See the section The transaction model for top-level SQL statements. The assumption is that you're running ysqlsh with the default setting of 'on' for the psql variable AUTOCOMMIT.
Notice that the transactional behavior of a cursor differs critically from that of a prepared statement:
\c :db :u select count(*) from pg_prepared_statements; start transaction; prepare stmt as select 42; rollback; select name from pg_prepared_statements;
Like is the case for cursors, an invocation of "select count(*) from pg_prepared_statements", immediately after starting a session, will inevitably report that no prepared statements exist. But even when a statement is prepared within a transaction that is rolled back, it continues to exist after that until either the session ends or it is deallocated. (If you create a holdable cursor, within an on going transaction and then roll back the transaction, then it vanishes.)
Open a holdable cursor in its own transaction and close it as soon as you have finished using it.
When, as is the normal practice, you don't subvert the behavior that automatically commits a SQL statement that is not executed within an explicitly started transaction, you'll probably declare, move in* and fetch from a holdable cursor "ordinarily"—i.e. without explicitly starting, and ending, transactions.
A holdable cursor consumes resources because it always caches its defining subquery's entire result set. Therefore (and especially in a connection-pooling scheme), you should close a holdable cursor as soon as you have finished using it.
A holdable cursor is most useful when you intend to move or to fetch in the backward direction — but YSQL does not yet support this.See the section Beware Issue #6514 below.
When you choose, at cursor creation time, either the scroll or the no scroll options, the result of your choice is shown in the is_scrollable column in the pg_cursors view. Try this:
start transaction; declare "Not Scrollable" no scroll cursor without hold for select g.v from generate_series(1, 5) as g(v); declare "Is Scrollable" scroll cursor without hold for select g.v from generate_series(1, 5) as g(v); select name, is_scrollable::text from pg_cursors order by name; rollback;
This is the result:
name | is_scrollable ----------------+--------------- Is Scrollable | true Not Scrollable | false
The term of art scrollable reflects a rather unusual meaning of scrollability. In, for example, discussions about GUIs, scrolling means moving forwards or backwards within a window or, say, a list. However:
- When pg_cursors.is_scrollable is false, this means that you can change the current position in the cursor's result set (using either move* or as a consequence of fetch) only in the forward direction.
- When pg_cursors.is_scrollable is true, this means that you can change the current position in the cursor's result set both in the forward direction and in the backward direction.
In other words:
- When you create a cursor and specify no scroll, you're saying that you will allow changing the current position in the result set in only the forward direction.
- When you create a cursor and specify scroll, you're saying that you will allow changing the current position in the result set in both the forward direction and the backward direction.
Notice that your choice with the move* statement, to change the current position by just a single row or by many rows is an orthogonal choice to the direction in which you move. Similarly, your choice with the fetch statement, to fetch just a single row or many rows is an orthogonal choice to the direction* in which you fetch.
- There is no way to create a cursor so that changing the current position in the result set by more than one row, except by consecutive fetches, is prevented.
Always specify either 'no scroll' or 'scroll' explicitly.
If you specify neither no scroll nor scroll when you create a cursor, then you don't get an error. However, the outcome is that sometimes backwards movement in the result set is allowed, and sometimes it causes the 55000 error: cursor can only scan forward.*
Yugabyte recommends that you always specify your scrollability choice explicitly to honor the requirements that you must meet. Notice that while Issue #6514 remains open, your only viable choice is no scroll.
"no scroll" cursor demonstration
start transaction; declare cur no scroll cursor without hold for select g.v from generate_series(1, 10) as g(v); fetch next from cur; fetch forward 2 from cur; fetch forward 3 from cur; rollback;
This runs without error and produces these results, as expected:
1 2 3 4 5 6
Caching a cursor's result set
The result set for a with hold cursor is always cached when the transaction that creates it commits.
The result set for a without hold cursor might, or might not, be cached.
If the execution plan for the cursor's defining subquery can be executed in either the forward or the backward direction, then the result set will not be cached.
But if the plan can be executed only in the forward direction, then the result set must be cached if you specify scroll when you create the cursor.
PostgreSQL, and therefore YSQL, do not expose metadata to report whether or not a cursor's result set is cached. Nor does the documentation for either RDBMS attempt to specify the rules that determine whether caching will be done. However, it's possible to reason, about certain specific select statements, that their plans cannot be executed backward. For example the plan for a query that includes row_number() in the select list cannot be run backward because the semantics of row_number() is to assign an incrementing rank to each new row in the result set as it is produced when the plan is executed in the forward direction—and the planner cannot predict how many rows will be produced to allow row_number() to be calculated by decrementing from this for each successive row when the plan is run backward. If the select statement has no order by, then the rows are produced in physical order (i.e. in an order that's determined by how the table data is stored).
The physical order cannot be predicted.The physical order cannot be predicted; and it might even change between repeat executions of the same select statement. However, if you use a cluster on a developer laptop and ensure that the backend process that supports the session that you use to do the tests is the only process whose type is client backend, then it's very likely indeed that successive repeats of the same select statement will produce the same physical order—at least over the timescale of typical ad hoc experiments.
You can, however, support pedagogy by including a user-defined function in the where clause that always returns true and that uses raise info to report when it's invoked.
First, do this set-up. All the caching tests will use it:
drop schema if exists s cascade; create schema s; create table s.t(k serial primary key, v int not null); insert into s.t(v) select generate_series(1, 5); create view s.v(pos, v) as select row_number() over(), v from s.t; create function s.f(i in int) returns boolean set search_path = pg_catalog, pg_temp volatile language plpgsql as $body$ begin raise info 'f() invoked'; return i = i; end; $body$;
"with hold", "no scroll" cursor
start transaction; declare cur no scroll cursor with hold for select pos, v from s.v where s.f(v);
It completes silently without error. Now do this:
This is when you see the raise info output—five times in total, i.e. once for each row that's tested:
INFO: f() invoked INFO: f() invoked INFO: f() invoked INFO: f() invoked INFO: f() invoked
This demonstrates that the result set has been cached. Now fetch all the rows and close the cursor;
fetch all from cur; close cur;
The fetch all statement brings this SQL output:
1 | 5 2 | 1 3 | 4 4 | 2 5 | 3
"without hold", "no scroll" cursor
start transaction; declare cur no scroll cursor without hold for select pos, v from s.v where s.f(v); fetch next from cur; fetch next from cur; fetch next from cur; fetch next from cur; fetch next from cur; rollback;
It's easier to distinguish the raise info output and the SQL output from the statements that bring these if you save this code to, say, t.sql, and then execute it at the ysqlsh prompt. This is what you see:
INFO: f() invoked 1 | 5 INFO: f() invoked 2 | 1 INFO: f() invoked 3 | 4 INFO: f() invoked 4 | 2 INFO: f() invoked 5 | 3
Notice that the same v values are paired with the same pos values as with the "with hold", "no scroll" cursor test. But here, nothing suggests that results are cached—and they don't need to be because the cursor doesn't allow moving backwards in the result set. (However, you can't design a test to demonstrate that results are not cached.)
* Beware Issue #6514
YSQL currently supports only fetching rows from a cursor consecutively in the forward direction.
Issue 6514 tracks the problem that the SQL statements fetch and move, together with their PL/pgSQL counterparts, are not yet fully functional. This is reflected by errors that occur under these circumstances:
Every move flavor causes the 0A000 error with messages like "MOVE not supported yet".
Many fetch flavors draw the 0A000 error with messages like "FETCH FIRST not supported yet", "FETCH LAST not supported yet", "FETCH BACKWARD not supported yet", and the like.
These are the only fetch flavors that do not cause an error:
- fetch next
- bare fetch
- fetch :N
- bare fetch forward
- fetch forward :N
- fetch all
- and fetch forward all
:N must be a positive integer.
Until Issue 6514 is fixed, there is no point in declaring a cursor using scroll (i.e. so that pg_cursors.is_scrollable is true). And doing this can harm performance and memory consumption because it can cause a cursor's result set to be cached when the execution plan cannot be run backward.
While Issue 6514 is open, you should therefore always declare a cursor with no scroll.