Transaction retries in YSQL
YugabyteDB returns different error codes for the various scenarios that go wrong during transaction processing. Applications need to be designed to handle these scenarios correctly to be highly available so that users aren't impacted. Although most errors are common across multiple isolation levels, some errors are very specific to certain transaction isolation levels.
The examples in the following sections illustrate failure scenarios and techniques you can use to handle these failures in your applications.
Follow the setup instructions to start a single local YugabytedDB instance, and create a table as follows:
Create the table as follows:
CREATE TABLE txndemo ( k int, V int, PRIMARY KEY(k) );
Add some data to the table as follows:
INSERT INTO txndemo VALUES (1,10),(2,10),(3,10),(4,10),(5,10);
YugabyteDB retries failed transactions automatically on the server side whenever possible without client intervention as per the concurrency control policies. This is the case even for single statements, which are implicitly considered transactions. In Read Committed isolation mode, the server retries indefinitely.
In some scenarios, a server-side retry is not suitable. For example, the retry limit has been reached or the transaction is not in a valid state. In these cases, it is the client's responsibility to retry the transaction at the application layer.
Most transaction errors that happen due to conflicts and deadlocks can be restarted by the client. The following scenarios describe the causes for failures, and the required methods to be handled by the applications.
Execute the transaction in a
try..catch block in a loop. When a re-tryable failure happens, issue ROLLBACK and then retry the transaction. To avoid overloading the server and ending up in an indefinite loop, wait for some time between retries and limit the number of retries. The following illustrates a typical client-side retry implementation:
max_attempts = 10 # max no.of retries sleep_time = 0.002 # 2 ms - base sleep time backoff = 2 # exponential multiplier attempt = 0 while attempt < max_attempts: attempt += 1 try : cursor = cxn.cursor() cursor.execute("BEGIN"); # Execute Transaction Statments here cursor.execute("COMMIT"); break except psycopg2.errors.SerializationFailure as e: cursor.execute("ROLLBACK") if attempt < max_attempts: time.sleep(sleep_time) sleep_time *= backoff
COMMIT is successful, the program exits the loop.
attempt < max_attempts limits the number of retries to
max_attempts, and the amount of time the code waits before the next retry also increases with
sleep_time *= backoff. Choose values as appropriate for your application.
40001 - SerializationFailure
SerializationFailure errors happen when multiple transactions are updating the same set of keys (conflict) or when transactions are waiting on each other (deadlock). The error messages could be one of the following types:
During a conflict, certain transactions are retried. However, after the retry limit is reached, an error occurs as follows:
ERROR: 40001: All transparent retries exhausted.
All transactions are given a dynamic priority. When a deadlock is detected, the transaction with lower priority is automatically killed. For this scenario, the client might receive a message similar to the following:
ERROR: 40001: Operation expired: Heartbeat: Transaction XXXX expired or aborted by a conflict
The correct way to handle this error is with a retry loop with exponential backoff, as described in Client-side retry. When the UPDATE or COMMIT fails because of
SerializationFailure, the code retries after waiting for
sleep_time seconds, up to
Read CommittedIn read committed isolation level, as the server retries internally, the client does not need to worry about handling SerializationFailure. Only transactions operating in repeated read and serializable levels need to handle serialization failures.
Another way to handle these failures is would be to rollback to a checkpoint before the failed statement and proceed further as described in Savepoints.
Savepoints are named checkpoints that can be used to rollback just a few statements, and then proceed with the transaction, rather than aborting the entire transaction when there is an error.
Consider the following example that inserts a row
connstr = 'postgresql://yugabyte@localhost:5433/yugabyte' cxn = psycopg2.connect(connstr) cursor = cxn.cursor() try: cursor.execute("BEGIN") # ... Execute other statements cursor.execute("SAVEPOINT before_insert") try: # insert a row cursor.execute("INSERT INTO txndemo VALUES (1,30)") except psycopg2.errors.UniqueViolation as e: print(e) # k=1 already exists in our table cursor.execute("ROLLBACK TO SAVEPOINT before_insert") cursor.execute("UPDATE txndemo SET v=30 WHERE k=1;") # ... Execute other statements cursor.execute("COMMIT") except Exception as e: print(e) cursor.execute("ROLLBACK")
If the row
[k=1] already exists in the table, the INSERT would result in a UniqueViolation exception. Technically, the transaction would be in an error state and further statements would result in a 25P02: In failed SQL transaction error. You have to catch the exception and rollback. But instead of rolling back the entire transaction, you can rollback to the previously declared savepoint
before_insert, and update the value of the row with
k=1. Then you can continue with other statements in the transaction.
Although most transactions can be retried in most error scenarios, there are cases where retrying a transaction will not resolve an issue. For example, errors can occur when statements are issued out of place. These statements have to be fixed in code to continue further.
25001 - Specify transaction isolation level
Transaction level isolation should be specified before the first statement of the transaction is executed. If not the following error occurs:
BEGIN Time: 0.797 ms
UPDATE txndemo SET v=20 WHERE k=1;
UPDATE 1 Time: 10.416 ms
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
ERROR: 25001: SET TRANSACTION ISOLATION LEVEL must be called before any query Time: 3.808 ms
25006 - Modify a row in a read-only transaction
This error occurs when a row is modified after specifying a transaction to be READ ONLY as follows:
BEGIN READ ONLY;
BEGIN Time: 1.095 ms
UPDATE txndemo SET v=20 WHERE k=1;
ERROR: 25006: cannot execute UPDATE in a read-only transaction Time: 4.417 ms
25P02 - InFailedSqlTransaction
This error occurs when a statement is issued after there's already an error in a transaction. The error message would be similar to the following:
ERROR: 25P02: current transaction is aborted, commands ignored until end of transaction block
Consider the following scenario:
BEGIN Time: 0.393 ms
INVALID TXN STATEMENT;
ERROR: 42601: syntax error at or near "INVALID" Time: 2.523 ms
SELECT * from txndemo where k=1;
ERROR: 25P02: current transaction is aborted, commands ignored until end of transaction block Time: 17.074 ms