Distributed transactions

Learn how a distributed transaction works in YugabyteDB

This example shows how a distributed transaction works in YugabyteDB.

Setup

Any YugabyteDB cluster, including local single-node cluster or Sandbox cluster in YugabyteDB Managed. See Set up YugabyteDB universe.

Create a table

Create the following table:

CREATE TABLE accounts (
  account_name varchar,
  account_type varchar,
  balance float,
  PRIMARY KEY (account_name, account_type)
);

Insert some sample data into the table:

INSERT INTO accounts VALUES ('John', 'savings', 1000);
INSERT INTO accounts VALUES ('John', 'checking', 100);
INSERT INTO accounts VALUES ('Smith', 'savings', 2000);
INSERT INTO accounts VALUES ('Smith', 'checking', 50);

Display the contents of the table:

yugabyte=# SELECT * FROM accounts;
 account_name | account_type | balance
--------------+--------------+---------
 John         | checking     |     100
 John         | savings      |    1000
 Smith        | checking     |      50
 Smith        | savings      |    2000
(4 rows)

Run a transaction

Run the following transaction:

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 200
      WHERE account_name='John' AND account_type='savings';
  UPDATE accounts SET balance = balance + 200
      WHERE account_name='John' AND account_type='checking';
COMMIT;

After the transaction succeeds, display the contents of the table:

yugabyte=# SELECT * FROM accounts;
 account_name | account_type | balance
--------------+--------------+---------
 John         | checking     |     300
 John         | savings      |     800
 Smith        | checking     |      50
 Smith        | savings      |    2000
(4 rows)

The transaction can be broken down as follows:

  1. Begin:

    BEGIN TRANSACTION;
    

    The node that receives this statement becomes the transaction coordinator. A new transaction record is created in the transaction status table for the current transaction. It has a unique transaction id with the state PENDING. Note that in practice, these records are pre-created to achieve high performance.

  2. Update:

    UPDATE accounts SET balance = balance - 200
      WHERE account_name='John'
      AND account_type='savings';
    

    The transaction coordinator writes a provisional record to the tablet that contains this row. The provisional record consists of the transaction ID, so the state of the transaction can be determined. If a provisional record written by another transaction already exists, then the current transaction would use the transaction ID that is present in the provisional record to fetch details and check if there is a potential conflict.

  3. Second update:

    UPDATE accounts SET balance = balance + 200
      WHERE account_name='John'
      AND account_type='checking';
    

    This step is largely the same as the previous step. Note that the rows being accessed can live on different nodes. The transaction coordinator would need to perform a provisional write RPC to the appropriate node for each row.

  4. Commit:

    COMMIT;
    

    To commit, all the provisional writes must have successfully completed. The COMMIT statement causes the transaction coordinator to update the transaction status in the transaction status table to COMMITED, at which point it is assigned the commit timestamp (which is a hybrid timestamp to be precise). At this point, the transaction is completed. In the background, the COMMIT record along with the commit timestamp is applied to each of the rows that participated to make future lookups of these rows efficient.

The following diagram shows the sequence of events that occur when a transaction is running:

Distributed transaction write path

Scalability

Because all nodes of the cluster can process transactions by becoming transaction coordinators, horizontal scalability can be achieved by distributing the queries evenly across the nodes of the cluster.

Resilience

Each update performed as a part of the transaction is replicated across multiple nodes for resilience and persisted on disk. This ensures that the updates made as a part of a transaction that has been acknowledged as successful to the end user is resilient even if failures occur.

Concurrency control

Concurrency control in databases ensures that multiple transactions can execute concurrently while preserving data integrity. Concurrency control is essential for correctness in environments where two or more transactions can access the same data at the same time. The two primary mechanisms to achieve concurrency control are optimistic and pessimistic.

YugabyteDB currently supports optimistic concurrency control, with pessimistic concurrency control being worked on actively.

Transaction options

You can see the various options supported by transactions by running the following \h BEGIN meta-command:

yugabyte=# \h BEGIN
Command:     BEGIN
Description: start a transaction block
Syntax:
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

  ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
  READ WRITE | READ ONLY
  [ NOT ] DEFERRABLE

transaction_mode

The transaction_mode can be set to one of the following options:

  1. READ WRITE – You can perform read or write operations.
  2. READ ONLY – You can only perform read operations.

For example, trying to do a write operation such as creating a table or inserting a row in a READ ONLY transaction would result in an error:

yugabyte=# BEGIN READ ONLY;
BEGIN
yugabyte=# CREATE TABLE example(k INT PRIMARY KEY);
ERROR: cannot execute CREATE TABLE in a read-only transaction

DEFERRABLE transactions

As in PostgreSQL, the DEFERRABLE transaction property is not used unless the transaction is also both SERIALIZABLE and READ ONLY.

When all three of these properties (SERIALIZABLE, DEFERRABLE, and READ ONLY) are set for a transaction, the transaction may block when first acquiring its snapshot, after which it is able to run without the typical overhead of a SERIALIZABLE transaction and without any risk of contributing to or being canceled by a serialization failure.

Tip

This mode is well-suited for long-running reports or backups without impacting or being impacted by other transactions.