Point-in-time recovery

Point-in-time recovery (PITR) in YugabyteDB enables recovery from a user or software error, while minimizing recovery point objective (RPO), recovery time objective (RTO), and overall impact on the cluster.

PITR is particularly applicable to the following:

  • DDL errors, such as an accidental table removal.
  • DML errors, such as execution of an incorrect update statement against one of the tables.

Typically, you know when the data was corrupted and want to restore to the closest possible uncorrupted state. With PITR, you do that by providing the timestamp to restore to. You can specify the time with a precision of up to 1 microsecond, which is far more precise than is possible with regular snapshots, which are typically taken hourly or daily.

PITR in YugabyteDB is based on a combination of the flashback capability and periodic distributed snapshots.

Flashback provides a way to rewind the data back in time. At any moment, YugabyteDB stores not only the latest state of the data, but also the recent history of changes. With flashback, you can roll back to any point in time within the history retention period. The history is also preserved when a snapshot is taken, which means that by creating snapshots periodically, you effectively increase the flashback retention.

For example, if your overall retention target for PITR is three days, you can use the following configuration:

  • History retention interval is 24 hours.
  • Snapshots are taken daily.
  • Each snapshot is kept for three days.

By default, the history retention period is controlled by the history retention interval flag applied cluster-wide to every YSQL database and YCQL keyspace.

However, when PITR is enabled for a database or a keyspace, YugabyteDB adjusts the history retention for that database or keyspace based on the interval between the snapshots. You are not required to manually set the cluster-wide flag in order to use PITR.

There are no technical limitations on the retention target. However, when you increase the number of stored snapshots, you also increase the amount of space required for the database. The actual overhead depends on the workload, so it is recommended that you estimate it by running tests based on your applications.

The preceding sample configuration ensures that at any moment there is a continuous change history maintained for the last three days. When you trigger a restore, YugabyteDB selects the closest snapshot taken after the timestamp you provide, and then uses flashback in that snapshot to rewind to the requested time.

For example, suppose that snapshots are taken daily at 11:00 PM, the current time is 5:00 PM on April 14th, and you want to restore to 3:00 PM on April 12th. YugabyteDB performs the following:

  1. Locates the snapshot taken on April 12th, which is the closest snapshot taken after the restore time, and restores that snapshot.
  2. Flashes back 8 hours to restore to the state at 3:00 PM, as opposed to 11:00 PM, which is when the snapshot was taken.
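
The selection in this example can be sketched in a few lines of Python. This only illustrates the arithmetic described above and is not how YugabyteDB implements it; the function name `pick_snapshot` and the year 2022 are assumptions made for the example, and the sketch does not model a restore time that is later than the most recent snapshot.

```python
from datetime import datetime, timedelta

def pick_snapshot(snapshot_times, restore_time):
    """Return the earliest snapshot taken at or after restore_time,
    together with how far flashback has to rewind from that snapshot."""
    candidates = [t for t in snapshot_times if t >= restore_time]
    if not candidates:
        # Not covered by this sketch: the restore time is newer than every snapshot.
        raise ValueError("no snapshot taken at or after the restore time")
    snapshot = min(candidates)
    return snapshot, snapshot - restore_time

# Daily snapshots at 11:00 PM (the year is an assumption for illustration).
snapshots = [datetime(2022, 4, day, 23, 0) for day in (11, 12, 13)]
restore_to = datetime(2022, 4, 12, 15, 0)  # 3:00 PM on April 12th

snapshot, rewind = pick_snapshot(snapshots, restore_to)
print(snapshot)  # 2022-04-12 23:00:00
print(rewind)    # 8:00:00
```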

Enable and disable PITR

YugabyteDB exposes the PITR functionality through a set of snapshot schedule commands. A schedule is an entity that automatically manages periodic snapshots for a YSQL database or a YCQL keyspace, and enables PITR for the same database or keyspace.

Creating a snapshot schedule for a database or a keyspace effectively enables PITR for that database or keyspace. You cannot recover to a point in time unless you create a schedule.

Create a schedule

To create a schedule and enable PITR, use the create_snapshot_schedule command with the following parameters:

  • Interval between snapshots (in minutes).
  • Retention time for every snapshot (in minutes).
  • The name of the database or keyspace.

Assuming the retention target is three days, you can execute the following command to create a schedule that produces a snapshot once a day (every 1,440 minutes) and retains it for three days (4,320 minutes):

./bin/yb-admin -master_addresses <ip1:7100,ip2:7100,ip3:7100> create_snapshot_schedule 1440 4320 <database_name>

The output is the unique ID of the newly created snapshot schedule:

{
  "schedule_id": "6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256"
}

You can use this ID to delete the schedule or restore to a point in time.
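
The minute values passed to create_snapshot_schedule are a plain conversion from days. The following Python sketch shows that conversion and assembles the same command line as above. The helper name `create_schedule_args` is hypothetical, and the master addresses and database name are left as placeholders.

```python
MINUTES_PER_DAY = 24 * 60

def create_schedule_args(master_addresses, interval_days, retention_days, database):
    """Build the yb-admin argument list for create_snapshot_schedule.
    The interval and the retention are passed to the command in minutes."""
    interval_min = interval_days * MINUTES_PER_DAY
    retention_min = retention_days * MINUTES_PER_DAY
    return [
        "./bin/yb-admin",
        "-master_addresses", master_addresses,
        "create_snapshot_schedule",
        str(interval_min), str(retention_min),
        database,
    ]

# One snapshot per day, each kept for three days.
args = create_schedule_args("ip1:7100,ip2:7100,ip3:7100", 1, 3, "<database_name>")
print(" ".join(args))
```

With these inputs, the interval is 1,440 minutes and the retention is 4,320 minutes, so three daily snapshots are retained at any given time.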

Delete a schedule

To delete a schedule and disable PITR, use the delete_snapshot_schedule command, which takes the ID of the schedule to be deleted as a parameter:

./bin/yb-admin -master_addresses <ip1:7100,ip2:7100,ip3:7100> delete_snapshot_schedule 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256

List schedules

To see a list of schedules that currently exist in the cluster, use the following list_snapshot_schedules command:

./bin/yb-admin -master_addresses <ip1:7100,ip2:7100,ip3:7100> list_snapshot_schedules

The output is similar to the following:

{
  "schedules": [
    {
      "id": "6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256",
      "options": {
        "interval": "60.000s",
        "retention": "600.000s"
      },
      "snapshots": [
        {
          "id": "386740da-dc17-4e4a-9a2b-976968b1deb5",
          "snapshot_time_utc": "2021-04-28T13:35:32.499002+0000"
        },
        {
          "id": "aaf562ca-036f-4f96-b193-f0baead372e5",
          "snapshot_time_utc": "2021-04-28T13:36:37.501633+0000",
          "previous_snapshot_time_utc": "2021-04-28T13:35:32.499002+0000"
        }
      ]
    }
  ]
}

You can also use the same command to view the information about a particular schedule by providing its ID as a parameter, as follows:

./bin/yb-admin -master_addresses <ip1:7100,ip2:7100,ip3:7100> list_snapshot_schedules 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256
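
Because list_snapshot_schedules prints JSON, the output is easy to process in a script. The following Python sketch parses the sample output shown earlier in this section and derives the configured interval and the actual gap between the two snapshots. The helper names are hypothetical, and the field layout is assumed to match the sample.

```python
import json
from datetime import datetime

# Sample output copied from the list_snapshot_schedules example in this section.
output = """
{
  "schedules": [
    {
      "id": "6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256",
      "options": {"interval": "60.000s", "retention": "600.000s"},
      "snapshots": [
        {"id": "386740da-dc17-4e4a-9a2b-976968b1deb5",
         "snapshot_time_utc": "2021-04-28T13:35:32.499002+0000"},
        {"id": "aaf562ca-036f-4f96-b193-f0baead372e5",
         "snapshot_time_utc": "2021-04-28T13:36:37.501633+0000",
         "previous_snapshot_time_utc": "2021-04-28T13:35:32.499002+0000"}
      ]
    }
  ]
}
"""

def parse_seconds(value):
    """Convert a duration string such as "60.000s" to seconds."""
    return float(value.rstrip("s"))

def parse_time(value):
    """Parse a timestamp such as 2021-04-28T13:35:32.499002+0000."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")

schedule = json.loads(output)["schedules"][0]
interval = parse_seconds(schedule["options"]["interval"])
times = [parse_time(s["snapshot_time_utc"]) for s in schedule["snapshots"]]
gap = (times[1] - times[0]).total_seconds()

print(schedule["id"], interval, gap)
```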

Restore to a point in time

Stop workloads before restoring

Stop all the application workloads before you restore to a point in time. Transactions running concurrently with the restore operation can lead to data inconsistency.

This requirement will be removed in an upcoming release, and is tracked in issue 12853.

If a database or a keyspace has an associated snapshot schedule, you can use that schedule to restore the database or keyspace to a particular point in time by using the restore_snapshot_schedule command with the following parameters:

  • The ID of the schedule.

  • Target restore time, with the following two options:

    • Restore to an absolute time, providing a specific timestamp in a supported format, such as a Unix timestamp or a YCQL timestamp.

      For example, the following command restores to 1:00 PM PDT on May 1st 2022 using a Unix timestamp:

      ./bin/yb-admin \
          -master_addresses <ip1:7100,ip2:7100,ip3:7100> \
          restore_snapshot_schedule 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256 1651435200
      

      The following is an equivalent command that uses a YCQL timestamp:

      ./bin/yb-admin \
          -master_addresses <ip1:7100,ip2:7100,ip3:7100> \
          restore_snapshot_schedule 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256 "2022-05-01 13:00-0700"
      
    • Restore to a time that is relative to the current time (for example, to 10 minutes ago) by specifying how far back you would like to roll the database or keyspace.

      For example, to restore to 5 minutes ago, run the following command:

      ./bin/yb-admin \
          -master_addresses <ip1:7100,ip2:7100,ip3:7100> \
          restore_snapshot_schedule 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256 minus 5m
      

      Or, to restore to 1 hour ago, use the following:

      ./bin/yb-admin \
          -master_addresses <ip1:7100,ip2:7100,ip3:7100> \
          restore_snapshot_schedule 6eaaa4fb-397f-41e2-a8fe-a93e0c9f5256 minus 1h
      

      For detailed information on the relative time formatting, refer to the restore_snapshot_schedule reference.
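
To double-check an absolute timestamp before running a restore, you can convert between the two forms yourself. The following Python sketch reproduces the values used in the absolute-time examples above; it uses only the standard library and makes no assumptions about yb-admin beyond the timestamps shown in this section.

```python
from datetime import datetime, timedelta, timezone

# 1:00 PM on May 1st 2022 at UTC-7 (PDT), the restore time used in the examples.
pdt = timezone(timedelta(hours=-7))
restore_time = datetime(2022, 5, 1, 13, 0, tzinfo=pdt)

# Unix timestamp in seconds.
unix_seconds = int(restore_time.timestamp())
print(unix_seconds)  # 1651435200

# The same moment written as date, time, and UTC offset.
formatted = restore_time.strftime("%Y-%m-%d %H:%M%z")
print(formatted)  # 2022-05-01 13:00-0700

# A relative restore of 5 minutes corresponds to this absolute time.
five_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=5)
print(int(five_minutes_ago.timestamp()))
```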

YSQL index backfill

YugabyteDB supports index backfill, which asynchronously populates a new index. The process runs in the background and can take a significant amount of time, depending on the size of the data. If you restore to a point in time soon after an index creation, you're likely to hit a state where the index is in the middle of the backfill process.

YugabyteDB ignores these partially backfilled indexes during read operations. To make sure the indexes are properly used, you need to drop them and create them again to restart the backfill process. Run the following query to get a list of indexes that need to be recreated:

SELECT pg_class.relname
    FROM pg_index
    JOIN pg_class
    ON pg_index.indexrelid = pg_class.oid
    WHERE NOT indisvalid;

This affects only YSQL databases. For YCQL, YugabyteDB automatically restarts index backfill after the restore.

This limitation will be removed in an upcoming release, and is tracked in issue 12672.

Limitations

PITR functionality has several limitations, primarily related to interactions with other YugabyteDB features. Most of these limitations will be addressed in upcoming releases; refer to each limitation's corresponding tracking issue for details.

CDC

Using PITR and CDC together is currently not supported.

Tracking issue: 12773

xCluster replication

The combination of PITR and xCluster replication is not fully tested, and is considered beta.

xCluster does not replicate any commands related to PITR. If you have two clusters with replication between them, enable PITR on both ends separately. To restore, the following is the recommended procedure:

  1. Stop application workloads and make sure there are no active transactions.
  2. Wait for replication to complete.
  3. Restore to the same time on both clusters.
  4. Resume the application workloads.

Tracking issue: 10820

Tablegroups

Using PITR with tablegroups is not currently supported. If you attempt to create a PITR schedule within a cluster with tablegroups, you'll get an error. Attempting to create a tablegroup if a schedule exists on any of the databases will also produce an error.

Tracking issue: 11924

Global objects

PITR doesn't support global objects, such as tablespaces, roles, and permissions, because they're not currently backed up by the distributed snapshots. If you alter or drop a global object, then try to restore to a point in time before the change, the object will not be restored.

Tracking issue for YSQL tablespaces: 10257

Tracking issue for YSQL roles and permissions: 10349

Tracking issue for YCQL: 8453

Special case for tablespaces

Tablespaces are crucial for geo-partitioned deployments. Trying to restore a database that relies on a removed tablespace will lead to unexpected behavior, so the DROP TABLESPACE command is currently disallowed if a schedule exists on any of the databases in the cluster.

YSQL system catalog upgrade

You can't use PITR to restore to a state before the most recent YSQL system catalog upgrade. Trying to do so will produce an error. You can still use distributed snapshots to restore in this scenario.

Tracking issue: 13158

This limitation applies only to YSQL databases. YCQL is not affected.

Other limitations

  • The TRUNCATE command is disallowed for databases with a snapshot schedule. Tracking issue: 7129.
  • PITR works only with in-cluster distributed snapshots. PITR support for off-cluster backups is under consideration for the future. Tracking issue: 8847.
  • You can't modify a snapshot schedule once it's created. If you need to change the interval or the retention period, delete the schedule and recreate it with the new parameters. Tracking issue: 8417.