xCluster Disaster Recovery (Early Access)

Fail over to a replica universe in case of unplanned outages

Use xCluster Disaster Recovery (DR) to recover from an unplanned outage (failover) or to perform a planned switchover. Planned switchover is commonly used for business continuity and disaster recovery testing, and failback after a failover.

A DR configuration consists of the following:

A DR primary universe, which serves both reads and writes.
A DR replica universe, which can also serve reads.

Data from the DR primary is replicated asynchronously to the DR replica (which is read only). Due to the asynchronous nature of the replication, DR failover results in non-zero recovery point objective (RPO). In other words, data not yet committed on the DR replica can be lost during a failover. The amount of data loss depends on the replication lag, which in turn depends on the network characteristics between the universes. By contrast, during a switchover RPO is zero, and data is not lost, because the switchover waits for all data to be committed on the DR replica before switching over.
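As a rough illustration of how replication lag translates into potential data loss, the following sketch multiplies an assumed lag by an assumed write rate. The function name and the figures are hypothetical and are not YugabyteDB metrics or APIs. Real lag fluctuates, so treat the result as an order-of-magnitude estimate only.

```python
def estimate_writes_at_risk(replication_lag_ms: float, writes_per_second: float) -> float:
    """Rough upper bound on writes not yet applied on the DR replica."""
    if replication_lag_ms < 0 or writes_per_second < 0:
        raise ValueError("lag and write rate must be non-negative")
    return (replication_lag_ms / 1000.0) * writes_per_second


# Failover with an assumed 250 ms of lag at an assumed 4,000 writes per second.
failover_loss = estimate_writes_at_risk(250, 4000)

# Switchover waits for the DR replica to catch up, so the effective lag is zero.
switchover_loss = estimate_writes_at_risk(0, 4000)
```

With these assumed numbers, about 1,000 writes would be at risk during a failover and none during a switchover.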

The recovery time objective (RTO) for failover or switchover is very low, and is determined mainly by how long it takes applications to switch their connections from one universe to the other. Design applications so that this switch happens as quickly as possible.
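One way an application might keep the connection switch short is to try an ordered list of endpoints with bounded retries. The following is a minimal sketch of that idea, not a YugabyteDB client feature. The `connect` callable stands in for whatever driver call your application uses, and the sketch assumes it raises `OSError` on failure, so adapt the exception type to your driver. The endpoint list should reflect the current roles, because the DR replica does not accept writes until it is promoted.

```python
import time
from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(
    endpoints: Sequence[str],
    connect: Callable[[str], Conn],
    attempts_per_endpoint: int = 2,
    retry_delay_s: float = 0.5,
) -> Conn:
    """Return a connection from the first endpoint that accepts one.

    `endpoints` is ordered by preference (hypothetical names, for example the
    current DR primary followed by the universe that takes over after a switch).
    """
    last_error = None
    for endpoint in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return connect(endpoint)
            except OSError as exc:  # assumed failure type; adapt to your driver
                last_error = exc
                # Keep the pause short and bounded so the switch stays fast.
                if attempt + 1 < attempts_per_endpoint:
                    time.sleep(retry_delay_s)
    raise ConnectionError(f"no endpoint reachable: {list(endpoints)}") from last_error
```

The number of attempts and the delay directly bound how long the application spends on an unreachable universe before moving on, which is the part of RTO the application controls.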

DR further allows for the role of each universe to switch during planned switchover and unplanned failover scenarios.

All major DR tasks - switchover, failover, delete, restart - are retryable. Switchover tasks can be rolled back if the old stream exists. For information on managing tasks, refer to Monitor universe tasks.

Disaster recovery

Blog: Using YugabyteDB xCluster DR for PostgreSQL Disaster Recovery in Azure

Video: Disaster Recovery With xCluster DR and Two Cloud Regions

Set up Disaster Recovery

Designate a universe to act as a DR replica.

Unplanned failover

Fail over to the DR replica in case of an unplanned outage.

Planned switchover

Switch over to the DR replica for planned testing and failback.

Add and remove tables and indexes

Perform DDL changes to databases in replication.

Schema change modes

xCluster DR can be set up to perform schema changes in the following ways:

Semi-automatic mode, providing simpler steps for performing DDL changes.
Manual mode.

Semi-automatic mode

In this mode, table and index-level schema changes must be performed on both universes, in the following order:

The DR primary universe.
The DR replica universe.

You don't need to make any changes to the DR configuration.
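The ordering above can be sketched as a small helper that runs a statement on the DR primary first and on the DR replica second. This only illustrates the sequence. The two callables are hypothetical placeholders for however you execute SQL against each universe, and nothing here is a YugabyteDB API.

```python
from typing import Callable


def apply_schema_change(
    ddl: str,
    run_on_primary: Callable[[str], None],
    run_on_replica: Callable[[str], None],
) -> None:
    """Run one DDL statement on the DR primary, then on the DR replica."""
    # Step 1: DR primary. An exception here stops the DR replica from being touched.
    run_on_primary(ddl)
    # Step 2: DR replica, only after step 1 has succeeded.
    run_on_replica(ddl)
```

Because the second call runs only if the first returns without raising, a failed change on the DR primary is never applied to the DR replica on its own.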

To learn more, watch Simplified schema management with xCluster DB Scoped.

Semi-automatic mode is recommended for all new DR configurations. When possible, delete existing manual mode DR configurations and re-create them using semi-automatic mode to reduce the operational burden of DDL changes.

Manual mode

In manual mode, table and index-level schema changes must be performed on the DR primary universe and the DR replica universe, and, in some cases, they must also be updated on the DR configuration.

The exact sequence of these operations for each type of schema change (DDL) is described in Manage tables and indexes.

Upgrading universes in DR

Use the same version of YugabyteDB on both the DR primary and DR replica.

When upgrading universes in DR replication, you should upgrade and finalize the DR replica before upgrading and finalizing the DR primary.

Note that switchover operations can potentially fail if the DR primary and replica are at different versions.

xCluster DR vs xCluster Replication

xCluster refers to all YugabyteDB deployments with two or more universes, and has two major flavors:

xCluster DR. Provides turnkey workflow orchestration for applications using transactional SQL in an active-active single-master manner, with only unidirectional replication configured at any moment in time. xCluster DR uses xCluster Replication under the hood, and adds workflow automation and orchestration, including switchover, failover, resynchronization to make another full copy, and so on.
xCluster Replication. Moves the data from one universe to another. Can be used for CQL, non-transactional SQL, bi-directional replication, and other deployment models not supported by xCluster DR.

xCluster DR targets one specific and common xCluster deployment model: active-active single-master, unidirectional replication configured at any moment in time, for transactional YSQL.

Active-active means that both universes are active - the primary universe for reads and writes, while the replica can handle reads only.
Single master means that the application writes to only one universe (the primary) at any moment in time.
Unidirectional replication means that at any moment in time, replication traffic flows in one direction, and is configured (and enforced) to flow only in one direction.
Transactional SQL means that the application is using SQL (and not CQL), and write-ordering is guaranteed for reads on the DR replica. Furthermore, transactions are guaranteed to be atomic.

xCluster DR adds higher-level orchestration workflows to this deployment to make the end-to-end setup, switchover, and failover of the DR primary to DR replica simple and turnkey. This orchestration includes the following:

During setup, xCluster DR ensures that both universes have identical copies of the data (using backup and restore to synchronize), and configures the DR replica to be read-only.
During switchover, xCluster DR waits for all remaining changes on the DR primary to be replicated to the DR replica before switching over.
During both switchover and failover, xCluster DR promotes the DR replica from read only to read and write; during switchover, xCluster DR demotes (when possible) the original DR primary from read and write to read only.
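The role changes described above can be pictured with a toy state model. This is a sketch for intuition only, not a YugabyteDB API. The class and field names are hypothetical, and modelling the configuration as having no replica after a failover is a simplifying assumption that stands for the old primary not yet being demoted or resynchronized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrConfig:
    """Toy model of a DR configuration (hypothetical, for illustration)."""
    primary: str                  # universe serving reads and writes
    replica: Optional[str]        # universe serving reads only, if any
    pending_changes: int = 0      # changes not yet applied on the replica

    def switchover(self) -> int:
        """Planned: wait for the replica to catch up, then swap roles."""
        if self.replica is None:
            raise RuntimeError("no DR replica to switch over to")
        self.pending_changes = 0  # models waiting for remaining changes to replicate
        self.primary, self.replica = self.replica, self.primary
        return 0                  # changes lost: none

    def failover(self) -> int:
        """Unplanned: promote the replica immediately; pending changes are lost."""
        if self.replica is None:
            raise RuntimeError("no DR replica to fail over to")
        lost = self.pending_changes
        self.pending_changes = 0
        # Assumption of this sketch: the old primary is unavailable, so the
        # configuration has no replica until one is re-established.
        self.primary, self.replica = self.replica, None
        return lost
```

In this model a switchover always returns zero lost changes, while a failover returns whatever had not yet been replicated, mirroring the RPO difference between the two operations.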

For all deployment models other than active-active single-master, unidirectional replication configured at any moment in time, for transactional YSQL, use xCluster Replication directly instead of xCluster DR.

For example, use xCluster Replication for the following deployments:

Multi-master (bidirectional), where you have two application instances, each one writing to a different universe.
Active-active single-master, in which a single master application can freely write (without coordinating with YugabyteDB for failover or switchover) to either universe, because both accept writes.
Non-transactional SQL. That is, SQL without write-order guarantees and without transactional atomicity guarantees.
CQL.

Note that a universe configured for xCluster DR cannot be used for xCluster Replication, and vice versa. Although xCluster DR uses xCluster Replication under the hood, xCluster DR replication is managed exclusively from the xCluster Disaster Recovery tab, and not on the xCluster Replication tab.

(As an alternative to xCluster DR, you can perform setup, failover, and switchover manually. Refer to Set up transactional xCluster.)

xCluster Replication: overview and architecture

xCluster replication between universes in YugabyteDB

Limitations

Currently, automatic replication of DDL (SQL-level changes such as creating or dropping tables or indexes) is not supported. This limitation is tracked by GitHub issue #11537. For more details on how to propagate DDL changes from the DR primary to the DR replica, see Schema change modes.
If a database operation requires a full copy, any application sessions on that database on the DR replica are interrupted while the database is dropped and recreated. Your application should either retry connections or redirect reads to the DR primary.
Setting up DR between a universe upgraded to v2.20.x and a new v2.20.x universe is not supported. This is due to a limitation of xCluster deployments and packed rows. See Packed row limitations.

For more information on the YugabyteDB xCluster implementation and its limitations, refer to xCluster implementation limitations.