xCluster Disaster Recovery EARLY ACCESS
Use xCluster Disaster Recovery (DR) to recover from an unplanned outage (failover) or to perform a planned switchover. Planned switchover is commonly used for business continuity and disaster recovery testing, and failback after a failover.
A DR configuration consists of the following:
- a DR primary universe, which serves both reads and writes.
- a DR replica universe, which can also serve reads.
Data from the DR primary is replicated asynchronously to the DR replica (which is read only). Due to the asynchronous nature of the replication, DR failover results in non-zero recovery point objective (RPO). In other words, data not yet committed on the DR replica can be lost during a failover. The amount of data loss depends on the replication lag, which in turn depends on the network characteristics between the universes. By contrast, during a switchover RPO is zero, and data is not lost, because the switchover waits for all data to be committed on the DR replica before switching over.
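As a rough illustration of how replication lag translates into failover data loss, the following sketch multiplies a write rate by the lag. The function name and the steady-write-rate assumption are hypothetical and are not part of xCluster DR. Treat the result as an order-of-magnitude estimate only.

```python
def estimate_writes_at_risk(write_rate_per_sec: float, replication_lag_sec: float) -> float:
    """Estimate how many writes may be missing from the DR replica at failover.

    Assumes a steady write rate (an illustrative simplification); real
    workloads are bursty, so actual loss can be higher or lower.
    """
    if write_rate_per_sec < 0 or replication_lag_sec < 0:
        raise ValueError("rate and lag must be non-negative")
    return write_rate_per_sec * replication_lag_sec
```

For example, at 500 writes per second and 2 seconds of lag, `estimate_writes_at_risk(500, 2.0)` gives 1000.0 writes that may not have reached the DR replica. For a switchover, the lag is effectively drained to zero first, so nothing is at risk.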
The recovery time objective (RTO) for failover or switchover is very low and is determined by how long it takes applications to switch their connections from one universe to the other. Applications should be designed so that the switch happens as quickly as possible.
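One way an application might keep the connection switch short is to hold an ordered list of universe endpoints and fall through to the next one when a connection attempt fails. The following is a minimal sketch of that idea. The helper name, the endpoint strings, and the `connect` callable are all hypothetical; in practice `connect` would wrap your database driver's connect call.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")


def connect_with_fallback(
    endpoints: Sequence[str],
    connect: Callable[[str], Conn],
    attempts_per_endpoint: int = 2,
) -> Tuple[str, Conn]:
    """Try each endpoint in order and return the first that connects.

    `connect` is supplied by the caller and is expected to raise OSError
    (or a subclass) when the endpoint is unreachable.
    """
    last_error = None
    for endpoint in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return endpoint, connect(endpoint)
            except OSError as exc:
                last_error = exc  # remember the failure, then keep trying
    raise ConnectionError(f"no endpoint reachable: {list(endpoints)}") from last_error
```

Note that the sketch only handles reachability. The former DR replica accepts writes only after it has been promoted by a failover or switchover.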
DR further allows for the role of each universe to switch during planned switchover and unplanned failover scenarios.
Schema change modes
xCluster DR can be set up to perform schema changes in the following ways:
- Semi-automatic mode, providing simpler steps for performing DDL changes.
- Manual mode.
Semi-automatic mode
Semi-automatic mode is EA. In this mode, table and index-level schema changes must be performed on the universes in the following order:
- The DR primary universe.
- The DR replica universe.
You don't need to make any changes to the DR configuration.
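The ordering above can be sketched as a small helper that runs a DDL statement on the DR primary first and on the DR replica second, stopping if the first step fails. The function and the two `run_on_*` callables are hypothetical placeholders for however you execute SQL against each universe.

```python
from typing import Callable, List


def apply_ddl_in_order(
    ddl: str,
    run_on_primary: Callable[[str], None],
    run_on_replica: Callable[[str], None],
) -> List[str]:
    """Apply one DDL statement to the DR primary, then to the DR replica.

    If the primary step raises, the replica step is never attempted, so
    the two universes are not changed out of order.
    """
    completed = []
    run_on_primary(ddl)
    completed.append("primary")
    run_on_replica(ddl)
    completed.append("replica")
    return completed
```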
Semi-automatic mode is recommended for all new DR configurations. When possible, existing DR configurations should be deleted and re-created using semi-automatic mode to reduce the operational burden of DDL changes.
Semi-automatic mode is used for any xCluster DR configuration when the following prerequisites are met at setup time:
- Both DR primary and replica are running YugabyteDB v2024.1.3 or later.
- Semi-automatic mode is enabled. While in EA, the feature is not enabled by default. To enable it, set the DB scoped xCluster replication creation Global runtime configuration option (config key yb.xcluster.db_scoped.creationEnabled) to true. Refer to Manage runtime configuration settings. Note that only a Super Admin user can modify Global runtime configuration settings.
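If you prefer to set the option through a script rather than the UI, the following sketch builds (but does not send) an HTTP request for it. Only the config key comes from this page. The endpoint path, the all-zero Global scope UUID, and the `X-AUTH-YW-API-TOKEN` header are assumptions about the YugabyteDB Anywhere REST API and should be verified against the API reference for your version. The function name and arguments are hypothetical.

```python
import urllib.request

# Config key taken from this page.
CONFIG_KEY = "yb.xcluster.db_scoped.creationEnabled"
# Assumption: the Global scope is addressed by the all-zero UUID.
GLOBAL_SCOPE = "00000000-0000-0000-0000-000000000000"


def build_enable_request(base_url: str, customer_uuid: str, api_token: str) -> urllib.request.Request:
    """Build a PUT request that sets the runtime configuration key to true.

    Assumption: the path and header follow the YugabyteDB Anywhere
    runtime configuration API; check them before use.
    """
    url = (
        f"{base_url.rstrip('/')}/api/v1/customers/{customer_uuid}"
        f"/runtime_config/{GLOBAL_SCOPE}/key/{CONFIG_KEY}"
    )
    return urllib.request.Request(
        url,
        data=b"true",
        method="PUT",
        headers={"Content-Type": "text/plain", "X-AUTH-YW-API-TOKEN": api_token},
    )
```

Sending the request (for example with `urllib.request.urlopen`) requires a token that belongs to a Super Admin user.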
Manual mode
In manual mode, table and index-level schema changes must be performed on the DR primary universe and the DR replica universe, and, in some cases, they must also be updated on the DR configuration.
The exact sequence of these operations for each type of schema change (DDL) is described in Manage tables and indexes.
Upgrading universes in DR
When upgrading universes in DR replication, you should upgrade and finalize the DR replica before upgrading and finalizing the DR primary.
Note that switchover operations can potentially fail if the DR primary and replica are at different versions.
xCluster DR vs xCluster Replication
xCluster refers to all YugabyteDB deployments with two or more universes, and has two major flavors:
- xCluster DR. Provides turnkey workflow orchestration for applications using transactional SQL in an active-active single-master manner, with only unidirectional replication configured at any moment in time. xCluster DR uses xCluster Replication under the hood, and adds workflow automation and orchestration, including switchover, failover, resynchronization to make another full copy, and so on.
- xCluster Replication. Moves the data from one universe to another. Can be used for CQL, non-transactional SQL, bi-directional replication, and other deployment models not supported by xCluster DR.
xCluster DR targets one specific and common xCluster deployment model: active-active single-master, unidirectional replication configured at any moment in time, for transactional YSQL.
- Active-active means that both universes are active - the primary universe for reads and writes, while the replica can handle reads only.
- Single master means that the application writes to only one universe (the primary) at any moment in time.
- Unidirectional replication means that at any moment in time, replication traffic flows in one direction, and is configured (and enforced) to flow only in one direction.
- Transactional SQL means that the application is using SQL (and not CQL), and write-ordering is guaranteed for reads on the target. Furthermore, transactions are guaranteed to be atomic.
xCluster DR adds higher-level orchestration workflows to this deployment to make the end-to-end setup, switchover, and failover of the DR primary to DR replica simple and turnkey. This orchestration includes the following:
- During setup, xCluster DR ensures that both universes have identical copies of the data (using backup and restore to synchronize), and configures the DR replica to be read-only.
- During switchover, xCluster DR waits for all remaining changes on the DR primary to be replicated to the DR replica before switching over.
- During both switchover and failover, xCluster DR promotes the DR replica from read only to read and write; during switchover, xCluster DR demotes (when possible) the original DR primary from read and write to read only.
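The role changes described above can be modeled with a small sketch. The `DrConfig` class and both functions are hypothetical illustrations, not xCluster DR APIs. Representing the post-failover state as "no replica attached" is also an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrConfig:
    primary: str             # universe that accepts reads and writes
    replica: Optional[str]   # read-only universe; None if none is attached


def switchover(cfg: DrConfig) -> DrConfig:
    """Planned role swap: the replica is promoted, the old primary is demoted."""
    if cfg.replica is None:
        raise ValueError("switchover needs a DR replica")
    return DrConfig(primary=cfg.replica, replica=cfg.primary)


def failover(cfg: DrConfig) -> DrConfig:
    """Unplanned promotion: the replica becomes primary.

    The old primary is not demoted by this step; this model simply
    treats the configuration as having no replica afterwards.
    """
    if cfg.replica is None:
        raise ValueError("failover needs a DR replica")
    return DrConfig(primary=cfg.replica, replica=None)
```

Applying `switchover` twice returns the original configuration, which mirrors using switchover for failback.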
For all deployment models other than active-active single-master, unidirectional replication configured at any moment in time, for transactional YSQL, use xCluster Replication directly instead of xCluster DR.
For example, use xCluster Replication for the following deployments:
- Multi-master (bidirectional), where you have two application instances, each one writing to a different universe.
- Active-active single-master, in which a single master application can freely write (without coordinating with YugabyteDB for failover or switchover) to either universe, because both accept writes.
- Non-transactional SQL. That is, SQL without write-order guarantees and without transactional atomicity guarantees.
- CQL.
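The choice between the two flavors can be summarized as a simple decision function. This is only a sketch of the criteria listed above; the function name and parameter names are hypothetical.

```python
def recommend_xcluster_flavor(
    api: str,
    transactional: bool,
    unidirectional: bool,
    single_writer_universe: bool,
) -> str:
    """Return which xCluster flavor fits a deployment, per the criteria above.

    api: "ysql" or "ycql" (case-insensitive).
    single_writer_universe: True if the application writes to only one
    universe at any moment in time.
    """
    fits_dr = (
        api.lower() == "ysql"
        and transactional
        and unidirectional
        and single_writer_universe
    )
    return "xCluster DR" if fits_dr else "xCluster Replication"
```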
Note that a universe configured for xCluster DR cannot be used for xCluster Replication, and vice versa. Although xCluster DR uses xCluster Replication under the hood, xCluster DR replication is managed exclusively from the xCluster Disaster Recovery tab, and not on the xCluster Replication tab.
(As an alternative to xCluster DR, you can perform setup, failover, and switchover manually. Refer to Set up transactional xCluster Replication.)
Limitations
- Currently, automatic replication of DDL (SQL-level changes such as creating or dropping tables or indexes) is not supported. For more details on how to propagate DDL changes from the DR primary to the DR replica, see Schema change modes. This is tracked by GitHub issue #11537.
- If a database operation requires a full copy, any application sessions on the database on the DR replica will be interrupted while the database is dropped and recreated. Your application should either retry connections or redirect reads to the DR primary.
- Setting up DR between a universe upgraded to v2.20.x and a new v2.20.x universe is not supported. This is due to a limitation of xCluster deployments and packed rows. See Packed row limitations.
For more information on the YugabyteDB xCluster implementation and its limitations, refer to xCluster implementation limitations.