CDC unable to stream UPDATE/DELETE operations
Product | Affected Versions | Related Issues | Fixed In |
---|---|---|---|
CDC | v2.20, v2024.1, v2024.2 | #25193 | v2.20.9, v2024.1.4, v2024.2.1 |
Description
Change Data Capture (CDC) may fail to correctly stream UPDATE or DELETE changes under specific configurations. This issue occurs when using the FULL_ROW_NEW_IMAGE and MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES before image modes, or when the DEFAULT Replica Identity is configured and CDC is lagging.
The main cause is the premature removal of retention barriers on the database. This issue affects both gRPC and logical replication models.
Mitigation
The existing CDC stream/replication slot will become unusable and needs to be deleted/dropped. A new CDC stream/replication slot should be created, and the snapshot should be consumed before starting to stream real-time changes to avoid any data loss.
To create a new CDC stream/replication slot, choose from one of the following options:
- If the application does not require the previous values of a row, use before image/replica identity CHANGE.
- If the application requires the previous values of a row, switch to before image mode ALL or replica identity FULL.
Details
By default, YugabyteDB has a history retention of 15 minutes (configured by timestamp_history_retention_interval_sec
) on the database. Furthermore, if the assigned before image mode or replica identity mandates further retention, then a hybrid time-based retention barrier is set for CDC. This hybrid time barrier is moved periodically based on the client's acknowledgment.
The issue was caused by the logic that checked whether the hybrid time-based retention barrier could be moved. This logic only checked whether the before image mode was set to ALL or the replica identity was set to FULL. For all other before-image modes and replica identities that required further retention, these barriers were lifted prematurely.
Note that in the event of <15 minute lag, CDC will be able to stream all DMLs because of the default 15 minute retention. But if the stream lags by more than 15 minutes, then CDC will fail to get the previous values of the row for UPDATE/DELETE operations and, as a result, it won't be able to stream changes.