Recover failing disk
YugabyteDB can be configured to use multiple storage disks by setting the
--fs_data_dirs configuration option.
This introduces the possibility of disk failure and recovery issues.
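As a minimal sketch of how the option is shaped: --fs_data_dirs takes a comma-separated list of directories, typically one per disk. The helper name and the mount points below are hypothetical and only illustrate how the value is assembled; they are not part of YugabyteDB.

```python
def fs_data_dirs_flag(dirs):
    """Build the --fs_data_dirs argument from a list of data directories."""
    # Drop surrounding whitespace and ignore empty entries.
    cleaned = [d.strip() for d in dirs if d.strip()]
    if not cleaned:
        raise ValueError("at least one data directory is required")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate data directory in list")
    # The flag value is a single comma-separated string.
    return "--fs_data_dirs=" + ",".join(cleaned)


# Hypothetical mount points, one per physical disk.
print(fs_data_dirs_flag(["/mnt/d0", "/mnt/d1"]))
```

The resulting string is passed to yb-tserver alongside its other startup options.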
Cluster replication recovery
The YB-TServer service automatically detects disk failures and attempts to spread the data from the failed disk to other healthy nodes in the cluster. In a single-zone setup with a replication factor (RF) of 3, if you started with four or more nodes, at least three nodes remain after one fails. In this case, rereplication starts automatically if a YB-TServer or disk is down for 10 minutes.
In a multi-zone setup with a replication factor (RF) of 3, YugabyteDB tries to keep one copy of the data per zone. For data to be rereplicated automatically in this case, each zone needs at least two YB-TServers, so that if one fails, its data can be rereplicated to the other. This requires a cluster of at least six nodes.
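The node counts in the two scenarios above follow from simple arithmetic, sketched here. The function name is hypothetical, and the multi-zone branch assumes one replica per zone (so the number of zones equals the RF), as in the example above.

```python
def min_nodes_for_auto_rereplication(rf, multi_zone=False):
    """Smallest cluster in which a single failure can be rereplicated."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if multi_zone:
        # Assumption: one replica per zone, so there are rf zones, and
        # each zone needs two YB-TServers to absorb a local failure.
        return 2 * rf
    # Single zone: rf nodes must remain after one node fails.
    return rf + 1


for label, multi in (("single-zone", False), ("multi-zone", True)):
    print(label, min_nodes_for_auto_rereplication(3, multi))
```

For RF 3 this gives four nodes in the single-zone case and six in the multi-zone case, matching the figures above.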
Failed disk replacement
The steps to replace a failed disk are:
- Stop the YB-TServer node.
- Replace the disks that have failed.
- Restart the YB-TServer.
On restart, the YB-TServer will see the new empty disk and start replicating tablets from other nodes.
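Before restarting, it can help to confirm that every directory listed in --fs_data_dirs is present and writable, including the one on the replacement disk. The following is a hedged sketch of such a check; the helper is hypothetical and not a YugabyteDB tool, and the paths are placeholders.

```python
import os


def check_data_dirs(dirs):
    """Return a dict mapping each problematic directory to its issue."""
    problems = {}
    for d in dirs:
        if not os.path.isdir(d):
            # The replacement disk may not be mounted yet.
            problems[d] = "missing or not a directory"
        elif not os.access(d, os.W_OK | os.X_OK):
            # The service user must be able to create files here.
            problems[d] = "not writable"
    return problems


# Hypothetical data directories; substitute the --fs_data_dirs values.
issues = check_data_dirs(["/mnt/d0", "/mnt/d1"])
for path, issue in issues.items():
    print(f"{path}: {issue}")
```

An empty result means all listed directories passed the check. Run it as the same user that runs the YB-TServer so the permission check is meaningful.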